This commit introduces a comprehensive memory estimation utility for MLX language models, supporting:
- Dynamic parameter calculation across diverse model architectures
- Handling of quantized and standard models
- Estimation of model weights, KV cache, and overhead memory
- Support for bounded and unbounded KV cache modes
- Flexible configuration via command-line arguments
The new tool provides detailed memory usage insights for different model configurations and generation scenarios.
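
The three components listed above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: `ModelShape`, its field names, `overhead_fraction`, and the 2-byte cache element size are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelShape:
    # Hypothetical container; the field names are illustrative only.
    num_params: int
    num_layers: int
    num_kv_heads: int
    head_dim: int


def weight_bytes(shape: ModelShape, bits_per_weight: float = 16.0) -> int:
    # Quantized models use fewer effective bits per weight than fp16.
    return int(shape.num_params * bits_per_weight / 8)


def kv_cache_bytes(
    shape: ModelShape,
    tokens: int,
    max_kv_size: Optional[int] = None,
    bytes_per_elem: int = 2,
) -> int:
    # Bounded mode caps the number of cached positions at max_kv_size;
    # unbounded mode (max_kv_size=None) grows with the token count.
    cached = tokens if max_kv_size is None else min(tokens, max_kv_size)
    # Factor of 2 covers keys and values.
    per_token = 2 * shape.num_layers * shape.num_kv_heads * shape.head_dim
    return per_token * cached * bytes_per_elem


def estimate_bytes(
    shape: ModelShape,
    tokens: int,
    bits_per_weight: float = 16.0,
    max_kv_size: Optional[int] = None,
    overhead_fraction: float = 0.1,  # assumed flat overhead, not a measured value
) -> dict:
    weights = weight_bytes(shape, bits_per_weight)
    kv = kv_cache_bytes(shape, tokens, max_kv_size)
    overhead = int((weights + kv) * overhead_fraction)
    return {
        "weights": weights,
        "kv_cache": kv,
        "overhead": overhead,
        "total": weights + kv + overhead,
    }
```

For a quantized model, `bits_per_weight` would be the effective value including any per-group scales and biases, so it can be fractional.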
* initial commit
* adding more customized YAML configuration
* update YAML example file
* Changed the switch to set opt_class
* removing muon
* using default arguments
* update
* initial commit
* update ACKNOWLEDGMENTS.md
* adding olmoe to training
* clean up
* faster generation
* remove sanitize method
* more clean ups
* adding SwitchGLU
* clean up
* a little faster and adding norm_topk_prob
* formatted
* Fix plamo2 model to use rms_norm and enable sliding window attention
* Fix missing variable
* Remove sliding window attention implementation, since it should be handled by RotatingKVCache
* Remove unused imports
* Add pfnet/plamo-2-1b
* Fix cache.py to support non-top level layers
* Use mlx's BaseModelArgs
* Fix model
* Use sanitize()
* Remove unnecessary changes
* Add plamo2.py
* Apply formatter
* Fix some part
* Allow a cache obj defined externally
* Convert channel-first weights to channel-last for correct use of MLX's conv1d
* Remove unused code part
* Pass all inputs on the first call to the model
* Fix import
* Include .jsonl files when downloading from the Hugging Face Hub
* Fix reference to layers
* Remove unnecessary code and add a test for plamo2
* Do not pass mask to prepare_inputs_for_generation
* Fix to use repeat instead of tile
* Add state property to PlamoCache
* Add __iter__ and __next__ methods to PlamoCache
* cleanup
* cleanup
* fix
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
- Optional completion-only fine-tuning with `--mask-prompt`
- Collections of Hugging Face datasets
---------
Co-authored-by: Awni Hannun <awni@apple.com>
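
The idea behind completion-only fine-tuning can be sketched as below: the loss is averaged over completion tokens only, so prompt tokens do not contribute to the gradient. This is an illustrative pure-Python sketch; `masked_mean_loss` and its arguments are hypothetical names, not the trainer's API.

```python
from typing import Sequence


def masked_mean_loss(token_losses: Sequence[float], prompt_length: int) -> float:
    """Average per-token losses over completion tokens only.

    Hypothetical helper: the first `prompt_length` entries belong to the
    prompt and are excluded, as a prompt mask would do.
    """
    completion = token_losses[prompt_length:]
    if not completion:
        # Nothing to train on when the whole sequence is prompt.
        return 0.0
    return sum(completion) / len(completion)
```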
* chore(mlx-lm): support text type content
* chore: optimize the message content processing
* nits + format
---------
Co-authored-by: Awni Hannun <awni@apple.com>
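
A rough sketch of what supporting text-type content could look like: message content may arrive either as a plain string or as a list of `{"type": "text", "text": ...}` fragments, and the fragments are joined into one string. The helper name `flatten_content` and the decision to reject other fragment types are assumptions for illustration, not the actual implementation.

```python
from typing import Any


def flatten_content(content: Any) -> str:
    """Normalize message content to a plain string (hypothetical helper)."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        parts = []
        for item in content:
            # Only text fragments are handled in this sketch.
            if not isinstance(item, dict) or item.get("type") != "text":
                raise ValueError("only text-type content is supported")
            parts.append(item.get("text", ""))
        return "".join(parts)
    raise ValueError("content must be a string or a list of fragments")
```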
* Generalize prompt_feature and completion_feature for use in local datasets to facilitate compatibility with many other training dataset formats.
* Persist configured prompt/completion key
* rebase + nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
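
As an illustration of configurable prompt/completion keys, the sketch below reads a record using caller-supplied key names so that datasets with different field names can be used without renaming columns. `extract_pair`, its defaults, and the sample record are hypothetical and only mirror the idea described above.

```python
from typing import Mapping, Tuple


def extract_pair(
    record: Mapping[str, str],
    prompt_key: str = "prompt",
    completion_key: str = "completion",
) -> Tuple[str, str]:
    """Return (prompt, completion) from a record using configurable keys."""
    missing = [k for k in (prompt_key, completion_key) if k not in record]
    if missing:
        raise KeyError(f"record is missing keys: {missing}")
    return record[prompt_key], record[completion_key]


# Example: a dataset that stores its fields under different names.
sample = {"question": "2 + 2?", "answer": "4"}
pair = extract_pair(sample, prompt_key="question", completion_key="answer")
```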