* Add hf_dataset configuration for using HF hub-hosted datasets for (Q)LoRA training
* Pre-commit formatting
* Fix YAML config example
* Print DS info
* Include name
* Add hf_dataset parameter default
* Remove TextHFDataset and CompletionsHFDataset in favor of Dataset and CompletionsDataset: add a text_key constructor argument to Dataset (and make it work with a provided data structure rather than only a JSON file), and add prompt_key and completion_key arguments to CompletionsDataset, with defaults for backwards compatibility.
* nits
* update docs
---------
Co-authored-by: Awni Hannun <awni@apple.com>
* LoRA: Extract pre_processing_model function
* LoRA: Extract small functions (train_model, evaluate_model)
* move test case to test_tuner_utils.py
* nits
* nits
* remove extra param, validate at iteration 0
* version
* fix test
---------
Co-authored-by: Awni Hannun <awni@apple.com>
* support DoRA fine-tuning
* solve problems in lora.py and tuner/utils.py
* add use_dora (bool) to the adapter-loading functions
* delete all unsupported quantization code and fix all the calculation problems in mlx_lm/tuner/dora.py
* use stop_gradient to prevent gradients from flowing through ‘norm’ during backpropagation
* set DEFAULT_USE_DORA in mlx_lm/generate.py
* add type annotations for all use_dora parameters
* support fusing DoRA layers in mlx_lm/fuse.py and fix a bug in to_linear() in mlx_lm/tuner/dora.py
* simplify the code for determining the type of a fused layer in mlx_lm/fuse.py
* pass use_dora to apply_lora_layers() in mlx_lm/fuse.py
* style + nits
* style + nits
* more updates
---------
Co-authored-by: chenyifei08 <chenyifei08@baidu.com>
Co-authored-by: Awni Hannun <awni@apple.com>
* Added support for the MiniCPM architecture
* Added support for the MiniCPM architecture
* Updated utils.py and LORA.md
* Updated utils.py and LORA.md
* Update implementation details for MiniCPM architecture
* Cleaning up
* fixed the missing lm.head layer problem
* Refactor Model class to dynamically handle tied and untied word embeddings
* Quick update
* added a dynamic RoPE scaling base calculation
* quick fix and clean up
* clean up again
* removed the MiniCPMNorm class as it's not used
* forgot something, sorry
* format
* version bump
---------
Co-authored-by: Awni Hannun <awni@apple.com>
* wip
* wip
* feat: convert MLX model to GGUF f16
* chore: convert norm layer to float32 to avoid overflow issue
* chore: add support for mixtral
* chore: clean up
* chore: remove unused import statement
* chore: clean up weight name mapping
* version and readme
* actual version bump
---------
Co-authored-by: Awni Hannun <awni@apple.com>