* Added support for the MiniCPM architecture
* Updated utils.py and LORA.md
* Update implementation details for MiniCPM architecture
* Cleaning up
* Fixed the missing lm_head layer problem
* Refactor Model class to dynamically handle tied and untied word embeddings (see the sketch below)
* Quick update
* Added a dynamic RoPE scaling base calculation
* quick fix and clean up
* clean up again
* Removed the MiniCPMNorm class as it's not used
* forgot something, sorry
* format
* version bump
---------
Co-authored-by: Awni Hannun <awni@apple.com>
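The tied/untied embedding refactor noted above comes down to choosing the output projection at construction time. A minimal sketch in mlx, with illustrative names rather than the actual MiniCPM code:

```python
import mlx.core as mx
import mlx.nn as nn


class Model(nn.Module):
    def __init__(self, vocab_size: int, dims: int, tie_word_embeddings: bool):
        super().__init__()
        self.tie_word_embeddings = tie_word_embeddings
        self.embed_tokens = nn.Embedding(vocab_size, dims)
        if not tie_word_embeddings:
            # Untied: a separate output projection is created and loaded.
            self.lm_head = nn.Linear(dims, vocab_size, bias=False)

    def __call__(self, hidden: mx.array) -> mx.array:
        if self.tie_word_embeddings:
            # Tied: reuse the input embedding matrix as the output projection.
            return hidden @ self.embed_tokens.weight.T
        return self.lm_head(hidden)
```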
* Initial config handler and test
* Added means to run from CLI
* Update lora config loading and tests
* Constrain scheduler config (warmup and minimum LR) for each schedule kind (see the sketch below)
* Update reference to moved schedule_config module
* Minor fix
* Fix typos
* Moved build_schedule and tests
* nits in schedule config
* flake
* fix path
---------
Co-authored-by: Awni Hannun <awni@apple.com>
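The schedule constraints above (a warmup phase and a floor on the learning rate) compose naturally with mlx's built-in schedule helpers. A sketch of a build_schedule, where the config keys (learning_rate, warmup, min_lr, decay_steps) are assumptions rather than the actual schema:

```python
import mlx.optimizers as optim


def build_schedule(config: dict):
    """Cosine decay to a minimum LR, preceded by an optional linear warmup."""
    base_lr = config["learning_rate"]
    warmup = config.get("warmup", 0)
    min_lr = config.get("min_lr", 0.0)
    decay_steps = config["decay_steps"]

    main = optim.cosine_decay(base_lr, decay_steps, end=min_lr)
    if warmup == 0:
        return main
    ramp = optim.linear_schedule(0.0, base_lr, steps=warmup)
    # Switch from the warmup ramp to the decay schedule after `warmup` steps.
    return optim.join_schedules([ramp, main], [warmup])
```

The returned callable can be passed directly as the learning_rate of an mlx optimizer, e.g. `optim.Adam(learning_rate=build_schedule(cfg))`.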
* Add dropout parameter to lora configuration
A dropout parameter has been added to the LoRA configuration settings in lora_config.yaml, and the LoRALinear class in utils.py has been updated to accept this new parameter (see the sketch below). Additionally, the reference to `args.prompt` in lora.py that raised `AttributeError: 'types.SimpleNamespace' object has no attribute 'prompt'` has been removed.
* Update lora_config.yaml
Set dropout to 0.0 in the sample config file
* format
---------
Co-authored-by: Awni Hannun <awni@apple.com>
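A rough sketch of a LoRA layer that accepts the new dropout parameter; the class is illustrative, not the exact LoRALinear from utils.py. Dropout is applied only to the input of the low-rank path, leaving the frozen base projection untouched:

```python
import math

import mlx.core as mx
import mlx.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, in_dims: int, out_dims: int, rank: int = 8,
                 alpha: float = 16.0, dropout: float = 0.0):
        super().__init__()
        self.linear = nn.Linear(in_dims, out_dims, bias=False)
        self.dropout = nn.Dropout(p=dropout)
        # Low-rank adapters: A is small random, B starts at zero so the
        # adapted layer initially matches the frozen base layer.
        scale = 1.0 / math.sqrt(in_dims)
        self.lora_a = mx.random.uniform(low=-scale, high=scale, shape=(in_dims, rank))
        self.lora_b = mx.zeros((rank, out_dims))
        self.scale = alpha / rank

    def __call__(self, x: mx.array) -> mx.array:
        y = self.linear(x)
        # Dropout applies to the low-rank path only.
        z = (self.dropout(x) @ self.lora_a) @ self.lora_b
        return y + self.scale * z
```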
* Add --lora-all-linear option to apply LoRA to all linear transformer block layers
* Moved to YAML config and added specification of rank & alpha (see the sketch below)
* nits in config, more tests
* nit
* run tests for prs
---------
Co-authored-by: Awni Hannun <awni@apple.com>
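Reading the rank, alpha, and dropout settings from the YAML config might look like the following; the file name and key layout are assumptions for illustration:

```python
import yaml

# Load LoRA settings from the YAML config file.
with open("lora_config.yaml") as f:
    config = yaml.safe_load(f)

# Hypothetical key layout; the real schema may differ.
lora = config.get("lora_parameters", {})
rank = lora.get("rank", 8)
alpha = lora.get("alpha", 16.0)
dropout = lora.get("dropout", 0.0)
```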
* Add Starcoder2 model and update utils.py
* Refactor model arguments and modules in starcoder2.py
* Refactor FeedForward class to MLP in starcoder2.py
* Fix typo
* pre-commit
* Refactor starcoder2.py: Update model arguments and modules
* Fix LM head and MLP layers
* Rename input layer norm
* Update bias in linear layers
* Refactor token embeddings in Starcoder2Model
* Rename to standard HF attention layer name
* Add LayerNorm
* Add transposed token embeddings (like in Gemma)
* Refactor MLP and TransformerBlock classes
* Add tie_word_embeddings option to ModelArgs and update Model implementation
* Add conditional check for tying word embeddings in Starcoder2Model
* Fix bias in lm_head linear layer
* Remove unused LayerNorm in stablelm
* Update transformers dependency to use GitHub repository
* fix lm head bug, revert transformer req
* Update RoPE initialization in Attention class (see the sketch below)
---------
Co-authored-by: Awni Hannun <awni@apple.com>
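The RoPE initialization in the final commit presumably builds on mlx's nn.RoPE; a minimal sketch, with rope_theta as an assumed config name:

```python
import mlx.nn as nn


class Attention(nn.Module):
    def __init__(self, dims: int, num_heads: int, rope_theta: float = 10000.0):
        super().__init__()
        head_dim = dims // num_heads
        # Rotary position embedding applied per attention head.
        self.rope = nn.RoPE(head_dim, traditional=False, base=rope_theta)
```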
* StableLM is now part of Transformers as stablelm rather than stablelm_epoch; changed the config to match
* removing old file
* reference new stablelm
* LoRA:Refactor TrainingCallback to enhance flexibility and extensibility
This commit refactors the TrainingCallback class so that both on_train_loss_report and on_val_loss_report accept a single dict parameter instead of several positional parameters (see the sketch below). Passing a dict makes the class easier to extend with new training or validation metrics without changing the method signatures, and simplifies adding new information to be logged or processed.
* LoRA: Add printing and callbacks for learning rate during training
* LoRA: Improve validation error for LoRA layer count exceeding the model's layer count
This commit enhances the error handling when the specified LoRA layer count exceeds the total number of layers in the model. It clarifies the error message to provide actionable feedback for users, guiding them to adjust their input parameters accordingly.
* format + nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
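The dict-based callback interface described above might look like this sketch; the dict keys shown are illustrative assumptions:

```python
class TrainingCallback:
    """Callbacks receive a single dict so new metrics can be added freely."""

    def on_train_loss_report(self, info: dict):
        # e.g. {"iteration": ..., "train_loss": ..., "learning_rate": ...}
        pass

    def on_val_loss_report(self, info: dict):
        # e.g. {"iteration": ..., "val_loss": ...}
        pass


class PrintCallback(TrainingCallback):
    def on_train_loss_report(self, info: dict):
        print(f"Iter {info['iteration']}: train loss {info['train_loss']:.3f}")
```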
* lazy model import in mlx_lm (see the sketch below)
* change lora loading
* fix olmo lora
* remove a bunch of unused stuff from plamo
* move phixtral to mlx-lm and out of llms/
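Lazy model import defers loading an architecture module until a model of that type is actually requested, keeping startup cheap. A sketch under the assumption that each architecture lives at mlx_lm.models.<model_type>:

```python
import importlib


def get_model_classes(model_type: str):
    """Import the architecture module only when it is actually needed."""
    try:
        arch = importlib.import_module(f"mlx_lm.models.{model_type}")
    except ImportError:
        raise ValueError(f"Model type {model_type} not supported.")
    return arch.Model, arch.ModelArgs
```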
* Add checkpoints directory for adapter weights
The code was modified to create a checkpoints directory if it doesn't exist yet. Adapter weights are now saved to this checkpoints directory during the training iterations (see the sketch below).
Also corrected the indentation of the adapter-weight saving code, which was mistakenly nested inside the "if eval" block.
* Fixed a blank line added by mistake
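A sketch of the checkpoint-saving change; the function and directory names are hypothetical:

```python
import os

import mlx.core as mx
from mlx.utils import tree_flatten


def save_adapter(model, checkpoints_dir: str, iteration: int):
    # Create the checkpoints directory if it doesn't exist yet.
    os.makedirs(checkpoints_dir, exist_ok=True)
    path = os.path.join(checkpoints_dir, f"adapters_{iteration}.npz")
    # Save only the trainable (adapter) parameters, not the full model.
    mx.savez(path, **dict(tree_flatten(model.trainable_parameters())))
```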
* feat(mlx-lm): add de-quantization for fuse (see the sketch below)
* chore: disable quantization in the to-linear conversion when de-quantization is enabled
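De-quantizing for fuse expands each quantized weight matrix back to floats so LoRA deltas can be merged in. A sketch using mlx's mx.dequantize; the attribute layout of QuantizedLinear is treated as an assumption:

```python
import mlx.core as mx
import mlx.nn as nn


def dequantize_linear(q: nn.QuantizedLinear) -> nn.Linear:
    """Turn a quantized layer back into a float Linear for fusing."""
    weight = mx.dequantize(q.weight, q.scales, q.biases, q.group_size, q.bits)
    out_dims, in_dims = weight.shape
    linear = nn.Linear(in_dims, out_dims, bias="bias" in q)
    linear.weight = weight
    if "bias" in q:
        linear.bias = q.bias
    return linear
```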
* chore: add better error handling for adapter file not found
* chore(mlx-lm): pass max seq len to evaluate in training loop
* chore: make sure batch sequences do not exceed the max length
* chore: update comment
* chore: add a warning before truncating input (see the sketch below)
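The truncation warning might be implemented along these lines; truncate_batch and max_seq_length are illustrative names:

```python
import mlx.core as mx


def truncate_batch(batch: mx.array, max_seq_length: int) -> mx.array:
    """Clamp a [batch, seq] token array to max_seq_length, warning first."""
    if batch.shape[1] > max_seq_length:
        print(
            f"[WARNING] Some sequences are longer than {max_seq_length} "
            "tokens and will be truncated."
        )
        batch = batch[:, :max_seq_length]
    return batch
```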