* initial commit
* adding more customized YAML configuration
* update YAML example file
* Changed the switch to set opt_class
* removing muon
* using default arguments
* update
- Optional completion-only fine-tuning with `--mask-prompt`
- Collections of Hugging Face datasets
---------
Co-authored-by: Awni Hannun <awni@apple.com>
* Adding full model weights finetuning
* Updating the LORA.md and ACKNOWLEDGMENTS.md files.
* removing --use-dora and --full-training and adding --fine-tune-type
* some clean up
* reformatting and fixing dora training
* updated CONFIG_DEFAULTS
* update config example
* update in the config example file
* Update LORA.md
* merge and commit
* adding argument for dora linear layer
* clean up
* clean up in the example yaml file
* fix
* final fix before sending
* small addition to the md file
* fix for loading the fully trained model by saving all the files and configs correctly
* clean up
* removing the unnecessary files
* changing lora layers back to 16
* removed max file size
* nits
* resolve merge
* some consistency changes
---------
Co-authored-by: Awni Hannun <awni@apple.com>
* LoRA: Extract pre_processing_model function
* LoRA: Extract small functions (train_model, evaluate_model)
* move test case to test_tuner_utils.py
* nits
* nits
* remove extra param, validate at iteration 0
* version
* fix test
---------
Co-authored-by: Awni Hannun <awni@apple.com>
* support dora finetune
* solve problems in lora.py and tuner/utils.py
* add use_dora (bool) to the adapter-loading functions
* delete all unsupported quantization code and fix all the calculation problems in mlx_lm/tuner/dora.py
* Using stop_gradient to prevent gradients from flowing through `norm` during backpropagation (sketched after this block)
* set DEFAULT_USE_DORA in mlx_lm/generate.py
* add annotation for all the use_dora
* mlx_lm/fuse.py: support fusing dora layers and fix a bug in to_linear() in mlx_lm/tuner/dora.py
* simplify code for judging the type of a fused layer in mlx_lm/fuse.py
* add use_dora in mlx_lm/fuse.py when calling apply_lora_layers()
* style + nits
* style + nits
* more updates
---------
Co-authored-by: chenyifei08 <chenyifei08@baidu.com>
Co-authored-by: Awni Hannun <awni@apple.com>
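A minimal sketch of the stop_gradient idea mentioned above, assuming a DoRA-style update where a learned magnitude rescales the column-normalized adapted weight. The function and variable names are illustrative, not the actual mlx_lm/tuner/dora.py API:

```python
import mlx.core as mx

def dora_weight(base_weight, lora_a, lora_b, magnitude):
    # Illustrative DoRA-style weight: frozen base weight plus the low-rank delta.
    # Shapes assumed: base_weight (out, in), lora_b (out, r), lora_a (r, in),
    # magnitude (out, 1).
    adapted = base_weight + lora_b @ lora_a
    # Stop gradients from flowing through the norm during backpropagation,
    # as described in the commit above; only the magnitude and the low-rank
    # matrices receive gradients from the rescaling.
    norm = mx.stop_gradient(mx.linalg.norm(adapted, axis=1, keepdims=True))
    return magnitude * adapted / norm
```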
* Initial config handler and test
* Added means to run from CLI
* Update lora config loading and tests
* Constrain scheduler config (warmup and minimum LR) for each kind
* Update reference to moved schedule_config module
* Minor fix
* Fix typos
* Moved build_schedule and tests (a warmup-handling sketch follows this block)
* nits in schedule config
* flake
* fix path
---------
Co-authored-by: Awni Hannun <awni@apple.com>
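A rough sketch of how a YAML-driven schedule with a warmup constraint could be assembled from the standard MLX schedule helpers. The config keys shown (`name`, `warmup`, `warmup_init`, `arguments`) are assumptions for illustration, not necessarily the exact schema used here:

```python
import mlx.optimizers as optim

def build_schedule_sketch(schedule_config: dict):
    # Example config: {"name": "cosine_decay", "warmup": 100,
    #                  "warmup_init": 1e-7, "arguments": [1e-5, 1000, 1e-7]}
    base_fn = getattr(optim, schedule_config["name"])(*schedule_config["arguments"])
    warmup_steps = schedule_config.get("warmup", 0)
    if warmup_steps == 0:
        return base_fn
    # Linear warmup from warmup_init up to the schedule's initial learning
    # rate, then hand off to the main schedule at the boundary step.
    warmup_fn = optim.linear_schedule(
        schedule_config.get("warmup_init", 0.0),
        schedule_config["arguments"][0],
        warmup_steps,
    )
    return optim.join_schedules([warmup_fn, base_fn], [warmup_steps])
```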
* Add dropout parameter to lora configuration
A dropout parameter has been added to the LoRA configuration settings in lora_config.yaml, and the LoRALinear class in utils.py has been updated to accept this new parameter (a sketch follows this block). Additionally, the `args.prompt` reference that caused an `AttributeError: 'types.SimpleNamespace' object has no attribute 'prompt'` has been removed from lora.py.
* Update lora_config.yaml
Set dropout to 0.0 in the sample config file
* format
---------
Co-authored-by: Awni Hannun <awni@apple.com>
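A hedged sketch of where a dropout knob slots into a LoRA linear layer; the class name, defaults, and initialization shown are illustrative, not the exact LoRALinear implementation in utils.py:

```python
import math
import mlx.core as mx
import mlx.nn as nn

class LoRALinearSketch(nn.Module):
    def __init__(self, linear: nn.Linear, rank: int = 8, dropout: float = 0.0, scale: float = 20.0):
        super().__init__()
        self.linear = linear                  # frozen base projection
        self.dropout = nn.Dropout(p=dropout)  # the new config-driven knob
        self.scale = scale
        out_dims, in_dims = linear.weight.shape
        bound = 1.0 / math.sqrt(in_dims)
        self.lora_a = mx.random.uniform(low=-bound, high=bound, shape=(in_dims, rank))
        self.lora_b = mx.zeros(shape=(rank, out_dims))

    def __call__(self, x):
        y = self.linear(x)
        # Dropout is applied on the adapter branch only; the base path is untouched.
        z = (self.dropout(x) @ self.lora_a) @ self.lora_b
        return y + self.scale * z
```

The matching YAML entry is then just `dropout: 0.0`, as set in the sample config.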
* chore(mlx-lm): fix print_trainable_parameters for quant models
* chore: clean up
* refactor: use layer type to check quant bits
* chore: address comment
* Add --lora-all-linear option to apply LoRA to all linear transformer block layers
* Moved to YAML config and added specification of rank & alpha
* nits in config, more tests
* nit
* run tests for prs
---------
Co-authored-by: Awni Hannun <awni@apple.com>
* Convert mlx_lm.lora to use YAML configuration
* pre-commit run fixes
* Fix loading of config file
* Remove invalid YAML from doc
* Update command-line options and YAML parameter overriding, per feedback in #503
* Minor wording change
* Positional argument
* Moved config to a (-c/--config) flag
* Removed CLI option defaults (since CLI options take precedence and their defaults are in CONFIG_DEFAULTS)
* pre-commit format updates
* Fix handling of CLI option defaults
* Prevent None values of unspecified CLI options from overwriting values from CONFIG_DEFAULTS (sketched after this block)
* nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
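A minimal sketch of the precedence described above, assuming CLI options default to `None`: built-in defaults are overridden by the YAML file given via `-c/--config`, and only explicitly set CLI flags override the YAML values. The names and defaults here are illustrative:

```python
import argparse
import yaml

CONFIG_DEFAULTS = {"lora_layers": 16, "learning_rate": 1e-5}  # illustrative defaults

def parse_config(parser: argparse.ArgumentParser) -> argparse.Namespace:
    args = parser.parse_args()
    config = dict(CONFIG_DEFAULTS)
    # YAML config (if given) overrides the built-in defaults.
    if getattr(args, "config", None) is not None:
        with open(args.config) as f:
            config.update(yaml.safe_load(f) or {})
    # Only explicitly provided CLI options (non-None) take precedence over YAML.
    for key, value in vars(args).items():
        if value is not None:
            config[key] = value
    return argparse.Namespace(**config)
```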
* lazy model import in mlx_lm
* change lora loading
* fix olmo lora
* remove a bunch of unused stuff from plamo
* move phixtral to mlx-lm and out of llms/
A new argument `--max_seq_length` has been added to the command-line parser and is passed as a parameter to the main function of the lora.py script, allowing users to specify and control the maximum sequence length during training.
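As a sketch of what that addition might look like (the default value and help text are assumed for illustration):

```python
import argparse

parser = argparse.ArgumentParser(description="LoRA fine-tuning")
parser.add_argument(
    "--max_seq_length",
    type=int,
    default=2048,  # assumed default, for illustration only
    help="Maximum sequence length to use when batching training examples.",
)
args = parser.parse_args()
# main(args) then receives args.max_seq_length, as described above.
```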