* LoRA: Refactor TrainingCallback to enhance flexibility and extensibility
This commit refactors the TrainingCallback class so that both on_train_loss_report and on_val_loss_report accept a single dictionary parameter instead of multiple positional arguments. New training or validation metrics can now be added without changing the method signatures, which keeps the callback interface simple to extend and maintain.
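A minimal sketch of the resulting callback shape (method and key names here are illustrative, not necessarily the exact mlx_lm API):
```python
class TrainingCallback:
    def on_train_loss_report(self, train_info: dict):
        """Called to report training loss and related metrics."""
        pass

    def on_val_loss_report(self, val_info: dict):
        """Called to report validation loss and related metrics."""
        pass


class LoggingCallback(TrainingCallback):
    def on_train_loss_report(self, train_info: dict):
        # New keys (e.g. a learning rate) can be added to the dict
        # without changing this signature.
        it = train_info.get("iteration")
        loss = train_info.get("train_loss")
        print(f"iter {it}: train loss {loss}")
```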
* LoRA: Add printing and callbacks for learning rate during training
Mixtral models throw the following exception:
```
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/mlx_lm/generate.py", line 119, in <module>
    main(args)
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/mlx_lm/generate.py", line 96, in main
    model, tokenizer = load(args.model, tokenizer_config=tokenizer_config)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/mlx_lm/utils.py", line 278, in load
    model = load_model(model_path)
            ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/mlx_lm/utils.py", line 221, in load_model
    model_class, model_args_class = _get_classes(config=config)
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/mlx_lm/utils.py", line 46, in _get_classes
    arch = importlib.import_module(f"mlx_lm.models.{model_type}")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/mlx_lm/models/mixtral.py", line 11, in <module>
    @dataclass
     ^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/dataclasses.py", line 1230, in dataclass
    return wrap(cls)
           ^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/dataclasses.py", line 1220, in wrap
    return _process_class(cls, init, repr, eq, order, unsafe_hash,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/dataclasses.py", line 1027, in _process_class
    _init_fn(all_init_fields,
  File "/opt/homebrew/anaconda3/lib/python3.11/dataclasses.py", line 545, in _init_fn
    raise TypeError(f'non-default argument {f.name!r} '
TypeError: non-default argument 'model_type' follows default argument
```
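The error means a dataclass field without a default was declared after fields that do have defaults. A minimal reproduction and fix (the actual field layout in mixtral.py may differ):
```python
from dataclasses import dataclass

# Broken: a non-default field after a default one raises TypeError
# as soon as the class is created.
# @dataclass
# class ModelArgs:
#     vocab_size: int = 32000
#     model_type: str  # TypeError: non-default argument follows default argument

# Fixed: either declare the non-default field first...
@dataclass
class ModelArgs:
    model_type: str
    vocab_size: int = 32000

# ...or give it a default value.
@dataclass
class ModelArgsAlt:
    vocab_size: int = 32000
    model_type: str = "mixtral"
```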
* LoRA: Improve the validation error when the LoRA layer count exceeds the model's layer count
This commit enhances the error handling when the specified LoRA layer count exceeds the total number of layers in the model. It clarifies the error message to provide actionable feedback for users, guiding them to adjust their input parameters accordingly.
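A minimal sketch of the kind of check described (function and variable names are assumptions):
```python
def check_lora_layers(model, lora_layers: int):
    num_model_layers = len(model.layers)
    if lora_layers > num_model_layers:
        raise ValueError(
            f"Requested {lora_layers} LoRA layers, but the model only has "
            f"{num_model_layers}. Reduce --lora-layers to at most "
            f"{num_model_layers}."
        )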
* format + nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
* lazy model import in mlx_lm
* change lora loading
* fix olmo lora
* remove a bunch of unused stuff from plamo
* move phixtral to mlx-lm and out of llms/
* Add checkpoints directory for adapter weights
The code was modified to create a checkpoints directory if it does not already exist. Adapter weights are now saved to this checkpoints directory during training iterations.
Also corrected the indentation of the adapter-weight-saving code, which was unintentionally nested inside the "if eval" block.
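A sketch of the save logic described above (paths, helper name, and filename pattern are assumptions):
```python
import os

import mlx.core as mx
from mlx.utils import tree_flatten


def save_adapter_checkpoint(model, it: int, checkpoints_dir: str = "checkpoints"):
    # Create the checkpoints directory if it doesn't exist yet.
    os.makedirs(checkpoints_dir, exist_ok=True)
    path = os.path.join(checkpoints_dir, f"adapters_{it}.npz")
    # Save only the trainable (LoRA adapter) parameters.
    mx.savez(path, **dict(tree_flatten(model.trainable_parameters())))
```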
* Fix a blank line added by mistake
* update a few examples to use compile
* update mnist
* add compile to vae and rename some stuff for simplicity
* update reqs
* use state in eval
* GCN example with RNG + dropout
* add a bit of prefetching
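The compile updates above follow the standard mlx pattern of compiling the training step with model and optimizer state as inputs/outputs; a minimal self-contained sketch (the toy model and data are illustrative):
```python
from functools import partial

import mlx.core as mx
import mlx.nn as nn
import mlx.optimizers as optim

model = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = optim.SGD(learning_rate=0.01)


def loss_fn(model, X, y):
    return nn.losses.mse_loss(model(X), y)


# Compiled functions are pure by default; passing the model and optimizer
# state (and, for dropout/RNG, mx.random.state) lets updates persist.
state = [model.state, optimizer.state]


@partial(mx.compile, inputs=state, outputs=state)
def step(X, y):
    loss, grads = nn.value_and_grad(model, loss_fn)(model, X, y)
    optimizer.update(model, grads)
    return loss


X = mx.random.normal((16, 2))
y = mx.random.normal((16, 1))
loss = step(X, y)
mx.eval(state)  # evaluate the updated parameters and optimizer state
```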
* initial commit
* style fixes
* update of ACKNOWLEDGMENTS
* fixed comment
* minor refactoring; removed unused imports
* added cifar and cvae to top-level README.md
* removed mention of cuda/mps in argparse
* fixed training status output
* load_weights() with strict=True
* pretrained model update
* fixed imports and style
* requires mlx>=0.0.9
* updated with results using mlx 0.0.9
* removed mention of private repo
* simplify and combine into one file, more consistency with other examples
* few more nits
* nits
* spell
* format
---------
Co-authored-by: Awni Hannun <awni@apple.com>
* chore(mlx-lm): add model weight index in save_weights
* Update llms/mlx_lm/utils.py
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Update llms/mlx_lm/utils.py
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* chore: save total size as parameter size instead of file size
* chore: clean up format
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
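The weight-index commits above write a shard index in the standard `*.safetensors.index.json` layout; a sketch (the helper name and the structure of `shards` are assumptions):
```python
import json


def write_weight_index(shards, index_path):
    # shards: {shard_filename: {param_name: mx.array}}
    weight_map = {}
    total_size = 0
    for filename, params in shards.items():
        for name, arr in params.items():
            weight_map[name] = filename
            total_size += arr.nbytes  # parameter size, not file size on disk
    index = {"metadata": {"total_size": total_size}, "weight_map": weight_map}
    with open(index_path, "w") as f:
        json.dump(index, f, indent=4)
```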
A new argument, "--max_seq_length", has been added to the command-line parser and is passed through to the main function of the lora.py script. This lets users control the maximum sequence length used during training.
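A minimal sketch of the new flag (the default shown is illustrative):
```python
import argparse

parser = argparse.ArgumentParser(description="LoRA fine-tuning")
parser.add_argument(
    "--max_seq_length",
    type=int,
    default=2048,  # illustrative default
    help="Maximum sequence length to use during training.",
)
args = parser.parse_args()
print(args.max_seq_length)
```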
* LoRA: Remove unnecessary model type judgments
1. Supported models are already checked in the load_model function in utils, no need to repeat the check in lora
2. The checks in lora are not synchronized with those in utils
* LoRA: add LoRA supported models in mlx_lm utils
* feat(mlx-lm): add de-quant for fuse
* chore: disable quant in to-linear conversion when de-quant is enabled
* chore: add better error handling for adapter file not found
* chore(mlx-lm): pass max seq len to evaluate in training loop
* chore: make sure batch sequences do not exceed the max length
* chore: update comment
* chore: add a warning before truncating input
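A sketch of the batch-length guard these commits describe (function and variable names are assumptions):
```python
def truncate_batch(tokens, max_seq_length):
    # tokens: list of token-id lists for one batch
    longest = max(len(t) for t in tokens)
    if longest > max_seq_length:
        print(
            f"[WARNING] Some sequences are longer than {max_seq_length} "
            "tokens and will be truncated."
        )
        tokens = [t[:max_seq_length] for t in tokens]
    return tokens
```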
* Draft of tiny llama from gguf
* Transpose all
* No transposition with new layout
* Read config from gguf
* Create tokenizer from gguf
* move gguf and update to be similar to hf_llm
* change model to HF style + updates to README
* nits in README
* nit readme
* only use mlx for metadata
* fix eos/bos tokenizer
* fix tokenization
* quantization runs
* 8-bit works
* tokenizer fix
* bump mlx version
---------
Co-authored-by: Juarez Bochi <juarez.bochi@grammarly.com>
Co-authored-by: Awni Hannun <awni@apple.com>
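For the "only use mlx for metadata" step above, mlx can read a GGUF file's tensors and key/value metadata directly; a minimal sketch (the filename is a placeholder, and the key shown follows the GGUF naming convention):
```python
import mlx.core as mx

# Load tensors plus the GGUF metadata (tokenizer fields,
# architecture hyperparameters, etc.) in one call.
weights, metadata = mx.load("tiny_llama.gguf", return_metadata=True)
print(metadata.get("general.architecture"))
```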
* fix Chinese character generation, same as in PR #321
* move the generate logic into utils.py for reuse
* format
* verbose default
* fix conflicts with colorize and character check
---------
Co-authored-by: Awni Hannun <awni@apple.com>
* Convert HF weights of PLaMo and load them into a plamo model in mlx
* Fix model inference part
* Add bos at the beginning of the prompt
* Fix convert.py to copy tokenizer.model into the converted dir
* Use the required instruction format in generate.py when the "--instruct" option is specified
* Change filenames and update existing scripts
* Add README
* Add requirements.txt
* Fix plamo.py to stop generation when EOS appears
* Add quantization to convert.py
* Use mlx>=0.0.9 for mx.core.outer() in PLaMo model
* Update acknowledgements.md
* Fix card text in upload_to_hub()
* Do not use the prompt template when --instruct is not specified
* Ask whether to trust_remote_code when loading the PLaMo tokenizer
* Check that the user trusts the remote code when converting
* Remove plamo directory
* Update README
* Add PLaMo model file
* Fix the handling of cache in PLaMo and update README
* Ask about trust_remote_code only when the model is PLaMo
* Remove resolve_trust_remote_code from convert.py and use the latest transformers
* Remove code not to add EOS
* Update the README example so it does not use the noncommercial version of the model
* Remove unused imports
* Remove unnecessary description about the instruct model of PLaMo from README
* format, nits in README
* typo
---------
Co-authored-by: Shunta Saito <shunta@mitmul-mbp.local>
Co-authored-by: Awni Hannun <awni@apple.com>
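A minimal sketch of the trust_remote_code confirmation described in the bullets above (the helper and its wiring are assumptions; `AutoTokenizer.from_pretrained` is the standard Hugging Face API):
```python
from transformers import AutoTokenizer


def load_tokenizer(model_path: str, model_type: str):
    trust_remote_code = False
    if model_type == "plamo":
        # Only PLaMo needs custom tokenizer code, so only ask then.
        answer = input(f"{model_path} requires remote code. Trust it? (y/N) ")
        trust_remote_code = answer.strip().lower() in ("y", "yes")
    return AutoTokenizer.from_pretrained(
        model_path, trust_remote_code=trust_remote_code
    )
```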
* Add colorized output option to generate script
Two new functions were added to the script to colorize output based on the T[0] probability. The `generate_step` function in utils.py was updated to support colorization, and a colorization flag was added to the command-line parser.
* Rename the 'colorize' parameter to 'return_probability' in generate_step
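A rough sketch of probability-based coloring (the thresholds and helper name are illustrative, not the exact implementation):
```python
def colorprint_by_t0(t0: float, s: str) -> None:
    """Print `s` colored by the probability t0 of the sampled token."""
    if t0 > 0.95:
        color = "\033[37m"  # white: high confidence
    elif t0 > 0.70:
        color = "\033[32m"  # green
    elif t0 > 0.30:
        color = "\033[33m"  # yellow
    else:
        color = "\033[31m"  # red: low confidence
    print(f"{color}{s}\033[0m", end="", flush=True)
```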
* add an option to apply the tokenizer chat template
* fix the option to apply the tokenizer chat template
* better error messages for chat template issues
* apply the chat template by default when possible
* nit in comment
* rebase
---------
Co-authored-by: Awni Hannun <awni@apple.com>
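A minimal sketch of the chat-template logic described in the bullets above (the `apply_chat_template` call is the standard Hugging Face tokenizers API; the fallback behavior is an assumption):
```python
def build_prompt(tokenizer, prompt: str) -> str:
    # Apply the tokenizer's chat template when one is defined.
    if getattr(tokenizer, "chat_template", None):
        messages = [{"role": "user", "content": prompt}]
        return tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
    # Otherwise fall back to the raw prompt string.
    return prompt
```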