Commit Graph

188 Commits

Author SHA1 Message Date
Awni Hannun
d4666615bb
Lazy import + refactor Lora layer addition (#426)
* lazy model import in mlx_lm

* change lora loading

* fix olmo lora

* remove a bunch of unused stuff from plamo

* move phixtral to mlx-lm and out of llms/
2024-02-12 10:51:02 -08:00
Ivan Fioravanti
4576946151
Add checkpoints directory for adapter weights (#431)
* Add checkpoints directory for adapter weights

The code was modified to create a checkpoints directory if it doesn't exist yet. Adapter weights are now saved to this checkpoints directory during the training iterations.
Corrected indentation of Save adapter weights code because it was part of "if eval"

* Fixing a blank added by mistake
2024-02-12 10:50:05 -08:00
Nripesh Niketan
f1ef378a58
Feat: update pre-commit rev (#432) 2024-02-11 07:23:27 -08:00
Awni Hannun
f45a1ab83c
Update a few examples to use compile (#420)
* update a few examples to use compile

* update mnist

* add compile to vae and rename some stuff for simplicity

* update reqs

* use state in eval

* GCN example with RNG + dropout

* add a bit of prefetching
2024-02-08 13:00:41 -08:00
Anchen
da7adae5ec
fix(mlx-m): lazy load hf_olmo (#424) 2024-02-08 09:02:43 -08:00
Markus Enzweiler
9b387007ab
Example of a Convolutional Variational Autoencoder (CVAE) on MNIST (#264)
* initial commit

* style fixes

* update of ACKNOWLEDGMENTS

* fixed comment

* minor refactoring; removed unused imports

* added cifar and cvae to top-level README.md

* removed mention of cuda/mps in argparse

* fixed training status output

* load_weights() with strict=True

* pretrained model update

* fixed imports and style

* requires mlx>=0.0.9

* updated with results using mlx 0.0.9

* removed mention of private repo

* simplify and combine to one file, more consistency with other exmaples

* few more nits

* nits

* spell

* format

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-02-06 20:02:27 -08:00
Long Sha
8071aacd98
fix-mistral-download-link (#418) 2024-02-06 19:56:56 -08:00
Chris McMaster
2303238e44
Update olmo.py (#419)
exit should be imported outside of interactive mode
2024-02-06 16:16:46 -08:00
Anchen
8b77677c05
chore(mlx-lm): add model weight index in save_weights (#413)
* chore(mlx-lm): add model weight index in save_weights

* Update llms/mlx_lm/utils.py

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>

* Update llms/mlx_lm/utils.py

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>

* chore: save total siZe as param size isntead of file size

* chore: clean up format

---------

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2024-02-06 05:32:15 -08:00
Anchen
a7d139f484
fix(mlx-lm): olmo 1b model (#417) 2024-02-06 05:27:05 -08:00
Awni Hannun
aa7447efa2
Olmo in MLX LM (#415)
* run olmo

* format
2024-02-05 21:13:49 -08:00
Ivan Fioravanti
7fbca214b1
Add max sequence length argument in lora.py (#408)
A new argument "--max_seq_length" has been added to the command-line parser and passed as a parameter to the main function of the lora.py script. This allows users to specify and control the maximum sequence length during training.
2024-02-04 12:28:21 -08:00
Junyang Lin
9d0dd34403
add qwen2 (#411) 2024-02-04 08:31:38 -08:00
Madroid Ma
ba3a9355d1
LoRA: Remove unnecessary model type judgments (#388)
* LoRA: Remove unnecessary model type judgments

1. Supported models are already checked in the load_model function in utils, no need to repeat the check in lora
2. The checks in lora are not synchronized with those in utils

* LoRA: add LoRA supported models in mlx_lm utils
2024-01-31 11:55:27 -08:00
Anchen
0a49ba0697
fix(mlx-lm): apply lora layer doesn't update the lora weights (#396) 2024-01-31 11:51:26 -08:00
Sugato Ray
ab8bde1590
Add py.typed to support PEP-561 (type-hinting) (#389)
This adds support for type-hinting information as laid in [PEP-561](https://peps.python.org/pep-0561/).
2024-01-30 21:17:38 -08:00
David Koski
f8fadf7a17
Fix token count computation to fix tps measurements (#392) 2024-01-30 11:24:16 -08:00
Anchen
614de6652f
chore(mlx-lm): add reset lora layers helper (#377)
* chore(mlx-lm): add reset lora layers helper

* chore: rename the func

* chore: update docstring

* Update llms/mlx_lm/tuner/utils.py

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>

---------

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2024-01-29 20:54:49 -08:00
Ashish
20b969b412
Replace time.time() with time.perf_counter() as it is more suited for benchmarking (#380) 2024-01-26 14:11:38 -08:00
Awni Hannun
5aa652d3c2
remove simplify (#379) 2024-01-26 13:54:49 -08:00
Ashish
0b57f0eae6
Add StableLM-2 1.6B (#378)
* init

* stablelm

* add to readme

* bump version

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-01-26 10:28:00 -08:00
Anchen
854ad8747a
feat(mlx-lm): add de-quant for fuse.py (#365)
* feat(mlx-lm): add de-quant for fuse

* chore: disable quant in to linear when de-quant enabled

* chore: add better error handling for adapter file not found
2024-01-25 18:59:32 -08:00
Anchen
f51e98fcf1
chore(mlx-lm): truncate the input sentence to max seq len in lora iterate_batches (#373)
* chore(mlx-lm): pass max seq len to evaluate in training loop

* chore: make sure the batch seq not exceed max len

* chore: update comment

* chore: add warning before truncate input
2024-01-25 12:38:04 -08:00
Anchen
b1dec281b3
feat(mlx-lm): add lora hypeparameters in lora layer (#366)
* feat(mlx-lm): add lora hypeparameters in lora layer

* chore: address comments
2024-01-24 08:11:25 -08:00
Anchen
5fc8668a53
fix(mlx-lm): handle legacy quant models (#369) 2024-01-24 07:44:05 -08:00
Anchen
ab91ac1075
chore(mlx-lm): add load model with adapter and fix bug in sample (#360)
* chore: add load model with adapter support and fix bug in sample

* chore: ignore temp during calculating prob in sample
2024-01-23 19:47:39 -08:00
Juarez Bochi
f5b80c95fb
Example reading directly from gguf file (#222)
* Draft of tiny llama from gguf

* Transpose all

* No transposition with new layout

* Read config from gguf

* Create tokenizer from gguf

* move gguf and update to be similar to hf_llm

* change model to HF style + updates to REAMDE

* nits in REAMDE

* nit readme

* only use mlx for metadata

* fix eos/bos tokenizer

* fix tokenization

* quantization runs

* 8-bit works

* tokenizer fix

* bump mlx version

---------

Co-authored-by: Juarez Bochi <juarez.bochi@grammarly.com>
Co-authored-by: Awni Hannun <awni@apple.com>
2024-01-23 15:41:54 -08:00
iLoveBug
40b61c1719
fix the chinese character generation as same as PR #321 (#342)
* fix the chinese character generation as same as PR #321

* reuse the generate logic to utils.py

* format

* verbose defualt

* fix conflicst with colorize and character check

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-01-23 12:44:23 -08:00
Awni Hannun
21aa8038fb
MLX LM version bump (#358)
* version bump

* include new package
2024-01-23 09:05:57 -08:00
Anchen
362e88a744
feat: move lora into mlx-lm (#337)
* feat: Add lora and qlora training to mlx-lm


---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-01-23 08:44:37 -08:00
Shunta Saito
85c1ff8fd6
Add PLaMo-13B model as an LLM example (#303)
* Convert HF weights of PLaMo and load it to a plamo model in mlx

* Fix model inference part

* Add bos at the beginning of the prompt

* Fix convert.py to copy tokenizer.model into the converted dir

* Use the required insturction format in generate.py when "--instruct" option is specified

* Change filenames and update existing scripts

* Add README

* Add requirements.txt

* Fix plamo.py to stop generation when EOS appears

* Add quantization to convert.py

* Use mlx>=0.0.9 for mx.core.outer() in PLaMo model

* Update acknowledgements.md

* Fix card text in upload_to_hub()

* Not use prompt template when --instruct is not specified

* Ask if you trust_remote_code for loading tokenizer of PLaMo

* Check the user trusts the remote code when converting

* Remove plamo directory

* Update README

* Add PLaMo model file

* Fix the handling of cache in PLaMo and update README

* Ask if trust_remote_code only when the model is PLaMo

* Remove resolve_trust_remote_code from convert.py and use the latest transformers

* Remove code not to add EOS

* Update README to fix an example not to use noncommercial version of the model

* Remove unused imports

* Remove unnecessary description about the instruct model of PLaMo from README

* format, nits in README

* typo

---------

Co-authored-by: Shunta Saito <shunta@mitmul-mbp.local>
Co-authored-by: Awni Hannun <awni@apple.com>
2024-01-23 07:17:24 -08:00
Ivan Fioravanti
c45c2311bd
Add colorized output option to generate script (#347)
* Add colorized output option to generate script

Two new functions were added to the script that allow output to be colorized based on the T[0] probability. Changes were made to the `generate_step` function in utils.py to permit colorization. Additionally, an argument for colorization was introduced to the command-line parser.

* Rename 'colorize' parameter with 'return_probability' in generate_step
2024-01-23 05:25:44 -08:00
Sugato Ray
a445ac2895
Update docs with conda install option (#354) 2024-01-22 21:14:48 -08:00
Baptiste Canton
42672f5446
add an option to apply the tokenizer chat template (#338)
* add an option to apply the tokenizer chat template

* fix the option to apply the tokenizer chat template

* better error messages for chat template issues

* apply the chat template by default when possible

* nit in comment'

* rebase

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-01-22 19:52:42 -08:00
Anchen
30be4c4734
refactor(qwen): moving qwen into mlx-lm (#312)
* refactor(qwen): moving qwen into mlx-lm

* chore: update doc

* chore: fix type hint

* add qwen model support in convert

* chore: fix doc

* chore: only load model in quantize_model

* chore: make the convert script only copy tokenizer files instead of load it and save

* chore: update docstring

* chore: remove unnecessary try catch

* chore: clean up for tokenizer and update  transformers 4.37

* nits in README

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-01-22 15:00:07 -08:00
Anchen
527cea4027
chore: fix the convert.py script for weights are not sanitized and support quant for non-32 dimensions (#340)
* chore: fix convert script for weights not sanitized and suport quant for non 32 dim

* Update llms/mlx_lm/utils.py

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>

* chore: fix typo

---------

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2024-01-19 21:07:21 -08:00
bojanbabic
61297f547b
Missing requirements needed for convert script (#320)
* fix requirements and add eos parameter

* fix black

* address comment

* address comments - remove new arg
2024-01-18 19:04:24 -08:00
Awni Hannun
bcc9fc3581
two minor fixes (#335) 2024-01-18 14:18:13 -08:00
someone
2287294723
fix mlx_lm generator for chinese (#321)
* fix generator for chinese

* add REPLACEMENT_CHAR

---------

Co-authored-by: cg <cg@qq.com>
2024-01-16 07:13:33 -08:00
Awni Hannun
b0870ed679
fix response + bump version (#319) 2024-01-15 11:51:21 -08:00
Anchen
195bec2fa3
feat(mlx_lm): add mixtral support in mlx_lm (#318)
* feat: add mixtral support in mlx_lm

* chore: update doc
2024-01-15 07:18:14 -08:00
Marcel Bischoff
cd3cff0858
Phixtral (#290)
* initial

* file

* remove debug

* Adding README

* typo

* simplify readme

* nits in readmes

---------

Co-authored-by: Marcel Bischoff <marcel.bischoff@awarehq.com>
Co-authored-by: Awni Hannun <awni@apple.com>
2024-01-13 08:35:03 -08:00
Anchen
a39b735c3b
chore(mlx-lm): update phi2 model args to sync with hf config format. (#311)
* chore(mlx-lm): update phi2 model args to sync with hf config format

* chore: fix type hint
2024-01-13 07:51:45 -08:00
Pedro Cuenca
ef93979973
Update model card uploaded with converted models (#309) 2024-01-12 13:03:52 -08:00
Angelos Katharopoulos
1fa40067fe
Change tuple type definitions to use Tuple (#308) 2024-01-12 11:15:09 -08:00
Awni Hannun
c6440416a2
Mlx llm package (#301)
* fix converter

* add recursive files

* remove gitignore

* remove gitignore

* add packages properly

* read me update

* remove dup readme

* relative

* fix convert

* fix community name

* fix url

* version
2024-01-12 10:25:56 -08:00
Anchen
6217d7acd0
Delete llms/hf_llm/models/.gitignore (#300) 2024-01-11 16:56:50 -08:00
Anchen
a2402116ae
refactor(hf_llm): moving phi2 example into hf_llm (#293)
* refactor: moving phi2 example into hf_llm

* chore: clean up

* chore: update phi2 model args so it can load args from config

* fix phi2 + nits + readme

* allow any HF repo, update README

* fix bug in llama

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-01-11 12:29:12 -08:00
Anchen
7380ebfb0d
fix: undefined hf_path (#292) 2024-01-11 05:53:52 -08:00
Konstantin Kerekovski
047d4650c4
Add -local flag to llms/hf_llm/convert.py for reading source HF models from filesystem. (#260)
* * Add --local flag for reading models from filesystem and related code for doing so
* Disable uploading to huggingface if --local flag is set

* Remove code related to .bin files and merge fetch_from_local and fetch_from_hub into one function.

* Update llms/hf_llm/convert.py

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>

* format / nits

---------

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
Co-authored-by: Awni Hannun <awni@apple.com>
2024-01-10 19:53:01 -08:00
Alwin Arrasyid
2bbe9d3bd8
fix use of args in generate function (#284) 2024-01-10 08:09:21 -08:00
Awni Hannun
7b258f33ac
Move lora example to use the same model format / conversion as hf_llm (#252)
* huffing face the lora example to allow more models

* fixes

* comments

* more readme nits

* fusion + works better for qlora

* nits'

* comments
2024-01-09 11:14:52 -08:00
Alwin Arrasyid
6e6eff326e
fix: use of undefined args in generate function in phi-2 example (#265) 2024-01-09 06:43:59 -08:00
Anchen
6e5b0de4d3
refactor: make the phi2 example can be directly load the model from hf without convert needed (#253)
* refactor: make the phi2 example can be directly load the model from hf without convert needed

* chore: add super().__init__() for all module, otherwise will cause error in lora
2024-01-08 06:01:23 -08:00
Nino Risteski
9742ad0f51
Update README.md (#248)
fixed a few typos
2024-01-07 20:13:58 -08:00
Nino Risteski
b152d12d7b
Update README.md (#243)
a few typos
2024-01-06 11:44:49 -08:00
Anchen
758f05c09a
refactor: merge deepseek coder example into hf_llm example (#234)
* refactor: merge deepseek coder example into hf_llm example

* remove deepseek example

* chore: fix format in readme

* chore: remove default rope_scaling dict and use get to access type and factor to avoid key error

* Update llms/hf_llm/models.py

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>

* chore: fix lint

---------

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2024-01-06 07:53:46 -08:00
Awni Hannun
cf0ad26a89
force fp16 for quantized models (#240) 2024-01-05 21:29:15 -08:00
Awni Hannun
37b41cec60
Qlora (#219)
qlora
2024-01-04 21:05:59 -08:00
Christian Bieniak
4fa659acbd
Handle receiving 0 tokens gracefully (#231)
* handle 0 tokens gracefully

* Formatting

* Move no token check to statistics section
2024-01-04 19:14:13 -08:00
Andy Peatling
12c9bafbf5
Update README.md to fix --hf-model param call. (#229)
Update `--hf-model` to `--hf-path` since the `--hf-model` param does not exist in convert.py.
2024-01-04 11:53:51 -08:00
Awni Hannun
e14afb3e77
fix to use actual prompt (#227) 2024-01-04 11:12:05 -08:00
Vaibhav Srivastav
f95cf30a31
Fix upload to hub for HF LLMs conversion script. (#221)
* Fix upload to hub snippet.

* Weights -> model.

* reverting last commit.
2024-01-04 06:06:05 -08:00
Awni Hannun
a5d6d0436c
Support Hugging Face models (#215)
* support hf direct models
2024-01-03 15:13:26 -08:00
Daniel Strobusch
1d09c4fecd
keep dtype on model conversion (#186) 2024-01-02 11:20:29 -08:00
Daniel Strobusch
85258b2be7
make parameter naming consistent with other examples. (#214) 2024-01-02 08:18:12 -08:00
Anchen
e632d7aaaa
fix: deepseek coder tokenizer error (#211) 2024-01-01 06:10:37 -08:00
Anchen
ee3c44d231
chore: make the Deepseek example compatible with Yi models. (#205)
* Update convert.py

* Update convert.py

* Update deepseek_coder.py
2023-12-30 06:11:33 -08:00
Anchen
1cdbf9e886
chore: fix the load quantization model for deepseek coder (#203)
* chore: fix the load quantization model

* change to explicitly check for quantization config
2023-12-29 05:25:38 -08:00
Anchen
31ddbd7806
add deepseek coder example (#172)
* feat: add example for deepseek coder

* chore: remove hardcoded rope_scaling_factor

* feat: add quantization support

* chore: update readme

* chore: clean up the rope scalling factor param in create cos sin theta

* feat: add repetition_penalty

* style /consistency changes to ease future integration

* nits in README

* one more typo

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2023-12-28 21:42:22 -08:00
Benjamin Anderson
09566c7257
add speculative decoding example for llama (#149)
* speculative decoding

* add sample 0

* spec decode gives same results as regular decode

* rebase

* use accept reject criteria

* switch to t5

* update readme

* readme nit

* nits

* nits

* nits

---------

Co-authored-by: Benjamin Anderson <benjamin@Benjamins-MBP.lan>
Co-authored-by: Awni Hannun <awni@apple.com>
2023-12-28 15:20:43 -08:00
Sunbir Gill
78d207fe27
Fix generate example in README (#197) 2023-12-27 13:11:10 -08:00
Sushant
a516f4635d
Fixed the return type for the __call__ method in Attention (#190) 2023-12-26 09:32:43 -08:00
Daniel Strobusch
2bd20ef0e0
shard llama model after conversion and unshard on loading (#174) 2023-12-25 11:19:43 -08:00
Yifan
738448c2d4
QWEN: Fix unsupported ScalarType BFloat16 (#187)
Fix unsupported ScalarType BFloat16.
2023-12-25 06:10:01 -08:00
devonthomas35
939086e6a3
Mixtral: Stop at EOS token (#183)
* Stop at EOS token

* Precommit format files

* Fix precommit hooks

* Fix precommit hooks
2023-12-23 21:25:42 -08:00
Daniel Strobusch
848f118ac5
use non-zero exit code on error (#177) 2023-12-23 07:10:13 -08:00
Daniel Strobusch
092e87211e
fix bad convert parameter (#178) 2023-12-23 07:09:49 -08:00
Alvaro Bartolome
f4709cb807
Align CLI args and some smaller fixes (#167)
* Add `.DS_Store` files to `.gitignore`

* Fix variable naming of `config` in `mixtral/convert.py`

* Align CLI args and minor fixes

* standardize

* one more

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2023-12-22 14:34:32 -08:00
Vaibhav Srivastav
0eaa323c10
Fix conversion + inference errors. - Mistral (#176)
* Fix conversion + inference errors.

* wire rope_theta throuugh to nn.RoPE

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2023-12-22 14:10:25 -08:00
Todsaporn Banjerdkit
7ae445f6c7
feat: add mistral tps (#173)
* feat: add mistral tps

* eval params before timing + format

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2023-12-22 07:55:57 -08:00
Awni Hannun
3cf436b529
Quantize example (#162)
* testing quantization

* conversion + quantization working

* one config processor

* quantization in mistral / nits in llama

* args for quantization

* llama / mistral conversion in good shape

* phi2 quantized

* mixtral

* qwen conversion
2023-12-21 12:59:37 -08:00
Deven Mistry
6c574dbecf
update path to load weights (#164) 2023-12-21 06:31:17 -08:00
Daniel Strobusch
43b6522af2
rename --model_path to --model-path (#151)
use same argument convention for mistral/mixtral as for llama convert.
2023-12-21 06:28:57 -08:00
Deven Mistry
3efb1cc2cc
fix typo in readme (#163) 2023-12-20 19:47:41 -08:00
Pedro Cuenca
ce30cc3d8f
Use config.json in llama (#159)
* Use config.json in llama

* Fix pop

* Fix convert

* Typo
2023-12-20 10:34:44 -08:00
Awni Hannun
27c0a8c002
Add llms subdir + update README (#145)
* add llms subdir + update README

* nits

* use same pre-commit as mlx

* update readmes a bit

* format
2023-12-20 10:22:25 -08:00
Junyi Mei
62b455f801
Add Qwen example (#134)
* Add qwen model draft

* Add readme and requirements for qwen example

* Add model and tokenizer options

* Fix convert and tokenizer

* some updates / style consistency

* move to llm subdir

* readme nit

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2023-12-19 13:06:19 -08:00