Commit Graph

668 Commits

Author SHA1 Message Date
Awni Hannun
21d0ab6e8a
fix deepseek sharding (#1242) 2025-02-03 16:59:50 -08:00
Gökdeniz Gülmez
0989c073b0
Optimizations for mamba1 (#1213)
* added mx.einsum() operations: before: 41.293 tokens-per-sec, after: 57.822 tokens-per-sec

* Fused Operations in delta, B, C = ... :. Before: 57.822 tokens-per-sec, after: 83.890 tokens-per-sec

* Pre-computing A_log. After: 83.890 tokens-per-sec, before: 85.848 tokens-per-sec

* Update MambaBlock, Batched Input Processing, Improved Cache Handling, Pre-computed Constants, Cleaner State Management, Explicit Return Values:. Before: 82.442 tokens-per-sec, after: 129.130 tokens-per-sec.

* cleaning up and adding apple copyright to helium modelfile

* update Copyright to this year

* nits + even faster

---------

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2025-02-03 13:36:08 -08:00
Awni Hannun
d9924d08d1
Fix no validation in lora (#1241) 2025-02-03 09:55:24 -08:00
Awni Hannun
9c2ef38d4d
only download local shard (#1240) 2025-02-02 13:58:44 -08:00
Awni Hannun
e8afb59de4
better overflow correction (#1229) 2025-01-28 14:37:30 -08:00
Anchen
7a83077cd7
chore(mlx-lm): support text type content in messages (#1225)
* chore(mlx-lm): support text type content

* chore: optimize the messagef content processing

* nits + format

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2025-01-27 17:13:50 -08:00
Awni Hannun
f44a52e2dc
batched min p and fix spec gen sampling (#1222) 2025-01-27 15:40:31 -08:00
Gökdeniz Gülmez
77faa14ba4
adding support for kyutai's helium (#1208)
* initial commit

* adding helium into training

* Update ACKNOWLEDGMENTS.md

* nits

* nits

* fixes / nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2025-01-26 07:19:07 -08:00
Awni Hannun
9a3ddc3e65
some fixes for pipeline parallel deep seek r1 (#1216) 2025-01-21 19:40:29 -08:00
Victor Nogueira
df1406735b
Fix dataset variable name, in datasets.py (#1212) 2025-01-21 14:12:43 -08:00
Jarrett
07f88f8057
fix(lora): add back store_true default args (#1205) 2025-01-16 11:15:42 -08:00
Awni Hannun
50f0a7f6d9
add internlm3 (#1206) 2025-01-15 14:55:41 -08:00
Ivan Fioravanti
6ae6c72c2e
reduction moved to CPU in case of distributed training (#1200) 2025-01-14 17:20:42 -08:00
Awni Hannun
c117af83b8
fix gpt bigcode (#1204) 2025-01-13 10:22:32 -08:00
Chime Ogbuji
0228c46434
Custom local dataset features (#1085)
* Generalize prompt_feature and completion_feature for use in local datasets to facilitate compatibility with many other training dataset formats.

* Persist configured prompt/completion key

* rebase + nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2025-01-13 10:01:18 -08:00
Prince Canuma
bf2da36fc6
Fix Cohere2: mask shape error (long context) (#1202)
* fix mask shape error (long context)

* Update llms/mlx_lm/models/cohere2.py

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>

* revert layer_idx

* black formatting

* Update cohere2.py

* format

---------

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
Co-authored-by: Awni Hannun <awni@apple.com>
2025-01-12 12:58:08 -08:00
Xingjun.Wang
514502da22
Support snapshot_download for ModelScope (#1194)
* add MLX_USE_MODELSCOPE env

* update

* update snapshot_download

* update

* remove modelscope dependency and add import check

* update

* nits

* fix

---------

Co-authored-by: wangxingjun778 <jason@U-C7X6TX5G-2239.local>
Co-authored-by: Awni Hannun <awni@apple.com>
2025-01-10 15:29:34 -08:00
Awni Hannun
93c5cfd781
Add a speculative decoding generator (#1155)
* add a speculative decoding generator

* fix

* fixes

* optional kwarg pop
2025-01-10 15:27:08 -08:00
Awni Hannun
5cae0a60e6
deepseek v3 model with pipeline parallelism (#1191)
* deepseekv3

* use upload_large_file instead of deprecated multi comit

* add pipeline generation and example

* comment

* get fp16 working

* use mlx==0.22
2025-01-09 15:55:53 -08:00
Jarrett
40b88eff48
fix(lora): config yaml & arg default merge bug (#1196) 2025-01-09 11:33:54 -08:00
Pedro Cuenca
b8f0cacfa8
Use upload_large_folder (#1193) 2025-01-07 09:18:31 -08:00
Awni Hannun
9183fe8b6d
fix (#1192) 2025-01-06 10:12:07 -08:00
Chime Ogbuji
f2619f507c
Add support for fewshot and apply chat template lm_eval functionality (#1180)
* Add support for multiturn fewshot examples and chat templates

Added two new arguments to the evaluation script: `--fewshot-as-multiturn` and `--apply-chat-template` which correspond to lm_eval options of similar names and are very often used to ensure apples-to-apples comparisons of lm_evaluation results

* Add HF overrides for methods needed by added options

* don't add duplicate bos

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2025-01-06 07:58:43 -08:00
Angelos Katharopoulos
25ec2d8c44
Change the eos-token argument for mlx_lm.generate (#1176) 2025-01-05 22:26:05 -08:00
Awni Hannun
c4833a2f55
fix encoding with special tokens + chat template (#1189) 2025-01-03 10:50:59 -08:00
Ivan Fioravanti
3a58c36109
Improvements to mlx_lm.manage (#1178)
* improvements to manage. Default value is N and size added to deletion confirmation.

* Fixing case for no case

* nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2025-01-01 07:25:57 -08:00
Alex Barron
d4ef909d4a
Length masking for batch inputs (#1173)
* length masking

* add mask to mlx_lm model interface

* remove lengths

* fix test:

* comment + fix
2024-12-18 19:43:52 -08:00
Awni Hannun
db109184b7
Fix no template prompt + top_k sampling (#1166)
* fix no template prompt

* add top_k sampling

* fix chinese
2024-12-18 18:46:50 -08:00
Billel Mokeddem
845efddc8c
Fix decoding manually added tokens (#1164)
* Fix decoding manually added tokens

* fix + test

* nit

* nit

* no lag bpe

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-12-17 09:54:29 -08:00
Prince Canuma
dfa4dd6c93
Add support for cohere2 (#1157)
* add support for cohere2

* revert to act_fn to silu

* fix tests and sliding window attention

* add tests

* add to tuner

* fix sliding window

* add coauthor :)

Co-authored-by: n8programs <43304488+N8python@users.noreply.github.com>

* Add rotating kvcache to save space

* some nits

* style

* nits

---------

Co-authored-by: n8programs <43304488+N8python@users.noreply.github.com>
Co-authored-by: N8 <n8@n8programs.com>
Co-authored-by: Awni Hannun <awni@apple.com>
2024-12-16 08:01:03 -08:00
Ikko Eltociear Ashimine
fc0674d2d8
chore: update evaluate.py (#1159)
occurence -> occurrence
2024-12-15 06:06:29 -08:00
Awni Hannun
9f2ea5892e
Bpe stream without space (#1154)
* bpe streaming detokenization without space

* version bump
2024-12-12 13:13:50 -08:00
Awni Hannun
2ba0e36683
[mlx-lm] Use top p in server (#1144)
* use top p in server

* couple other fixes
2024-12-12 11:12:21 -08:00
Angelos Katharopoulos
19abf3dcaa
Replace unicode errors instead of raising exception (#1146) 2024-12-12 11:10:41 -08:00
madroid
06af3c9b0e
Add finish_reason in GenerationResponse (#1153) 2024-12-12 10:37:40 -08:00
Awni Hannun
77b42b7c8b
fix llava (#1149) 2024-12-12 10:37:26 -08:00
Alex Barron
135c5818c1
Fix max_tokens (#1148) 2024-12-10 11:26:04 -08:00
madroid
12083c4b7e
Support for multiple EOS tokens (#1141)
* Support for multiple EOS tokens

* Change _eos_token_ids type from list to set

* Remove model_config & add eos_token_id

* nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-12-09 08:53:58 -08:00
n8programs
5687d5b99b
Adds EXAONE architecture. (#1145)
* Adds EXAONE architecture.

* nits + format

* format

* clean up and fix rope

* clean up and fix rope

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-12-09 07:58:25 -08:00
hehua2008
893b3f085e
Change Flux default max_shift to 1.15 to match the official one (#1137) 2024-12-08 23:29:48 -08:00
Peter Sibley
ed91bbc4dc
Fix final message at end of flux training (#1143) 2024-12-08 23:01:53 -08:00
hehua2008
1fd6aae871
Fix flux training with batch size (#1135)
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2024-12-08 22:09:04 -08:00
Alex Barron
2211b27388
Mixed Quantizations (#1132)
* saving/loading mixed quantizations

* comment

* add bits per weight

* more concise bpw

* count bias too
2024-12-08 14:21:50 -08:00
Alex Barron
cd8cf28c39
mlx_lm.evaluate (#1140)
* Add evaluation script

* only write top level results

* add lm eval version

* typo

* create output dir

* relative import

* comment

---------

Co-authored-by: David Grangier <dgrangier@users.noreply.github.com>
2024-12-08 12:20:10 -08:00
vb
1727959a27
Add mentions of MLX-my-repo. (#1129)
* Add mentions of MLX-my-repo.

* simplify

* move

* move

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-12-03 19:21:39 -08:00
Awni Hannun
1963df8565
Allow prompt callback to generate_step (#1133)
* allow prompt callback and use in cache_prompt

* nit

* comments

* bump version
2024-12-03 16:17:14 -08:00
sakares saengkaew
0ca162cfb2
Fix data_iter in prepare_dataset from speechcommands example (#1113) 2024-12-02 23:56:07 -08:00
Angelos Katharopoulos
eb9277f574
Allow loading from diffusers ckpt (#1117) 2024-12-02 13:15:50 -08:00
hehua2008
2a9294a5f0
Fix bug in FluxSampler.timesteps method (#1131) 2024-12-02 13:15:19 -08:00
Awni Hannun
8801beb66f
Add olmo2 (#1128)
* add olmo2

* add olmo2
2024-12-02 11:42:58 -08:00