Shashank
4b2a0df237
adding wwdc25 samples ( #1370 )
2025-06-10 10:23:25 -07:00
Denrei Keith
977cd30242
Update lora README.md ( #1365 )
...
point to the correct repository
https://github.com/ml-explore/mlx-lm
2025-05-01 06:00:14 -07:00
Param Thakkar
4c9f9f9be7
Made llama and mistral files mypy compatible ( #1359 )
...
* Made mypy compatible
* reformatted
* Added more fixes
* Added fixes to speculative-decoding
* Fixes
* fix circle
* revert some stuff
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2025-04-23 14:23:46 -07:00
Angelos Katharopoulos
c52cc748f8
Distributed FLUX ( #1325 )
2025-03-24 22:16:48 -07:00
Awni Hannun
c243370044
remove mlx lm ( #1353 )
2025-03-18 18:47:55 -07:00
Tingzhen
7ca05d2e51
LoRa/README.md should be --hf-path instead of --hf-repo ( #1350 )
...
Co-authored-by: du tingzhen <dutingzhen@macbookpro.myfiosgateway.com>
2025-03-16 20:02:52 -07:00
Awni Hannun
d9e1d9c0ef
mlx-lm move notice ( #1346 )
...
* mlx-lm move notice
* remove mlx lm tests
2025-03-16 15:14:28 -07:00
Prince Canuma
2fce02acd8
Add support for Gemma3 ( #1336 )
...
* add support for gemma3
* fix model loading
* revert rmsnorm
* revert is sliding pattern
* revert
* add tests
* formatting
* Update llms/mlx_lm/models/gemma3_text.py
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* fix sliding window mask
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
Co-authored-by: Awni Hannun <awni@apple.com>
2025-03-13 08:14:25 -07:00
Mirko Nasato
3e5baf583b
Make sure to use UTF-8 when loading tokenizer.json ( #1340 )
2025-03-12 19:17:14 -07:00
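The UTF-8 fix above amounts to passing an explicit encoding when reading the file instead of relying on the platform's default locale encoding. A minimal sketch (the helper name and call site are hypothetical, not the actual patch):

```python
import json

def load_tokenizer_json(path):
    # Force UTF-8 rather than the platform default locale encoding,
    # which mangles tokenizer.json files containing non-ASCII tokens
    # on systems where the default is not UTF-8.
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```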
Neil Mehta
4c3df00162
make_sampler creates sampler chain with all sampling parameters ( #1330 )
...
* top_p refactor
* top_k and min_p refactor
* Create sampler chain
* Remove unnecessary mx.where
* Use mx.allclose
2025-03-11 13:37:35 -07:00
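The chained-sampler idea in #1330 can be sketched in plain Python: each enabled parameter contributes one filter, and the filters are composed into a single sampling callable. This is a hedged illustration only; the function and parameter names mirror the commit title but the logits representation and internals are assumptions, not the mlx_lm implementation.

```python
import math
import random

def top_k_filter(k):
    # Keep only the k highest logits; mask the rest to -inf.
    def apply(logits):
        if k <= 0 or k >= len(logits):
            return logits
        kth = sorted(logits, reverse=True)[k - 1]
        return [l if l >= kth else float("-inf") for l in logits]
    return apply

def make_sampler(temperature=1.0, top_k=0):
    # Compose the enabled filters into one sampling callable.
    filters = [top_k_filter(top_k)]
    def sample(logits):
        for f in filters:
            logits = f(logits)
        scaled = [l / max(temperature, 1e-6) for l in logits]
        m = max(scaled)
        probs = [math.exp(s - m) for s in scaled]  # exp(-inf) == 0.0
        total = sum(probs)
        return random.choices(
            range(len(logits)), weights=[p / total for p in probs], k=1
        )[0]
    return sample
```

With top_k=1 the chain is deterministic, since only the argmax keeps nonzero probability.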
Awni Hannun
d2e02b3aae
fix mixed quant option ( #1326 )
2025-03-07 08:35:48 -08:00
Awni Hannun
595f5da146
remove lm head if unused ( #1324 )
2025-03-06 15:35:47 -08:00
cavit99
877d2a345b
Change DEFAULT_SEED to None for stochastic generation by default ( #1323 )
...
* Change DEFAULT_SEED to None for stochastic generation by default
* Update llms/mlx_lm/chat.py
* Update llms/mlx_lm/generate.py
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2025-03-06 06:49:35 -08:00
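The seed change in #1323 follows the common pattern of treating None as "seed from OS entropy", so output varies run to run unless an explicit seed is passed. A generic standard-library sketch (the helper name is an assumption, not the mlx_lm code):

```python
import random

DEFAULT_SEED = None  # None -> fresh entropy each run, i.e. stochastic output

def build_rng(seed=DEFAULT_SEED):
    # random.Random(None) seeds from the OS, so runs differ by default;
    # passing an int restores reproducible generation.
    return random.Random(seed)
```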
Awni Hannun
32d10036de
fix flaky test ( #1322 )
2025-03-05 14:00:09 -08:00
Gökdeniz Gülmez
e150621095
Adding multiple optimizers to mlx lm ( #1315 )
...
* initial commit
* adding more customized YAML configuration
* update YAML example file
* Changed the switch to set opt_class
* removing muon
* using default arguments
* update
2025-03-05 13:54:54 -08:00
Gökdeniz Gülmez
56d2db23e1
adding OLMoE architecture ( #1321 )
...
* initial commit
* update ACKNOWLEDGMENTS.md
* adding olmoe to training
* clean up
* faster generation
* remove sanitize method
* more clean ups
* adding SwitchGLU
* clean up
* a little faster and adding norm_topk_prob
* formatted
2025-03-05 13:46:06 -08:00
Angelos Katharopoulos
e7267d30f8
Distributed support cifar ( #1301 )
2025-03-05 13:33:15 -08:00
Awni Hannun
f621218ff5
Tool use example ( #1316 )
...
* tool use example
* nits
2025-03-04 13:53:20 -08:00
Awni Hannun
65aa2ec849
use a bool mask for attention ( #1319 )
2025-03-04 12:47:32 -08:00
Pierre-Louis
1bc3476a46
chore(lora): Add real-time log buffering fix for nohup execution ( #1311 )
...
* chore(lora): Add real-time log buffering fix for nohup execution
Disable Python stdout buffering to ensure logs appear in nohup.out in real-time instead of only after script completion.
* chore(lora): remove python 3.7+ check
* chore(lora): running pre-commit hook
---------
Co-authored-by: Pierre-Louis Létoquart <randlgint@proton.me>
2025-03-03 06:12:33 -08:00
Shunta Saito
269faa5fa4
Fix plamo2 model to use rms_norm ( #1308 )
...
* Fix plamo2 model to use rms_norm and enable sliding window attention
* Fix missing variable
* Remove sliding window attention implementation, since it should be handled with RotatingKVCache
* Remove unused imports
2025-03-03 06:12:02 -08:00
Awni Hannun
845cd8c01e
support kimi + more options in chat mode ( #1312 )
2025-02-28 11:33:18 -08:00
Awni Hannun
b2108a0de6
Allow mask prompt in config ( #1314 )
2025-02-28 11:33:04 -08:00
madroid
eb73549631
Generate: Support Prefill Response ( #1299 )
...
* Generate: Support Prefill Prompt
python -m mlx_lm.generate \
--model mlx-community/DeepSeek-R1-Distill-Qwen-1.5B-4bit \
--prompt "hello" \
--prefill-prompt "<think>\n"
* Generate: rename prefill-prompt to prefill-response
* nits
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2025-02-27 07:44:00 -08:00
Awni Hannun
00a7379070
Fixes for phi4 mini ( #1305 )
2025-02-26 16:21:54 -08:00
Awni Hannun
0f240a4c7e
Use max tokens from options in mlx_lm evaluate ( #1302 )
2025-02-26 15:46:16 -08:00
Awni Hannun
56e60ad5a6
fix manage for new transformers ( #1304 )
2025-02-26 15:44:57 -08:00
Pedro Cuenca
b7f742ef56
Mixed quant recipes ( #1300 )
...
* Mixed 3/6 and 2/6 recipes based on Alex Barron's
* format / nits
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2025-02-26 11:32:36 -08:00
Shunta Saito
c37e26a1a3
Add plamo-2-1b model ( #1283 )
...
* Add pfnet/plamo-2-1b
* Fix cache.py to support non-top level layers
* Use mlx's BaseModelArgs
* Fix model
* Use sanitize()
* Remove unnecessary changes
* Add plamo2.py
* Apply formatter
* Fix some part
* Allow a cache obj defined externally
* Fix channel-first weights to channel-last for correct use of MLX's conv1d
* Remove unused code part
* Give all inputs when it's the first time call of model
* Fix import
* Include .jsonl files to download from Huggingface hub
* Fix reference to layers
* Remove unnecessary code and add a test for plamo2
* Do not pass mask to prepare_inputs_for_generation
* Fix to use repeat instead of tile
* Add state property to PlamoCache
* Add __iter__ and __next__ methods to PlamoCache
* cleanup
* cleanup
* fix
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2025-02-24 19:24:43 -08:00
Usama Ahmed
09b641aaa7
Fix FutureWarning in torch.load by setting weights_only=True ( #1295 )
2025-02-22 06:08:54 -08:00
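The torch.load fix above is a one-argument change: weights_only=True restricts unpickling to tensors and plain containers, which silences the FutureWarning and avoids arbitrary code execution from untrusted checkpoints. A minimal sketch (the file path and saved object are illustrative; requires PyTorch >= 1.13 for the parameter):

```python
import os
import tempfile

import torch

# Round-trip a checkpoint with the safe loading mode enabled.
path = os.path.join(tempfile.mkdtemp(), "weights.pt")
torch.save({"scale": 2.0}, path)
state = torch.load(path, weights_only=True)
```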
Awni Hannun
3d793ecf68
Fix logits processor bugs with spec dec ( #1291 )
...
* Fix logits processor bugs with spec dec
* bump patch
2025-02-20 15:55:55 -08:00
Awni Hannun
85669451d0
Fix num layers in fine tune ( #1294 )
2025-02-20 13:32:01 -08:00
Awni Hannun
1cbf5cdac7
use more standard window strategy ( #1287 )
2025-02-19 06:22:51 -08:00
Matthias Neumayer
96bf37008e
Update README.md to include how to set temperature ( #1280 )
...
* Update README.md to include how to set temperature
* nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2025-02-13 19:32:56 -08:00
Awni Hannun
7b07b14e67
add logits processor to spec gen ( #1260 )
2025-02-13 19:19:53 -08:00
Awni Hannun
ec30dc3538
hunyuan finetune ( #1270 )
2025-02-11 16:49:35 -08:00
Awni Hannun
42413c5d85
fix lora timings after validation ( #1278 )
2025-02-11 16:48:55 -08:00
Awni Hannun
f8cbf159e0
fix sharding for more even number of layers ( #1276 )
2025-02-11 16:26:59 -08:00
Awni Hannun
e879ea70e1
fix generation evaluations ( #1277 )
2025-02-11 16:10:30 -08:00
Matt Clayton
3d677f0870
Add "from_draft" to GenerationResponse ( #1272 )
...
* Add from_draft field in GenerationResponse
* Cleanup
* Re-work for minimal changes, add test
* Fix comment
2025-02-11 15:41:02 -08:00
Awni Hannun
bded1a8fcd
fix looping in whisper ( #1273 )
2025-02-10 13:04:35 -08:00
Chime Ogbuji
5865899c81
Completion only fine-tuning of instruction models with collections of HF datasets ( #1103 )
...
- Optional completion only fine-tuning with `--mask-prompt`
- Collections of Hugging Face datasets
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2025-02-09 20:12:34 -08:00
Sri Harsha Pamu
1ced1b00ca
rm temp argument ( #1267 )
2025-02-09 11:39:11 -08:00
Awni Hannun
f58c7de901
Some improvements to speedup alignment computation in MLX Whisper ( #1259 )
...
* some improvements to speedup alignment computation in MLX Whisper
* fix alignment
2025-02-08 15:47:00 -08:00
Awni Hannun
1503bd4f55
support hunyuan 7b ( #1263 )
2025-02-08 15:46:47 -08:00
Awni Hannun
31611b62d7
Add IBM granite model ( #1265 )
...
* add granite
* add thinking option
2025-02-08 15:46:15 -08:00
Awni Hannun
6120a5f376
Faster DSv2/3 expert score computation ( #1257 )
...
* fix deepseek sharding (#1242 )
* compile and use put along axis in deep seek routing function
2025-02-07 10:24:57 -08:00
Awni Hannun
52c41b5b5a
Fix prompt cache for models without chat template ( #1250 )
...
* fix deepseek sharding (#1242 )
* fix prompt cache with no chat template
2025-02-06 11:10:58 -08:00
Nripesh Niketan
747c08e202
Chore: pre-commit bump ( #1253 )
2025-02-06 09:06:31 -08:00
Pedro Cuenca
e2e5478da5
READMEs: fix typo in link, minor update. ( #1246 )
2025-02-04 11:52:32 -08:00