Commit Graph

664 Commits

Author SHA1 Message Date
Awni Hannun
7900a6c22c fix 2025-02-24 09:10:24 -08:00
Awni Hannun
2edcc0355f cleanup 2025-02-24 09:07:07 -08:00
Awni Hannun
9392bc70f7 cleanup 2025-02-24 08:51:12 -08:00
Shunta Saito
675c322978 Merge remote-tracking branch 'upstream/main' into mitmul/add-plamo2-1b-support 2025-02-24 13:37:43 +09:00
Shunta Saito
fb1559e1f3 Add __iter__ and __next__ methods to PlamoCache 2025-02-23 17:33:19 +09:00
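Commit fb1559e1f3 makes PlamoCache iterable by adding `__iter__` and `__next__`. A minimal sketch of the Python iterator protocol this implies (class and attribute names here are assumed for illustration, not the actual PlamoCache code):

```python
# Hypothetical sketch: making a per-layer cache iterable so generic code
# can write `for layer_cache in cache:` without indexing manually.
class PlamoCacheSketch:
    def __init__(self, layer_caches):
        self.cache = layer_caches  # one cache object per layer (assumed layout)
        self._i = 0

    def __iter__(self):
        # Reset the cursor so the cache can be iterated more than once.
        self._i = 0
        return self

    def __next__(self):
        if self._i >= len(self.cache):
            raise StopIteration
        item = self.cache[self._i]
        self._i += 1
        return item

print(list(PlamoCacheSketch(["layer0", "layer1"])))  # ['layer0', 'layer1']
```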
Shunta Saito
e2d9d619c4 Add state property to PlamoCache 2025-02-23 16:06:38 +09:00
Shunta Saito
21c0abaf23 Fix to use repeat instead of tile 2025-02-23 14:54:23 +09:00
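Commit 21c0abaf23 swaps `tile` for `repeat`; the two interleave elements differently, which matters when expanding per-head tensors. Illustrated with NumPy (MLX's `mx.repeat`/`mx.tile` follow the same semantics, to the best of my knowledge):

```python
import numpy as np

x = np.array([1, 2, 3])
print(np.repeat(x, 2))  # [1 1 2 2 3 3] -- each element repeated in place
print(np.tile(x, 2))    # [1 2 3 1 2 3] -- the whole array repeated
```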
Shunta Saito
d7426c7750 Do not pass mask to prepare_inputs_for_generation 2025-02-23 14:47:49 +09:00
Shunta Saito
31225f4960 Remove unnecessary code and add a test for plamo2 2025-02-23 14:43:14 +09:00
Usama Ahmed
09b641aaa7
Fix FutureWarning in torch.load by setting weights_only=True (#1295) 2025-02-22 06:08:54 -08:00
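The fix in #1295 addresses PyTorch's deprecation path for `torch.load`: recent versions emit a FutureWarning when `weights_only` is left at its unsafe default, since full unpickling can execute arbitrary code. A minimal sketch of the safe pattern:

```python
import os
import tempfile

import torch

# Save a plain tensor checkpoint (stand-in for real model weights).
path = os.path.join(tempfile.gettempdir(), "weights_demo.pt")
torch.save({"w": torch.ones(2, 2)}, path)

# weights_only=True restricts deserialization to tensors and plain
# containers, avoiding the FutureWarning and the pickle security risk.
loaded = torch.load(path, weights_only=True)
print(tuple(loaded["w"].shape))  # (2, 2)
```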
Shunta Saito
d4e688edc3 Fix reference to layers 2025-02-22 19:55:42 +09:00
Awni Hannun
3d793ecf68
Fix logits processor bugs with spec dec (#1291)
* Fix logits processor bugs with spec dec

* bump patch
2025-02-20 15:55:55 -08:00
Awni Hannun
85669451d0
Fix num layers in fine tune (#1294) 2025-02-20 13:32:01 -08:00
Awni Hannun
1cbf5cdac7
use more standard window strategy (#1287) 2025-02-19 06:22:51 -08:00
Shunta Saito
1e75bf184c Include .jsonl files to download from the Hugging Face Hub 2025-02-15 10:08:33 +09:00

Shunta Saito
9f422b4729 Fix import 2025-02-15 07:04:53 +09:00
Shunta Saito
28f3f3adab Pass all inputs on the first call of the model 2025-02-15 07:04:53 +09:00
Shunta Saito
103c6616c4 Remove unused code part 2025-02-15 07:04:53 +09:00
Shunta Saito
66dd97ed3d Convert channel-first weights to channel-last for correct use of MLX's conv1d 2025-02-15 07:04:53 +09:00
Shunta Saito
81917d41d5 Allow a cache obj defined externally 2025-02-15 07:04:53 +09:00
Shunta Saito
00d13ebd40 Fix some part 2025-02-15 07:04:53 +09:00
Shunta Saito
fb5e225523 Apply formatter 2025-02-15 07:04:53 +09:00
Shunta Saito
07cf4336b3 Add plamo2.py 2025-02-15 07:04:53 +09:00
Shunta Saito
ebea6928a3 Remove unnecessary changes 2025-02-15 07:04:53 +09:00
Shunta Saito
72269c306c Use sanitize() 2025-02-15 07:04:53 +09:00
Shunta Saito
197fd6aad8 Fix model 2025-02-15 07:04:53 +09:00
Shunta Saito
40c7ce8048 Use mlx's BaseModelArgs 2025-02-15 07:04:53 +09:00
Shunta Saito
9a6e6541de Fix cache.py to support non-top level layers 2025-02-15 07:04:53 +09:00
Shunta Saito
58686bbcac Add pfnet/plamo-2-1b 2025-02-15 07:04:53 +09:00
Matthias Neumayer
96bf37008e
Update README.md to include how to set temperature (#1280)
* Update README.md to include how to set temperature

* nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2025-02-13 19:32:56 -08:00
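PR #1280 documents how to set the sampling temperature. Conceptually, temperature T divides the logits before the softmax, so lower T sharpens the distribution toward the top token. A generic NumPy sketch of that math (not mlx-lm's actual sampler code):

```python
import numpy as np

def softmax_with_temperature(logits, temp):
    # Lower temp sharpens the distribution; temp -> 0 approaches argmax.
    scaled = np.asarray(logits, dtype=float) / temp
    scaled -= scaled.max()  # subtract max for numerical stability
    e = np.exp(scaled)
    return e / e.sum()

logits = [2.0, 1.0, 0.1]
print(softmax_with_temperature(logits, 1.0))
print(softmax_with_temperature(logits, 0.5))  # more mass on the max logit
```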
Awni Hannun
7b07b14e67
add logits processor to spec gen (#1260) 2025-02-13 19:19:53 -08:00
Awni Hannun
ec30dc3538
hunyuan finetune (#1270) 2025-02-11 16:49:35 -08:00
Awni Hannun
42413c5d85
fix lora timings after validation (#1278) 2025-02-11 16:48:55 -08:00
Awni Hannun
f8cbf159e0
fix sharding for more even number of layers (#1276) 2025-02-11 16:26:59 -08:00
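The idea behind "more even" sharding in #1276 is to distribute layers so shard sizes differ by at most one, rather than dumping the remainder onto one shard. A hedged sketch of that allocation scheme (function name and signature are assumed for illustration, not the repo's implementation):

```python
def split_layers(n_layers, n_shards):
    # divmod gives the base size and the leftover layers; the first
    # `extra` shards each take one additional layer.
    base, extra = divmod(n_layers, n_shards)
    return [base + (1 if i < extra else 0) for i in range(n_shards)]

print(split_layers(27, 4))  # [7, 7, 7, 6]
```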
Awni Hannun
e879ea70e1
fix generation evaluations (#1277) 2025-02-11 16:10:30 -08:00
Matt Clayton
3d677f0870
Add "from_draft" to GenerationResponse (#1272)
* Add from_draft field in GenerationResponse

* Cleanup

* Re-work for minimal changes, add test

* Fix comment
2025-02-11 15:41:02 -08:00
Awni Hannun
bded1a8fcd
fix looping in whisper (#1273) 2025-02-10 13:04:35 -08:00
Chime Ogbuji
5865899c81
Completion only fine-tuning of instruction models with collections of HF datasets (#1103)
- Optional completion only fine-tuning with `--mask-prompt`
- Collections of Hugging Face datasets

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2025-02-09 20:12:34 -08:00
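The `--mask-prompt` option from #1103 enables completion-only fine-tuning: prompt tokens are masked out of the loss so only completion tokens drive the gradient. A generic NumPy sketch of the masked-loss idea (assumed for illustration, not the repo's trainer code):

```python
import numpy as np

# Per-token negative log-likelihoods for one sequence.
token_nll = np.array([2.0, 1.5, 0.5, 0.25, 0.1])
# 1 for completion tokens, 0 for prompt tokens (the first two here).
mask = np.array([0, 0, 1, 1, 1])

# Masked mean: only completion tokens contribute to the loss.
loss = (token_nll * mask).sum() / mask.sum()
print(loss)  # 0.2833...
```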
Sri Harsha Pamu
1ced1b00ca
rm temp argument (#1267) 2025-02-09 11:39:11 -08:00
Awni Hannun
f58c7de901
Some improvements to speedup alignment computation in MLX Whisper (#1259)
* some improvements to speedup alignment computation in MLX Whisper

* fix alignment
2025-02-08 15:47:00 -08:00
Awni Hannun
1503bd4f55
support hunyuan 7b (#1263) 2025-02-08 15:46:47 -08:00
Awni Hannun
31611b62d7
Add IBM granite model (#1265)
* add granite

* add thinking option
2025-02-08 15:46:15 -08:00
Awni Hannun
6120a5f376
Faster DSv2/3 expert score computation (#1257)
* fix deepseek sharding (#1242)

* compile and use put-along-axis in the DeepSeek routing function
2025-02-07 10:24:57 -08:00
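Commit 6120a5f376 speeds up expert scoring partly by using a put-along-axis scatter in the routing function. A generic NumPy illustration of that primitive (MLX has an analogous operation; this is not the repo's actual routing code):

```python
import numpy as np

scores = np.array([[0.1, 0.9, 0.3],
                   [0.7, 0.2, 0.6]])
# Index of the top expert per token, shaped (tokens, 1) to match the scatter.
top = scores.argmax(axis=1)[:, None]

# Scatter 1.0 at the selected positions in one vectorized call,
# instead of looping over rows.
gates = np.zeros_like(scores)
np.put_along_axis(gates, top, 1.0, axis=1)
print(gates)
```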
Awni Hannun
52c41b5b5a
Fix prompt cache for models without chat template (#1250)
* fix deepseek sharding (#1242)

* fix prompt cache with no chat template
2025-02-06 11:10:58 -08:00
Nripesh Niketan
747c08e202
Chore: pre-commit bump (#1253) 2025-02-06 09:06:31 -08:00
Pedro Cuenca
e2e5478da5
READMEs: fix typo in link, minor update. (#1246) 2025-02-04 11:52:32 -08:00
Awni Hannun
21d0ab6e8a
fix deepseek sharding (#1242) 2025-02-03 16:59:50 -08:00
Gökdeniz Gülmez
0989c073b0
Optimizations for mamba1 (#1213)
* Added mx.einsum() operations. Before: 41.293 tokens-per-sec, after: 57.822 tokens-per-sec

* Fused operations in `delta, B, C = ...`. Before: 57.822 tokens-per-sec, after: 83.890 tokens-per-sec

* Pre-computed A_log. Before: 83.890 tokens-per-sec, after: 85.848 tokens-per-sec

* Updated MambaBlock: batched input processing, improved cache handling, pre-computed constants, cleaner state management, explicit return values. Before: 82.442 tokens-per-sec, after: 129.130 tokens-per-sec.

* cleaning up and adding apple copyright to helium modelfile

* update Copyright to this year

* nits + even faster

---------

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2025-02-03 13:36:08 -08:00
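The mamba1 speedups above lean on replacing reshape/matmul chains with single einsum contractions. A generic NumPy illustration of the fusion idea (the repo uses `mx.einsum` with the same contraction syntax; shapes and names here are assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
B, L, D, N = 2, 4, 8, 3  # batch, sequence length, model dim, state dim
x = rng.standard_normal((B, L, D))
W = rng.standard_normal((D, N))

# Unfused: flatten, matmul, reshape back.
unfused = (x.reshape(B * L, D) @ W).reshape(B, L, N)

# Fused: one einsum expresses the same contraction directly.
fused = np.einsum("bld,dn->bln", x, W)

assert np.allclose(unfused, fused)
```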
Awni Hannun
d9924d08d1
Fix no validation in lora (#1241) 2025-02-03 09:55:24 -08:00
Awni Hannun
9c2ef38d4d
only download local shard (#1240) 2025-02-02 13:58:44 -08:00