mlx-examples

mirror of https://github.com/ml-explore/mlx-examples.git synced 2025-08-29 13:01:53 +08:00

Author	SHA1	Message	Date
Awni Hannun	7900a6c22c	fix	2025-02-24 09:10:24 -08:00
Awni Hannun	2edcc0355f	cleanup	2025-02-24 09:07:07 -08:00
Awni Hannun	9392bc70f7	cleanup	2025-02-24 08:51:12 -08:00
Shunta Saito	fb1559e1f3	Add __iter__ and __next__ methods to PlamoCache	2025-02-23 17:33:19 +09:00
Shunta Saito	e2d9d619c4	Add state property to PlamoCache	2025-02-23 16:06:38 +09:00
Shunta Saito	21c0abaf23	Fix to use repeat instead of tile	2025-02-23 14:54:23 +09:00
Shunta Saito	d7426c7750	Do not pass mask to prepare_inputs_for_generation	2025-02-23 14:47:49 +09:00
Shunta Saito	31225f4960	Remove unnecessary code and add a test for plamo2	2025-02-23 14:43:14 +09:00
Shunta Saito	d4e688edc3	Fix reference to layers	2025-02-22 19:55:42 +09:00
Shunta Saito	9f422b4729	Fix import	2025-02-15 07:04:53 +09:00
Shunta Saito	28f3f3adab	Give all inputs when it's the first time call of model	2025-02-15 07:04:53 +09:00
Shunta Saito	103c6616c4	Remove unused code part	2025-02-15 07:04:53 +09:00
Shunta Saito	66dd97ed3d	Fix channel first weights to channel last for right use of MLX's conv1d	2025-02-15 07:04:53 +09:00
Shunta Saito	81917d41d5	Allow a cache obj defined externally	2025-02-15 07:04:53 +09:00
Shunta Saito	00d13ebd40	Fix some part	2025-02-15 07:04:53 +09:00
Shunta Saito	fb5e225523	Apply formatter	2025-02-15 07:04:53 +09:00
Shunta Saito	07cf4336b3	Add plamo2.py	2025-02-15 07:04:53 +09:00
Shunta Saito	ebea6928a3	Remove unnecessary changes	2025-02-15 07:04:53 +09:00
Shunta Saito	197fd6aad8	Fix model	2025-02-15 07:04:53 +09:00
Shunta Saito	40c7ce8048	Use mlx's BaseModelArgs	2025-02-15 07:04:53 +09:00
Shunta Saito	9a6e6541de	Fix cache.py to support non-top level layers	2025-02-15 07:04:53 +09:00
Shunta Saito	58686bbcac	Add pfnet/plamo-2-1b	2025-02-15 07:04:53 +09:00
Awni Hannun	f8cbf159e0	fix sharding for more even number of layers (#1276)	2025-02-11 16:26:59 -08:00
Awni Hannun	1503bd4f55	support hunyuan 7b (#1263)	2025-02-08 15:46:47 -08:00
Awni Hannun	31611b62d7	Add IBM granite model (#1265) * add granite * add thinking option	2025-02-08 15:46:15 -08:00
Awni Hannun	6120a5f376	Faster DSv2/3 expert score computation (#1257) * fix deepseek sharding (#1242) * compile and use put along axis in deep seek routing function	2025-02-07 10:24:57 -08:00
Awni Hannun	52c41b5b5a	Fix prompt cache for models without chat template (#1250) * fix deepseek sharding (#1242) * fix prompt cache with no chat template	2025-02-06 11:10:58 -08:00
Awni Hannun	21d0ab6e8a	fix deepseek sharding (#1242)	2025-02-03 16:59:50 -08:00
Gökdeniz Gülmez	0989c073b0	Optimizations for mamba1 (#1213) * added mx.einsum() operations: before: 41.293 tokens-per-sec, after: 57.822 tokens-per-sec * Fused Operations in delta, B, C = ... :. Before: 57.822 tokens-per-sec, after: 83.890 tokens-per-sec * Pre-computing A_log. After: 83.890 tokens-per-sec, before: 85.848 tokens-per-sec * Update MambaBlock, Batched Input Processing, Improved Cache Handling, Pre-computed Constants, Cleaner State Management, Explicit Return Values:. Before: 82.442 tokens-per-sec, after: 129.130 tokens-per-sec. * cleaning up and adding apple copyright to helium modelfile * update Copyright to this year * nits + even faster --------- Co-authored-by: Awni Hannun <awni.hannun@gmail.com>	2025-02-03 13:36:08 -08:00
Awni Hannun	9c2ef38d4d	only download local shard (#1240)	2025-02-02 13:58:44 -08:00
Awni Hannun	e8afb59de4	better overflow correction (#1229)	2025-01-28 14:37:30 -08:00
Gökdeniz Gülmez	77faa14ba4	adding support for kyutai's helium (#1208) * initial commit * adding helium into training * Update ACKNOWLEDGMENTS.md * nits * nits * fixes / nits --------- Co-authored-by: Awni Hannun <awni@apple.com>	2025-01-26 07:19:07 -08:00
Awni Hannun	9a3ddc3e65	some fixes for pipeline parallel deep seek r1 (#1216)	2025-01-21 19:40:29 -08:00
Awni Hannun	50f0a7f6d9	add internlm3 (#1206)	2025-01-15 14:55:41 -08:00
Awni Hannun	c117af83b8	fix gpt bigcode (#1204)	2025-01-13 10:22:32 -08:00
Prince Canuma	bf2da36fc6	Fix Cohere2: mask shape error (long context) (#1202) * fix mask shape error (long context) * Update llms/mlx_lm/models/cohere2.py Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * revert layer_idx * black formatting * Update cohere2.py * format --------- Co-authored-by: Awni Hannun <awni.hannun@gmail.com> Co-authored-by: Awni Hannun <awni@apple.com>	2025-01-12 12:58:08 -08:00
Awni Hannun	5cae0a60e6	deepseek v3 model with pipeline parallelism (#1191) * deepseekv3 * use upload_large_file instead of deprecated multi comit * add pipeline generation and example * comment * get fp16 working * use mlx==0.22	2025-01-09 15:55:53 -08:00
Alex Barron	d4ef909d4a	Length masking for batch inputs (#1173) * length masking * add mask to mlx_lm model interface * remove lengths * fix test: * comment + fix	2024-12-18 19:43:52 -08:00
Prince Canuma	dfa4dd6c93	Add support for cohere2 (#1157) * add support for cohere2 * revert to act_fn to silu * fix tests and sliding window attention * add tests * add to tuner * fix sliding window * add coauthor :) Co-authored-by: n8programs <43304488+N8python@users.noreply.github.com> * Add rotating kvcache to save space * some nits * style * nits --------- Co-authored-by: n8programs <43304488+N8python@users.noreply.github.com> Co-authored-by: N8 <n8@n8programs.com> Co-authored-by: Awni Hannun <awni@apple.com>	2024-12-16 08:01:03 -08:00
n8programs	5687d5b99b	Adds EXAONE architecture. (#1145) * Adds EXAONE architecture. * nits + format * format * clean up and fix rope * clean up and fix rope --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-12-09 07:58:25 -08:00
Awni Hannun	8801beb66f	Add olmo2 (#1128) * add olmo2 * add olmo2	2024-12-02 11:42:58 -08:00
Awni Hannun	004eb4cc9d	Tencent HunYuan MOE model (#1100) * hunyuan * fix * format str * default trust remote code for tokenizer, allow system prompt to be configurable	2024-11-23 11:06:26 -08:00
Angelos Katharopoulos	ed9e81dd58	Fix rotating kv cache size (#1093)	2024-11-05 10:24:24 -08:00
ilyasch2	3b526f0aa1	Add support for falcon-mamba (#1074) * Add support for falcon-mamba * nits * nit --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-11-04 12:23:30 -08:00
Alex Barron	85ffd2c96a	Quantized KV Cache (#1075) * add QuantizedKVCache * simplify * add tests * single sdpa function * fix sed * in place * fix tests * support different k and v head dims	2024-10-31 16:59:52 -07:00
Awni Hannun	9000e280ae	fix mamba models conversion (#1065)	2024-10-22 15:44:08 -07:00
Awni Hannun	66e7bcb886	override dtype with quant (#1062)	2024-10-22 09:56:45 -07:00
Awni Hannun	8dca1a2f60	Tokenizer updates + tests (#1024) * tokenizer updates + tests * nit * add can_trim_prompt_cache * nits	2024-10-14 10:48:46 -07:00
Shunta Saito	7612c646f3	Fix PLaMo model to support Grouped Query Attention (#1037)	2024-10-12 15:26:50 -07:00
Awni Hannun	fca087be49	More cache improvements (#1015) * fix rotating kv cache for chat use case * reorg + fixes to caching, unify prompt caching across types and use cases for e.g. caching during a chat * nit in chat * fix tests * fix tests * fix tests * docs * chat command * comments + docs * Define meta_state on all Cache implementations * fixes + trim_prompt_cache api * fix default model --------- Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2024-10-07 20:45:51 -07:00

132 Commits