65b792d7c0  fix lazy load  (Awni Hannun, 2025-02-06 07:28:59 -08:00)

617f9289b9  Make the chat distributed  (Angelos Katharopoulos, 2025-02-06 07:28:59 -08:00)

026362e0f8  Remove async eval and add sequential load  (Angelos Katharopoulos, 2025-02-06 07:28:58 -08:00)

a0ce0594f6  Temporarily remove async_eval  (Angelos Katharopoulos, 2025-02-06 07:28:03 -08:00)

d77840207c  Start distributed inference for llama models  (Angelos Katharopoulos, 2025-02-06 07:28:03 -08:00)
e2e5478da5  READMEs: fix typo in link, minor update. (#1246)  (Pedro Cuenca, 2025-02-04 11:52:32 -08:00)

21d0ab6e8a  fix deepseek sharding (#1242)  (Awni Hannun, 2025-02-03 16:59:50 -08:00)

0989c073b0  Optimizations for mamba1 (#1213)  (Gökdeniz Gülmez, 2025-02-03 13:36:08 -08:00)
    * added mx.einsum() operations: before: 41.293 tokens-per-sec, after: 57.822 tokens-per-sec
    * fused the operations in delta, B, C = ...: before: 57.822 tokens-per-sec, after: 83.890 tokens-per-sec
    * pre-computing A_log: before: 85.848 tokens-per-sec, after: 83.890 tokens-per-sec
    * update MambaBlock: batched input processing, improved cache handling, pre-computed constants, cleaner state management, explicit return values. Before: 82.442 tokens-per-sec, after: 129.130 tokens-per-sec
    * cleaning up and adding apple copyright to helium modelfile
    * update Copyright to this year
    * nits + even faster
    Co-authored-by: Awni Hannun <awni.hannun@gmail.com>

d9924d08d1  Fix no validation in lora (#1241)  (Awni Hannun, 2025-02-03 09:55:24 -08:00)

9c2ef38d4d  only download local shard (#1240)  (Awni Hannun, 2025-02-02 13:58:44 -08:00)

e8afb59de4  better overflow correction (#1229)  (Awni Hannun, 2025-01-28 14:37:30 -08:00)
7a83077cd7  chore(mlx-lm): support text type content in messages (#1225)  (Anchen, 2025-01-27 17:13:50 -08:00)
    * chore(mlx-lm): support text type content
    * chore: optimize the message content processing
    * nits + format
    Co-authored-by: Awni Hannun <awni@apple.com>

f44a52e2dc  batched min p and fix spec gen sampling (#1222)  (Awni Hannun, 2025-01-27 15:40:31 -08:00)

77faa14ba4  adding support for kyutai's helium (#1208)  (Gökdeniz Gülmez, 2025-01-26 07:19:07 -08:00)
    * initial commit
    * adding helium into training
    * Update ACKNOWLEDGMENTS.md
    * nits
    * nits
    * fixes / nits
    Co-authored-by: Awni Hannun <awni@apple.com>

9a3ddc3e65  some fixes for pipeline parallel deep seek r1 (#1216)  (Awni Hannun, 2025-01-21 19:40:29 -08:00)

df1406735b  Fix dataset variable name in datasets.py (#1212)  (Victor Nogueira, 2025-01-21 14:12:43 -08:00)

07f88f8057  fix(lora): add back store_true default args (#1205)  (Jarrett, 2025-01-16 11:15:42 -08:00)

50f0a7f6d9  add internlm3 (#1206)  (Awni Hannun, 2025-01-15 14:55:41 -08:00)

6ae6c72c2e  reduction moved to CPU in case of distributed training (#1200)  (Ivan Fioravanti, 2025-01-14 17:20:42 -08:00)

c117af83b8  fix gpt bigcode (#1204)  (Awni Hannun, 2025-01-13 10:22:32 -08:00)
0228c46434  Custom local dataset features (#1085)  (Chime Ogbuji, 2025-01-13 10:01:18 -08:00)
    * Generalize prompt_feature and completion_feature for use in local datasets to facilitate compatibility with many other training dataset formats.
    * Persist configured prompt/completion key
    * rebase + nits
    Co-authored-by: Awni Hannun <awni@apple.com>

bf2da36fc6  Fix Cohere2: mask shape error (long context) (#1202)  (Prince Canuma, 2025-01-12 12:58:08 -08:00)
    * fix mask shape error (long context)
    * Update llms/mlx_lm/models/cohere2.py
    * revert layer_idx
    * black formatting
    * Update cohere2.py
    * format
    Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
    Co-authored-by: Awni Hannun <awni@apple.com>

514502da22  Support snapshot_download for ModelScope (#1194)  (Xingjun.Wang, 2025-01-10 15:29:34 -08:00)
    * add MLX_USE_MODELSCOPE env
    * update
    * update snapshot_download
    * update
    * remove modelscope dependency and add import check
    * update
    * nits
    * fix
    Co-authored-by: wangxingjun778 <jason@U-C7X6TX5G-2239.local>
    Co-authored-by: Awni Hannun <awni@apple.com>

93c5cfd781  Add a speculative decoding generator (#1155)  (Awni Hannun, 2025-01-10 15:27:08 -08:00)
    * add a speculative decoding generator
    * fix
    * fixes
    * optional kwarg pop

5cae0a60e6  deepseek v3 model with pipeline parallelism (#1191)  (Awni Hannun, 2025-01-09 15:55:53 -08:00)
    * deepseekv3
    * use upload_large_file instead of deprecated multi commit
    * add pipeline generation and example
    * comment
    * get fp16 working
    * use mlx==0.22

40b88eff48  fix(lora): config yaml & arg default merge bug (#1196)  (Jarrett, 2025-01-09 11:33:54 -08:00)

b8f0cacfa8  Use upload_large_folder (#1193)  (Pedro Cuenca, 2025-01-07 09:18:31 -08:00)

9183fe8b6d  fix (#1192)  (Awni Hannun, 2025-01-06 10:12:07 -08:00)
f2619f507c  Add support for fewshot and apply chat template lm_eval functionality (#1180)  (Chime Ogbuji, 2025-01-06 07:58:43 -08:00)
    * Add support for multiturn fewshot examples and chat templates: adds two new arguments to the evaluation script, `--fewshot-as-multiturn` and `--apply-chat-template`, which correspond to the lm_eval options of similar names and are very often used to ensure apples-to-apples comparisons of lm_evaluation results
    * Add HF overrides for methods needed by added options
    * don't add duplicate bos
    Co-authored-by: Awni Hannun <awni@apple.com>

25ec2d8c44  Change the eos-token argument for mlx_lm.generate (#1176)  (Angelos Katharopoulos, 2025-01-05 22:26:05 -08:00)

c4833a2f55  fix encoding with special tokens + chat template (#1189)  (Awni Hannun, 2025-01-03 10:50:59 -08:00)

3a58c36109  Improvements to mlx_lm.manage (#1178)  (Ivan Fioravanti, 2025-01-01 07:25:57 -08:00)
    * improvements to manage: default value is N, and size added to deletion confirmation
    * Fixing case for no case
    * nits
    Co-authored-by: Awni Hannun <awni@apple.com>

d4ef909d4a  Length masking for batch inputs (#1173)  (Alex Barron, 2024-12-18 19:43:52 -08:00)
    * length masking
    * add mask to mlx_lm model interface
    * remove lengths
    * fix test
    * comment + fix

db109184b7  Fix no template prompt + top_k sampling (#1166)  (Awni Hannun, 2024-12-18 18:46:50 -08:00)
    * fix no template prompt
    * add top_k sampling
    * fix chinese

845efddc8c  Fix decoding manually added tokens (#1164)  (Billel Mokeddem, 2024-12-17 09:54:29 -08:00)
    * Fix decoding manually added tokens
    * fix + test
    * nit
    * nit
    * no lag bpe
    Co-authored-by: Awni Hannun <awni@apple.com>

dfa4dd6c93  Add support for cohere2 (#1157)  (Prince Canuma, 2024-12-16 08:01:03 -08:00)
    * add support for cohere2
    * revert act_fn to silu
    * fix tests and sliding window attention
    * add tests
    * add to tuner
    * fix sliding window
    * add coauthor :)
    * Add rotating kvcache to save space
    * some nits
    * style
    * nits
    Co-authored-by: n8programs <43304488+N8python@users.noreply.github.com>
    Co-authored-by: N8 <n8@n8programs.com>
    Co-authored-by: Awni Hannun <awni@apple.com>
fc0674d2d8  chore: update evaluate.py (#1159)  (Ikko Eltociear Ashimine, 2024-12-15 06:06:29 -08:00)
    occurence -> occurrence

9f2ea5892e  Bpe stream without space (#1154)  (Awni Hannun, 2024-12-12 13:13:50 -08:00)
    * bpe streaming detokenization without space
    * version bump

2ba0e36683  [mlx-lm] Use top p in server (#1144)  (Awni Hannun, 2024-12-12 11:12:21 -08:00)
    * use top p in server
    * couple other fixes

19abf3dcaa  Replace unicode errors instead of raising exception (#1146)  (Angelos Katharopoulos, 2024-12-12 11:10:41 -08:00)

06af3c9b0e  Add finish_reason in GenerationResponse (#1153)  (madroid, 2024-12-12 10:37:40 -08:00)

77b42b7c8b  fix llava (#1149)  (Awni Hannun, 2024-12-12 10:37:26 -08:00)

135c5818c1  Fix max_tokens (#1148)  (Alex Barron, 2024-12-10 11:26:04 -08:00)

12083c4b7e  Support for multiple EOS tokens (#1141)  (madroid, 2024-12-09 08:53:58 -08:00)
    * Support for multiple EOS tokens
    * Change _eos_token_ids type from list to set
    * Remove model_config & add eos_token_id
    * nits
    Co-authored-by: Awni Hannun <awni@apple.com>
5687d5b99b  Adds EXAONE architecture. (#1145)  (n8programs, 2024-12-09 07:58:25 -08:00)
    * Adds EXAONE architecture.
    * nits + format
    * format
    * clean up and fix rope
    * clean up and fix rope
    Co-authored-by: Awni Hannun <awni@apple.com>

893b3f085e  Change Flux default max_shift to 1.15 to match the official one (#1137)  (hehua2008, 2024-12-08 23:29:48 -08:00)

ed91bbc4dc  Fix final message at end of flux training (#1143)  (Peter Sibley, 2024-12-08 23:01:53 -08:00)

1fd6aae871  Fix flux training with batch size (#1135)  (hehua2008, 2024-12-08 22:09:04 -08:00)
    Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>

2211b27388  Mixed Quantizations (#1132)  (Alex Barron, 2024-12-08 14:21:50 -08:00)
    * saving/loading mixed quantizations
    * comment
    * add bits per weight
    * more concise bpw
    * count bias too

cd8cf28c39  mlx_lm.evaluate (#1140)  (Alex Barron, 2024-12-08 12:20:10 -08:00)
    * Add evaluation script
    * only write top level results
    * add lm eval version
    * typo
    * create output dir
    * relative import
    * comment
    Co-authored-by: David Grangier <dgrangier@users.noreply.github.com>