Ivan Fioravanti · b468091f7f · 2024-05-03 12:20:13 -07:00
Add model management functionality for local caches (#736)
* Add model management functionality for local caches
  This commit introduces a set of command-line utilities for managing MLX models downloaded and saved locally in the Hugging Face cache: scanning existing models, retrieving detailed information about a specific model, and deleting a model by name.
* Added mlx_lm.model to setup.py
* nits
Co-authored-by: Awni Hannun <awni@apple.com>

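A hedged sketch of how this kind of cache management can be built on the documented huggingface_hub cache utilities (`scan_cache_dir` and its delete strategy); the repo name below is a placeholder, and the actual CLI added in #736 may behave differently:

```python
from huggingface_hub import scan_cache_dir

# List locally cached repos (MLX models are ordinary repos in the HF cache).
cache_info = scan_cache_dir()
for repo in cache_info.repos:
    print(f"{repo.repo_id:50s} {repo.size_on_disk_str:>10s} {len(repo.revisions)} revision(s)")

# Delete one model by name (placeholder repo id, purely illustrative).
target = "mlx-community/Mistral-7B-Instruct-v0.2-4bit"
revisions = [
    rev.commit_hash
    for repo in cache_info.repos
    if repo.repo_id == target
    for rev in repo.revisions
]
strategy = cache_info.delete_revisions(*revisions)
print("Will free", strategy.expected_freed_size_str)
strategy.execute()
```
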
Awni Hannun · 92430df0a0 · 2024-05-02 21:55:09 -07:00
Fix lora for qwen moe (#743)
* fix lora for qwen moe
* use max seq length in test as well

madroid · 5079af62db · 2024-05-02 21:22:04 -07:00
Update model card describe (#654)
* Update model card describe
  - Add full link jump
  - Add the address of the model uploader's Hugging Face homepage
* Add user_info to reduce whoami calls
* Remove the -U argument
* remove HF user info
* run pre-commit

madroid · 6775d6cb3f · 2024-05-01 09:00:02 -07:00
Whisper: Add pip distribution configuration to support pip installations. (#739)
* Whisper: rename whisper to mlx_whisper
* Whisper: add setup.py config for publish
* Whisper: add assets data to setup config
* Whisper: pre-commit for setup.py
* Whisper: Update README.md
* Whisper: Update README.md
* nits
* fix package data
* nit in readme
Co-authored-by: Awni Hannun <awni@apple.com>

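After this change the Whisper example is installable as a package. A hedged usage sketch; the `path_or_hf_repo` keyword and the model repo below are assumptions about the packaged API, not confirmed by the commit text:

```python
# pip install mlx-whisper
import mlx_whisper

# Transcribe a local audio file; the model is fetched from the Hugging Face Hub
# on first use (path_or_hf_repo is assumed to be the model selector argument).
result = mlx_whisper.transcribe(
    "speech.mp3",
    path_or_hf_repo="mlx-community/whisper-tiny",  # placeholder repo name
)
print(result["text"])
```
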
Karim Elmaaroufi · 4bf2eb17f2 · 2024-04-30 07:27:40 -07:00
Validate server params & fix logit bias bug (#731)
* Bug fix in logit bias
* Add parameter validations
* Fix typo
* Update docstrings to match MLX styling
* Black style + fix a validation bug

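For context, logit bias in the server maps token ids to additive offsets applied to the logits before sampling. A minimal sketch of that step; the function name and shapes are illustrative, not the server's exact code:

```python
import mlx.core as mx

def apply_logit_bias(logits: mx.array, logit_bias: dict[int, float]) -> mx.array:
    """Add per-token offsets to the last-position logits before sampling.

    logits: shape (batch, vocab_size); logit_bias: {token_id: bias}.
    """
    if logit_bias:
        indices = mx.array(list(logit_bias.keys()))
        values = mx.array(list(logit_bias.values()))
        logits[:, indices] += values
    return logits
```
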
Jaward Sesay · 7c0962f4e2 · 2024-04-29 20:11:32 -07:00
Add Supported Quantized Phi-3-mini-4k-instruct gguf Weight (#717)
* support for phi-3 4bits quantized gguf weights
* Added link to 4 bits quantized model
* removed some prints
* Added correct comment
* Added correct comment
* removed print
  Since last condition already prints warning for when quantization is None

Thomas Lazarus · 5513c4e57d · 2024-04-29 13:14:45 -07:00
Fixes Typo in Starcoder2 (#740)

Javier de la Rosa · 510d2bde49 · 2024-04-28 19:07:17 -07:00
Force multi_commits when uploading to HF (#729)

锦此 · 699de35b03 · 2024-04-28 10:24:34 -07:00
Update lora_config.yaml (#735)
Update the LoRA config YAML, replacing the adapter file argument with the adapter path argument.

Prince Canuma · c012eb173f · 2024-04-25 16:49:28 -07:00
Add support for OpenELM (#719)
* add openELM
* update splitting logic
* update qkv logic, transformer and MLP block
* code formatting and fix args
* fix array slicing and remove unused var :)
* add to tuner
* use mx.split for slicing qkv
* merge with phi3
* remove rope scaling logic
* code formatting

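OpenELM uses a fused QKV projection with grouped-query attention, so the projection output has to be cut into query, key, and value chunks. A sketch of the `mx.split` pattern referenced above; the head counts, head dimension, and fused layout are illustrative, not the model's actual configuration:

```python
import mlx.core as mx

def split_qkv(qkv: mx.array, n_heads: int, n_kv_heads: int, head_dim: int):
    """Split a fused projection of shape (B, L, (n_heads + 2 * n_kv_heads) * head_dim)."""
    q_size = n_heads * head_dim
    kv_size = n_kv_heads * head_dim
    # mx.split with a list of indices cuts the last axis at those offsets.
    q, k, v = mx.split(qkv, [q_size, q_size + kv_size], axis=-1)
    return q, k, v

# Example with illustrative sizes.
x = mx.random.normal((1, 8, (16 + 2 * 4) * 64))
q, k, v = split_qkv(x, n_heads=16, n_kv_heads=4, head_dim=64)
print(q.shape, k.shape, v.shape)  # (1, 8, 1024) (1, 8, 256) (1, 8, 256)
```
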
Gökdeniz Gülmez · 2c1c9e9024 · 2024-04-25 15:29:28 -07:00
MiniCPM implementation (#685)
* Added support for the MiniCPM architecture
* Updated utils.py and LORA.md
* Update implementation details for MiniCPM architecture
* Cleaning up
* fixed the missing lm.head layer problem
* Refactor Model class to dynamically handle tied and untied word embeddings
* Quick update
* added a dynamic rope scaling base calculation
* quick fix and clean up
* clean up again
* removed the MiniCPMNorm class as it's not used
* forgot something, sorry
* format
* version bump
Co-authored-by: Awni Hannun <awni@apple.com>

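The tied/untied word-embedding handling mentioned above typically looks like the following in MLX model code. This is a self-contained sketch, not the MiniCPM model class itself; the class name and dimensions are placeholders:

```python
import mlx.core as mx
import mlx.nn as nn

class TiedLMHead(nn.Module):
    """Sketch of tied vs. untied output projections (dims are illustrative)."""

    def __init__(self, vocab_size: int, dims: int, tie_word_embeddings: bool):
        super().__init__()
        self.tie_word_embeddings = tie_word_embeddings
        self.embed_tokens = nn.Embedding(vocab_size, dims)
        if not tie_word_embeddings:
            # Untied: a separate output projection.
            self.lm_head = nn.Linear(dims, vocab_size, bias=False)

    def __call__(self, tokens: mx.array) -> mx.array:
        h = self.embed_tokens(tokens)  # ... transformer layers would go here ...
        if self.tie_word_embeddings:
            # Tied: reuse the embedding matrix as the output projection.
            return self.embed_tokens.as_linear(h)
        return self.lm_head(h)

logits = TiedLMHead(vocab_size=100, dims=32, tie_word_embeddings=True)(mx.array([[1, 2, 3]]))
print(logits.shape)  # (1, 3, 100)
```
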
Awni Hannun · 685012c2ad · 2024-04-25 14:16:13 -07:00
Couple fixes for LoRA (#711)
* don't overwrite in test only mode
* only load model specific safetensors

Kristian Muñiz · 109ee2f2f8 · 2024-04-25 07:26:04 -07:00
Use CORS headers for streaming for MLX Server (#716)

Kevin Wang · 8a265f0d54 · 2024-04-24 15:52:43 -07:00
Fix incorrect type annotation (#720)
A `Tuple` is missing in this type annotation.

Prince Canuma · abcd891851 · 2024-04-23 09:20:00 -07:00
Add support for phi-3 (#712)
* Add phi-3 modelling
* fix rope scaling warning
* add tests and update tuner utils
* update name and remove sanitize
* fix lora

Awni Hannun · ecbc6ff1e3 · 2024-04-22 18:12:52 -07:00
one more quant fix (#708)

Aaron Ng · 8d5cf5b0c8 · 2024-04-22 07:50:06 -07:00
use logging in mlx server (#705)

AlexandrosChrtn · f20e68fcc0 · 2024-04-21 09:04:44 -07:00
Load fused model with transformers (#703)
* save format for transformers compatibility
* save format for transformers compatibility arg
* hardcode mlx
* hardcode mlx format

Anchen · 749cabf299 · 2024-04-21 08:58:23 -07:00
fix: unicode decoding (#702)

Karim Elmaaroufi · 1484598de1 · 2024-04-21 06:53:56 -07:00
Add support for logit bias (#697)

Awni Hannun · 6abdbe3be8 · 2024-04-19 20:07:11 -07:00
Fix quant in gguf (#698)
* fix quant in gguf
* fix whisper

Awni Hannun · 574ad7f6fe · 2024-04-19 10:46:59 -07:00
fix dequantization (#693)

Awni Hannun · 2146bcd7ee · 2024-04-18 18:16:10 -07:00
Quantize embedding / Update quantize API (#680)
* more async eval
* quantize embedding / update quantize api
* more updates for quantize
* update for quantize embeddings
* update sd quant API
* update sdxl quants
* error for datasets < batch_size
* async
* fix config loading
* fix quant
* fix tests
* fix req
* remove lm head if tie weights is true
* fix test

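The updated quantization API centers on `mlx.nn.quantize`, which can cover embeddings as well as linear layers. A hedged sketch of the call; the group size, bit width, and predicate below are illustrative choices, not the values used in the repo:

```python
import mlx.nn as nn

def quantize_model(model: nn.Module, group_size: int = 64, bits: int = 4) -> None:
    # Quantize Linear and Embedding layers in place; the class_predicate lets
    # callers skip modules (for example, small gate layers in MoE blocks).
    nn.quantize(
        model,
        group_size=group_size,
        bits=bits,
        class_predicate=lambda path, module: isinstance(
            module, (nn.Linear, nn.Embedding)
        ),
    )
```
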
Anchen · f5f189e48a · 2024-04-18 14:26:18 -07:00
fix(mlx-lm): broken server.py (#690)
* fix server.py
* fix var referenced before assignment
* add test
* clean up

Phúc H. Lê Khắc · 35206806ac · 2024-04-16 16:08:49 -07:00
Create executables for generate, lora, server, merge, convert (#682)
* feat: create executables mlx_lm.<cmd>
* nits in docs
Co-authored-by: Awni Hannun <awni@apple.com>

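Console entry points like `mlx_lm.generate` are normally declared via setuptools `entry_points`. A hedged sketch of what such a configuration looks like; the `module:main` targets and version are assumptions about the package layout, not the exact setup.py from this PR:

```python
from setuptools import setup, find_packages

setup(
    name="mlx-lm",
    version="0.0.0",  # placeholder
    packages=find_packages(),
    entry_points={
        "console_scripts": [
            # Each entry maps an executable name to a module's main() function.
            "mlx_lm.generate = mlx_lm.generate:main",
            "mlx_lm.lora = mlx_lm.lora:main",
            "mlx_lm.server = mlx_lm.server:main",
            "mlx_lm.merge = mlx_lm.merge:main",
            "mlx_lm.convert = mlx_lm.convert:main",
        ]
    },
)
```
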
dmdaksh · 7d7e236061 · 2024-04-16 07:50:32 -07:00
Removed unused Python imports (#683)
- bert/model.py:10: tree_unflatten
- bert/model.py:2: dataclass
- bert/model.py:8: numpy
- cifar/resnet.py:6: Any
- clip/model.py:15: tree_flatten
- clip/model.py:9: Union
- gcn/main.py:8: download_cora
- gcn/main.py:9: cross_entropy
- llms/gguf_llm/models.py:12: tree_flatten, tree_unflatten
- llms/gguf_llm/models.py:9: numpy
- llms/mixtral/mixtral.py:12: tree_map
- llms/mlx_lm/models/dbrx.py:2: Dict, Union
- llms/mlx_lm/tuner/trainer.py:5: partial
- llms/speculative_decoding/decoder.py:1: dataclass, field
- llms/speculative_decoding/decoder.py:2: Optional
- llms/speculative_decoding/decoder.py:5: mlx.nn
- llms/speculative_decoding/decoder.py:6: numpy
- llms/speculative_decoding/main.py:2: glob
- llms/speculative_decoding/main.py:3: json
- llms/speculative_decoding/main.py:5: Path
- llms/speculative_decoding/main.py:8: mlx.nn
- llms/speculative_decoding/model.py:6: tree_unflatten
- llms/speculative_decoding/model.py:7: AutoTokenizer
- llms/tests/test_lora.py:13: yaml_loader
- lora/lora.py:14: tree_unflatten
- lora/models.py:11: numpy
- lora/models.py:3: glob
- speechcommands/kwt.py:1: Any
- speechcommands/main.py:7: mlx.data
- stable_diffusion/stable_diffusion/model_io.py:4: partial
- whisper/benchmark.py:5: sys
- whisper/test.py:5: subprocess
- whisper/whisper/audio.py:6: Optional
- whisper/whisper/decoding.py:8: mlx.nn

Angelos Katharopoulos · e55a9e8cb4 · 2024-04-15 14:15:25 -07:00
Add an SPM detokenizer that doesn't trim initial space (#681)

Awni Hannun · d3f8e4aee9 · 2024-04-12 11:00:56 -07:00
Fix argpartition call in Mixtral and other MOES (#676)
* Update mixtral.py
* fix all moes
Co-authored-by: yuhai-china <yuhai.china@gmail.com>

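For context, MoE routers in these models pick the top-k experts per token with `mx.argpartition` rather than a full sort. A sketch of the general pattern; the value of k, the shapes, and the softmax over the selected scores are illustrative, not the exact fix:

```python
import mlx.core as mx

def select_experts(gate_logits: mx.array, k: int):
    """Return indices and normalized weights of the top-k experts per token."""
    # argpartition on the negated scores places the k largest entries
    # (in arbitrary order) in the first k slots along the last axis.
    inds = mx.argpartition(-gate_logits, kth=k - 1, axis=-1)[..., :k]
    scores = mx.take_along_axis(gate_logits, inds, axis=-1)
    weights = mx.softmax(scores, axis=-1)
    return inds, weights

gates = mx.random.normal((2, 4, 8))  # (batch, tokens, num_experts)
inds, weights = select_experts(gates, k=2)
print(inds.shape, weights.shape)  # (2, 4, 2) (2, 4, 2)
```
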
Awni Hannun · 9c5554d8ee · 2024-04-11 13:18:23 -07:00
Use async eval (#670)
* Use async eval
* bump
* bump
* remove workaround for bfloat cumsum

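`mx.async_eval` lets generation kick off the computation of the next token without blocking Python, so host-side work such as detokenization overlaps with the GPU. A minimal, generic illustration of the primitive, not the mlx-lm generation loop itself:

```python
import mlx.core as mx

a = mx.random.normal((2048, 2048))
b = (a @ a).sum()

# Launch the computation without blocking the Python thread.
mx.async_eval(b)

# ... other host-side work (e.g. detokenizing the previous token) runs here ...

# Using the result (or calling mx.eval) waits for the async work to finish.
mx.eval(b)
print(b.item())
```
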
Nripesh Niketan · 0250f6f38e · 2024-04-11 07:28:26 -07:00
feat: Update black-pre-commit-mirror to version 24.3.0 (#675)

devonthomas35 · 9f472dc985 · 2024-04-11 07:28:12 -07:00
Update transformers for ⌘-R+ (#668)

da-z · 5a4cad34ef · 2024-04-11 06:52:32 -07:00
Always resume downloads (#674)
* Always resume downloads
* format
Co-authored-by: Awni Hannun <awni@apple.com>

Angelos Katharopoulos · eff6690952 · 2024-04-09 06:06:41 -07:00
Fix CFG for SDXL (#667)

Angelos Katharopoulos · 1278994b56 · 2024-04-08 22:36:01 -07:00
Add streaming detokenizers (#651)

Awni Hannun · c68aa3c7c3 · 2024-04-08 14:18:55 -07:00
Stable lm 2 (#666)
* stable lm 2
* test and lora
* version bump
* merge stable models

Awni Hannun · 1e2f7f50b6 · 2024-04-08 10:40:05 -07:00
fix for empty initial string (#665)

Awni Hannun · c386dd5f5a · 2024-04-05 14:11:24 -07:00
Fix for cohere plus (#650)
* fix for cohere plus
* version bump

Awni Hannun · 2bd64b78cf · 2024-04-02 13:52:53 -07:00
Save lora config (#636)
* lora config
* comments
* version bump

Prince Canuma · d661440dbb · 2024-04-02 11:33:29 -07:00
Add support for qwen2moe (#640)
* add sparsemoe block and update decoder logic
* update file name to match HF
* update name
* Code formatting
* update gates calculation
* add support for Qwen2MoE.
* fix pytest
* code formatting and fix missing comma in utils
* Remove decoder sparse step.
  Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
* remove gate layer anti-quantisation
* remove unused argument
Co-authored-by: bozheng-hit <dsoul0621@gmail.com>

Awni Hannun · 78c431dc25 · 2024-03-30 13:13:58 -07:00
cleanup whisper a little (#639)

Chime Ogbuji · f6283ef7ce · 2024-03-29 13:41:10 -07:00
Configurable LR schedulers (#604)
* Initial config handler and test
* Added means to run from CLI
* Update lora config loading and tests
* Constrain scheduler config (warmup and minimum LR) for each kind
* Update reference to moved schedule_config module
* Minor fix
* Fix typos
* Moved build_schedule and tests
* nits in schedule config
* flake
* fix path
Co-authored-by: Awni Hannun <awni@apple.com>

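Configurable schedules build on the schedule helpers in `mlx.optimizers`. A hedged sketch of a warmup-plus-cosine-decay schedule; the learning rates, step counts, and optimizer choice are illustrative, and the YAML keys used by the PR are not shown:

```python
import mlx.optimizers as optim

# Linear warmup for 100 steps, then cosine decay over 1000 steps.
warmup = optim.linear_schedule(0.0, 1e-5, steps=100)
decay = optim.cosine_decay(1e-5, decay_steps=1000)
lr_schedule = optim.join_schedules([warmup, decay], boundaries=[100])

# Any optimizer that accepts a callable learning rate can use the schedule.
optimizer = optim.AdamW(learning_rate=lr_schedule)
print(lr_schedule(0), lr_schedule(100))
```
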
Awni Hannun · b80adbcc3e · 2024-03-28 21:03:53 -07:00
DBRX (#628)
* dbrx
* format
* format
* comments
* change scores slightly
* remove inadvertent import

Anchen · 297a908e3d · 2024-03-26 07:56:01 -07:00
fix(mlx-lm): type hints in gguf.py (#621)

Anchen · 0ab01b4626 · 2024-03-25 15:07:55 -07:00
fix(mlx-lm): sorted probs in top_p implementation. (#610)
* fix(mlx-lm): the top p imp
* chore: address comment

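Top-p (nucleus) sampling restricts sampling to the smallest set of tokens whose cumulative probability reaches p, which requires sorting the probabilities before accumulating them. A hedged sketch of the idea, not the exact mlx-lm implementation:

```python
import mlx.core as mx

def top_p_sample(logits: mx.array, top_p: float, temperature: float = 1.0) -> mx.array:
    """Sample a token id from the smallest token set with cumulative mass >= top_p."""
    probs = mx.softmax(logits / temperature, axis=-1)

    # Sort descending, accumulate, and zero out the low-probability tail.
    sorted_inds = mx.argsort(-probs, axis=-1)
    sorted_probs = mx.take_along_axis(probs, sorted_inds, axis=-1)
    cumulative = mx.cumsum(sorted_probs, axis=-1)
    keep = (cumulative - sorted_probs) < top_p  # exclusive cumulative sum
    masked = mx.where(keep, sorted_probs, 0.0)

    # Sample in the sorted space, then map back to the original token ids.
    choice = mx.random.categorical(mx.log(masked), axis=-1)
    return mx.take_along_axis(sorted_inds, choice[..., None], axis=-1).squeeze(-1)

logits = mx.random.normal((1, 32))
print(top_p_sample(logits, top_p=0.9))
```
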
Awni Hannun · bbfcc103d7 · 2024-03-24 19:34:51 -07:00
cast around lora adapters (#613)

Awni Hannun · 5a52899405 · 2024-03-23 15:32:33 -07:00
Partially stream de-tokenization (#609)
* partially stream de-tokenization
* don't break full response

Anchen · 494cdf8e96 · 2024-03-23 07:22:11 -07:00
chore: fix lora for moe model (#608)

Awni Hannun · b8a348c1b8 · 2024-03-23 07:13:51 -07:00
Switch to fast RMS/LN Norm (#603)
* use nn.RMSNorm, use sdpa, cleanup
* bump mlx versions
* minor update
* use fast layer norm
* version bump
* update requirement for whisper
* update requirement for gguf

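The fast norms and SDPA referenced here are the fused ops under `mx.fast`. A small sketch comparing the module with the fused call; the shapes, epsilon, and scale are illustrative:

```python
import mlx.core as mx
import mlx.nn as nn

x = mx.random.normal((2, 16, 512))
weight = mx.ones((512,))

# Fused RMS norm kernel, called directly.
y_fast = mx.fast.rms_norm(x, weight, eps=1e-5)

# The nn.RMSNorm module computes the same normalization.
y_mod = nn.RMSNorm(512, eps=1e-5)(x)
print(mx.allclose(y_fast, y_mod).item())

# Fused scaled dot-product attention is exposed the same way.
q = k = v = mx.random.normal((2, 8, 16, 64))  # (batch, heads, seq, head_dim)
out = mx.fast.scaled_dot_product_attention(q, k, v, scale=64**-0.5, mask=None)
print(out.shape)  # (2, 8, 16, 64)
```
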
Anchen · fbed720d6f · 2024-03-21 12:18:23 -07:00
chore(mlx-lm): fix the top_p implementation. (#602)
* chore(mlx-lm): clean up the top p imp
* chore: clean up
* chore: add test
* chore: address comments
* chore: clean up docs string
* chore: clean up test

Anchen · fe96ef342f · 2024-03-21 10:34:11 -07:00
feat(mlx-lm): export the GGUF (fp16) format model weights from fuse.py (#555)
* wip
* wip
* feat: convert mlx model to gguf f16
* chore: convert norm layer to float32 to avoid overflow issue
* chore: add support for mixtral
* chore: clean up
* chore: remove unused import statement
* chore: clean up weight name mapping
* version and readme
* actual version bump
Co-authored-by: Awni Hannun <awni@apple.com>