109ee2f2f8  2024-04-25 07:26:04 -07:00  Kristian Muñiz
  Use CORS headers for streaming for MLX Server (#716)
						 
				 
			
				
					
						
							
							
abcd891851  2024-04-23 09:20:00 -07:00  Prince Canuma
  Add support for phi-3 (#712)
    * Add phi-3 modelling
    * fix rope scaling warning
    * add tests and update tuner utils
    * update name and remove sanitize
    * fix lora

8d5cf5b0c8  2024-04-22 07:50:06 -07:00  Aaron Ng
  use logging in mlx server (#705)

749cabf299  2024-04-21 08:58:23 -07:00  Anchen
  fix: unicode decoding (#702)
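The class of bug behind the unicode-decoding fix above: a streamed response can cut a multi-byte UTF-8 character in half at a chunk boundary. Python's incremental decoder shows the general pattern (a sketch of the idea, not the mlx-lm fix itself):

```python
import codecs

# An incremental decoder buffers an incomplete multi-byte tail instead of
# raising UnicodeDecodeError or emitting replacement characters.
decoder = codecs.getincrementaldecoder("utf-8")()

emoji = "🔥".encode("utf-8")        # 4 bytes
first = decoder.decode(emoji[:2])   # incomplete prefix: nothing emitted yet
second = decoder.decode(emoji[2:])  # remaining bytes complete the character
print(repr(first), repr(second))
```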
						 
				 
			
				
					
						
							
							
1484598de1  2024-04-21 06:53:56 -07:00  Karim Elmaaroufi
  Add support for logit bias (#697)
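Logit bias (in the OpenAI API sense) adds a per-token-id offset to the raw logits before sampling; a large negative bias effectively bans a token. A plain-Python sketch of the idea, not the mlx-lm code:

```python
import math


def apply_logit_bias(logits, logit_bias):
    """Add per-token-id offsets to raw logits (OpenAI-style logit_bias map)."""
    out = list(logits)
    for token_id, bias in logit_bias.items():
        out[token_id] += bias
    return out


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


logits = [1.0, 1.0, 1.0]
# A bias of -100 effectively removes token 2 from the distribution,
# leaving the remaining mass split between tokens 0 and 1.
probs = softmax(apply_logit_bias(logits, {2: -100.0}))
```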
						 
				 
			
				
					
						
							
							
574ad7f6fe  2024-04-19 10:46:59 -07:00  Awni Hannun
  fix dequantization (#693)

2146bcd7ee  2024-04-18 18:16:10 -07:00  Awni Hannun
  Quantize embedding / Update quantize API (#680)
    * more async eval
    * quantize embedding / update quantize api
    * more updates for quantize
    * update for quantize embeddings
    * update sd quant API
    * update sdxl quants
    * error for datasets < batch_size
    * async
    * fix config loading
    * fix quant
    * fix tests
    * fix req
    * remove lm head if tie weights is true
    * fix test

f5f189e48a  2024-04-18 14:26:18 -07:00  Anchen
  fix(mlx-lm): broken server.py (#690)
    * fix server.py
    * fix var referenced before assignment
    * add test
    * clean up

35206806ac  2024-04-16 16:08:49 -07:00  Phúc H. Lê Khắc
  Create executables for generate, lora, server, merge, convert (#682)
    * feat: create executables mlx_lm.<cmd>
    * nits in docs
    Co-authored-by: Awni Hannun <awni@apple.com>

7d7e236061  2024-04-16 07:50:32 -07:00  dmdaksh
  Removed unused Python imports (#683)
    - bert/model.py:10: tree_unflatten
    - bert/model.py:2: dataclass
    - bert/model.py:8: numpy
    - cifar/resnet.py:6: Any
    - clip/model.py:15: tree_flatten
    - clip/model.py:9: Union
    - gcn/main.py:8: download_cora
    - gcn/main.py:9: cross_entropy
    - llms/gguf_llm/models.py:12: tree_flatten, tree_unflatten
    - llms/gguf_llm/models.py:9: numpy
    - llms/mixtral/mixtral.py:12: tree_map
    - llms/mlx_lm/models/dbrx.py:2: Dict, Union
    - llms/mlx_lm/tuner/trainer.py:5: partial
    - llms/speculative_decoding/decoder.py:1: dataclass, field
    - llms/speculative_decoding/decoder.py:2: Optional
    - llms/speculative_decoding/decoder.py:5: mlx.nn
    - llms/speculative_decoding/decoder.py:6: numpy
    - llms/speculative_decoding/main.py:2: glob
    - llms/speculative_decoding/main.py:3: json
    - llms/speculative_decoding/main.py:5: Path
    - llms/speculative_decoding/main.py:8: mlx.nn
    - llms/speculative_decoding/model.py:6: tree_unflatten
    - llms/speculative_decoding/model.py:7: AutoTokenizer
    - llms/tests/test_lora.py:13: yaml_loader
    - lora/lora.py:14: tree_unflatten
    - lora/models.py:11: numpy
    - lora/models.py:3: glob
    - speechcommands/kwt.py:1: Any
    - speechcommands/main.py:7: mlx.data
    - stable_diffusion/stable_diffusion/model_io.py:4: partial
    - whisper/benchmark.py:5: sys
    - whisper/test.py:5: subprocess
    - whisper/whisper/audio.py:6: Optional
    - whisper/whisper/decoding.py:8: mlx.nn
						 
				 
			
				
					
						
							
							
e55a9e8cb4  2024-04-15 14:15:25 -07:00  Angelos Katharopoulos
  Add an SPM detokenizer that doesn't trim initial space (#681)

d3f8e4aee9  2024-04-12 11:00:56 -07:00  Awni Hannun
  Fix argpartition call in Mixtral and other MoEs (#676)
    * Update mixtral.py
    * fix all moes
    Co-authored-by: yuhai-china <yuhai.china@gmail.com>

9c5554d8ee  2024-04-11 13:18:23 -07:00  Awni Hannun
  Use async eval (#670)
    * Use async eval
    * bump
    * bump
    * remove workaround for bfloat cumsum

9f472dc985  2024-04-11 07:28:12 -07:00  devonthomas35
  Update transformers for ⌘-R+ (#668)

5a4cad34ef  2024-04-11 06:52:32 -07:00  da-z
  Always resume downloads (#674)
    * Always resume downloads
    * format
    Co-authored-by: Awni Hannun <awni@apple.com>

1278994b56  2024-04-08 22:36:01 -07:00  Angelos Katharopoulos
  Add streaming detokenizers (#651)

c68aa3c7c3  2024-04-08 14:18:55 -07:00  Awni Hannun
  Stable LM 2 (#666)
    * stable lm 2
    * test and lora
    * version bump
    * merge stable models

1e2f7f50b6  2024-04-08 10:40:05 -07:00  Awni Hannun
  fix for empty initial string (#665)

c386dd5f5a  2024-04-05 14:11:24 -07:00  Awni Hannun
  Fix for cohere plus (#650)
    * fix for cohere plus
    * version bump

2bd64b78cf  2024-04-02 13:52:53 -07:00  Awni Hannun
  Save lora config (#636)
    * lora config
    * comments
    * version bump

d661440dbb  2024-04-02 11:33:29 -07:00  Prince Canuma
  Add support for qwen2moe (#640)
    * add sparsemoe block and update decoder logic
    * update file name to match HF
    * update name
    * Code formatting
    * update gates calculation
    * add support for Qwen2MoE
    * fix pytest
    * code formatting and fix missing comma in utils
    * Remove decoder sparse step
    * remove gate layer anti-quantisation
    * remove unused argument
    Co-authored-by: bozheng-hit <dsoul0621@gmail.com>

78c431dc25  2024-03-30 13:13:58 -07:00  Awni Hannun
  cleanup whisper a little (#639)

f6283ef7ce  2024-03-29 13:41:10 -07:00  Chime Ogbuji
  Configurable LR schedulers (#604)
    * Initial config handler and test
    * Added means to run from CLI
    * Update lora config loading and tests
    * Constrain scheduler config (warmup and minimum LR) for each kind
    * Update reference to moved schedule_config module
    * Minor fix
    * Fix typos
    * Moved build_schedule and tests
    * nits in schedule config
    * flake
    * fix path
    Co-authored-by: Awni Hannun <awni@apple.com>
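A configurable LR schedule of this sort typically maps a config dict to a step-to-learning-rate function. Here is a sketch of linear warmup into cosine decay with a minimum-LR floor; the config keys are illustrative, not the exact mlx-lm schema:

```python
import math


def build_schedule(config):
    """Build a step -> learning-rate function from a config dict.

    Hypothetical keys: learning_rate, warmup (steps), min_lr, total_steps.
    """
    base_lr = config["learning_rate"]
    warmup = config.get("warmup", 0)
    min_lr = config.get("min_lr", 0.0)
    total = config["total_steps"]

    def schedule(step):
        if step < warmup:
            # Linear warmup from ~0 up to base_lr.
            return base_lr * (step + 1) / warmup
        # Cosine decay from base_lr down to min_lr over the remaining steps.
        progress = min(1.0, (step - warmup) / max(1, total - warmup))
        return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

    return schedule


sched = build_schedule(
    {"learning_rate": 1e-3, "warmup": 10, "min_lr": 1e-5, "total_steps": 110}
)
```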
						 
				 
			
				
					
						
							
							
b80adbcc3e  2024-03-28 21:03:53 -07:00  Awni Hannun
  DBRX (#628)
    * dbrx
    * format
    * format
    * comments
    * change scores slightly
    * remove inadvertent import

297a908e3d  2024-03-26 07:56:01 -07:00  Anchen
  fix(mlx-lm): type hints in gguf.py (#621)

0ab01b4626  2024-03-25 15:07:55 -07:00  Anchen
  fix(mlx-lm): sorted probs in top_p implementation (#610)
    * fix(mlx-lm): the top p implementation
    * chore: address comment
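The bug class behind the top_p fix above is easy to state: nucleus (top-p) sampling must accumulate probabilities in descending order, so the sort has to happen before the cumulative sum. A plain-Python sketch of the correct ordering, not the mlx-lm implementation:

```python
def top_p_indices(probs, top_p):
    """Return the token indices kept by nucleus (top-p) sampling.

    The cumulative sum must run over probabilities sorted in descending
    order; summing in vocabulary order gives a wrong nucleus.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept


# Tokens 0 and 2 together cover 0.8 of the mass, so they form the nucleus.
nucleus = top_p_indices([0.5, 0.1, 0.3, 0.1], top_p=0.8)
```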
						 
				 
			
				
					
						
							
							
bbfcc103d7  2024-03-24 19:34:51 -07:00  Awni Hannun
  cast around lora adapters (#613)

5a52899405  2024-03-23 15:32:33 -07:00  Awni Hannun
  Partially stream de-tokenization (#609)
    * partially stream de-tokenization
    * don't break full response
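The general trick behind partially streamed de-tokenization: emit only the prefix of the decoded text that can no longer change, holding back the tail because the next token may merge with it. A toy sketch using spaces as the "safe" boundary; real detokenizers are tokenizer-specific, so this only shows the shape of the idea:

```python
class StreamingDetokenizer:
    """Toy streaming detokenizer: emit text only up to the last safe
    boundary (a space here), since the held-back tail may still change
    when the next token arrives."""

    def __init__(self):
        self.text = ""
        self.offset = 0  # everything before offset has already been emitted

    def add_token(self, piece: str) -> str:
        self.text += piece
        boundary = self.text.rfind(" ")
        if boundary <= self.offset:
            return ""  # nothing new is stable yet
        out = self.text[self.offset : boundary]
        self.offset = boundary
        return out

    def finalize(self) -> str:
        # At end of generation the tail can no longer change: flush it.
        out = self.text[self.offset :]
        self.offset = len(self.text)
        return out


detok = StreamingDetokenizer()
chunks = [detok.add_token(t) for t in ["Hel", "lo", " wor", "ld"]]
tail = detok.finalize()
```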
						 
				 
			
				
					
						
							
							
494cdf8e96  2024-03-23 07:22:11 -07:00  Anchen
  chore: fix lora for moe model (#608)

b8a348c1b8  2024-03-23 07:13:51 -07:00  Awni Hannun
  Switch to fast RMS/LN Norm (#603)
    * use nn.RMSNorm, use sdpa, cleanup
    * bump mlx versions
    * minor update
    * use fast layer norm
    * version bump
    * update requirement for whisper
    * update requirement for gguf

fbed720d6f  2024-03-21 12:18:23 -07:00  Anchen
  chore(mlx-lm): fix the top_p implementation (#602)
    * chore(mlx-lm): clean up the top p implementation
    * chore: clean up
    * chore: add test
    * chore: address comments
    * chore: clean up docstring
    * chore: clean up test

fe96ef342f  2024-03-21 10:34:11 -07:00  Anchen
  feat(mlx-lm): export the GGUF (fp16) format model weights from fuse.py (#555)
    * wip
    * feat: convert mlx model to gguf f16
    * chore: convert norm layer to float32 to avoid overflow issue
    * chore: add support for mixtral
    * chore: clean up
    * chore: remove unused import statement
    * chore: clean up weight name mapping
    * version and readme
    * actual version bump
    Co-authored-by: Awni Hannun <awni@apple.com>

8f906c859a  2024-03-20 21:39:39 -07:00  Anchen
  chore(mlx-lm): enable applying the default chat template (#577)
    * chore(mlx-lm): enable applying the default chat template
    * Add option to use default chat template
    * chore: rename the flag to use default chat template

d2a99172a6  2024-03-20 08:44:40 -07:00  Ivan Fioravanti
  Add dropout parameter to lora configuration (#599)
    * Add dropout parameter to lora configuration. A dropout parameter has been
      added to the lora configuration settings in lora_config.yaml, and the
      LoRALinear class in utils.py has been updated to take the new parameter.
      Additionally, an AttributeError ("'types.SimpleNamespace' object has no
      attribute 'prompt'") related to `args.prompt` has been removed from lora.py.
    * Update lora_config.yaml: set dropout to 0.0 in the sample config file
    * format
    Co-authored-by: Awni Hannun <awni@apple.com>

949f63f309  2024-03-20 08:41:03 -07:00  Anchen
  chore(mlx-lm): fix print_trainable_parameters for quant models (#581)
    * chore(mlx-lm): fix print_trainable_parameters for quant models
    * chore: clean up
    * refactor: use layer type to check quant bits
    * chore: address comment

373dd6f2a2  2024-03-19 20:21:26 -07:00  Matt Wronkiewicz
  Set finish_reason in response (#592)

6c3d4c8ba2  2024-03-19 19:50:08 -07:00  Alwin Arrasyid
  add dequantize option to mlx_lm/convert.py (#547)

6f2fd5daea  2024-03-19 19:42:03 -07:00  Chime Ogbuji
  Add mlx-lm version information to HF model card (#596)
    * Add mlx-lm version information to HF model card
    * Update llms/mlx_lm/utils.py
    * Reverted indentation
    * Pre-commit formatting
    Co-authored-by: Awni Hannun <awni.hannun@gmail.com>

39d5ca6427  2024-03-19 17:29:50 -07:00  madroid
  LoRA: report last train info (#595)

b0bcd86a40  2024-03-19 16:45:46 -07:00  madroid
  Support for OpenAI's fine-tuning dataset format (#548)
    * LoRA: move load_dataset to tuner/datasets.py file
    * LoRA: support OpenAI chat format datasets
      (see https://platform.openai.com/docs/guides/fine-tuning/example-format)
    * LoRA: support OpenAI completion format datasets
    * LoRA: formatting dataset timing to reduce memory footprint
    * Refactor dataset item access in PromptCompletionDataset
    * Update mlx_lm/LORA.md
    * check unsupported data format
    * add tests, fine-tune doc
    * add jinja2 for chat template
    * nits in readme
    Co-authored-by: Awni Hannun <awni@apple.com>

e4b19bb9e1  2024-03-14 21:35:54 -07:00  Awni Hannun
  Make attention faster for some models (#574)
    * make attention faster for a couple models
    * remove unused generation flags
    * add comment on lora
    * include text files as well

e2205beb66  2024-03-14 07:05:19 -07:00  sweetcard
  Update server.py to add --trust-remote-code to server (#578)
    * Update server.py: add --trust-remote-code to server
    * format code by running pre-commit
    Co-authored-by: flymonk <zhou.feng@gsafer.com>
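The --trust-remote-code flag above is the standard argparse boolean pattern, with the value forwarded to the Hugging Face loaders. A minimal sketch; the flag name comes from the commit, the rest is illustrative:

```python
import argparse

parser = argparse.ArgumentParser(description="sketch of a model-server CLI")
parser.add_argument(
    "--trust-remote-code",
    action="store_true",
    help="Allow the model/tokenizer repo to execute its own Python code. "
    "Some Hugging Face models require this; it is off by default for safety.",
)

args = parser.parse_args(["--trust-remote-code"])
# Downstream the value would be forwarded, e.g.:
#   AutoTokenizer.from_pretrained(repo, trust_remote_code=args.trust_remote_code)
```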
						 
				 
			
				
					
						
							
							
2cd793dd69  2024-03-14 06:36:05 -07:00  Sugato Ray
  feat: add update_config functionality (#531)
    * feat: add `update_config` functionality
      - sorts the config for better readability
      - updates the "_name_or_path" key in config with upload_repo
      - sets indentation of 4 spaces
      - allows adding other key-value pairs via kwargs
      - reduces code duplication
      - standardizes config updates across mlx-lm
    * feat: standardize updating config. Impacts: fuse.py, merge.py
    * update formatting
    * remove commented-out code
    * rename update_config to save_config
      - drop kwargs
      - incorporate review suggestions
    * update save_config
      - ensure only config-saving functionality
      - the function does not return the config as a dict anymore
      - added review suggestions
    * fixed formatting
    * update formatting instruction in contribution guide
    * nits
    Co-authored-by: Awni Hannun <awni@apple.com>
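The behavior described in that commit (sorted keys, 4-space indentation, save-only with no return value) can be sketched as follows. This is an illustration of the described behavior, not the actual mlx-lm code:

```python
import json
import os
import tempfile


def save_config(config: dict, config_path: str) -> None:
    """Write a config as JSON with sorted keys and 4-space indentation.

    Save-only: the function does not return the config.
    """
    config = dict(sorted(config.items()))
    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)


# Demo: keys come out sorted regardless of insertion order.
fd, path = tempfile.mkstemp(suffix=".json")
os.close(fd)
save_config({"vocab_size": 32000, "_name_or_path": "user/repo"}, path)
with open(path) as f:
    text = f.read()
os.unlink(path)
```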
						 
				 
			
				
					
						
							
							
485180ae91  2024-03-13 20:26:30 -07:00  madroid
  LoRA: some minor optimizations (#573)
    * init training_args in training scope
    * Add trainable parameters percentage

d4e1de1d5b  2024-03-13 20:17:10 -07:00  madroid
  add peak_memory info to training callback (#572)

14fe868825  2024-03-13 10:09:36 -07:00  Awni Hannun
  version (#570)

76c3244cc5  2024-03-13 07:03:36 -07:00  Prince Canuma
  Add support for Cohere's Command-R (#565)
    * initial commit for command-R
    * update mlp, layernorm, lm_head and model args
    * add custom layernorm
    * add default to tie_word_embeddings
    * add layernorm weight type and refactor
    * update layernorm (bias conditional) in model/layers
    * fix layer norm use traditional rope
    * add test
    Co-authored-by: Awni Hannun <awni@apple.com>

3535408c99  2024-03-12 21:34:32 -07:00  Anchen
  chore(mlx-lm): fix tie_word_embeddings for qwen2 (#566)
    * chore: fix tie_word_embeddings for qwen2
    * chore: default tie_word_embeddings to True

39084e81c2  2024-03-12 20:02:03 -07:00  Awni Hannun
  Some improvements to LoRA (#528)
    * set cache_limit
    * remove set cache_limit
    * cleanup
    * add gradient checkpointing
    * fix sort
    * monkey patch call for checkpoint
    * fix example config

e56d9015ef  2024-03-12 07:37:40 -07:00  Chime Ogbuji
  LoRA on all linear transformer block layers (#546)
    * Add --lora-all-linear option to apply LoRA to all linear transformer block layers
    * Moved to YAML config and added specification of rank & alpha
    * nits in config, more tests
    * nit
    * run tests for prs
    Co-authored-by: Awni Hannun <awni@apple.com>