Alex Cheema
cd8efc7fbc
Add support for Llama-3.1 ( #907 )
* add dynamicNTK scaling rope
* remove unused var
* fix rope base
* llama3.1 fixes
* TODO for rope eval
* vectorise llama3 base freq calculation
* removed the arbitrary 2.0 rope_scale default case
* fix slow llama3.1 generation by evaluating stateless part of DynamicNTKScalingRoPE in init
* nits + format
* use mx.pi
* fix tests and add test for 3.1
---------
Co-authored-by: Prince Canuma <prince.gdt@gmail.com>
Co-authored-by: Awni Hannun <awni@apple.com>
2024-07-23 13:21:32 -07:00
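The Llama-3.1 commit above mentions vectorising the base-frequency calculation for the scaled RoPE and precomputing the stateless part at init time. A minimal NumPy sketch of that kind of frequency rescaling, assuming the publicly documented Llama-3.1 scaling scheme and its default factors (the names and defaults here are illustrative, not taken from this log):

```python
import numpy as np

def llama3_scaled_inv_freqs(dims=128, base=500000.0, factor=8.0,
                            low_freq_factor=1.0, high_freq_factor=4.0,
                            old_context_len=8192):
    """Rescale RoPE inverse frequencies: keep high frequencies, divide
    low frequencies by `factor`, and smoothly blend the band in between.
    Fully vectorised, so it can be evaluated once in __init__."""
    inv_freqs = 1.0 / base ** (np.arange(0, dims, 2) / dims)
    wavelens = 2 * np.pi / inv_freqs
    low_freq_wavelen = old_context_len / low_freq_factor
    high_freq_wavelen = old_context_len / high_freq_factor
    # Blend coefficient for the mid band (no Python loop).
    smooth = (old_context_len / wavelens - low_freq_factor) / (
        high_freq_factor - low_freq_factor)
    return np.where(
        wavelens < high_freq_wavelen, inv_freqs,           # keep high freqs
        np.where(wavelens > low_freq_wavelen,
                 inv_freqs / factor,                       # shrink low freqs
                 (1 - smooth) * inv_freqs / factor + smooth * inv_freqs))

freqs = llama3_scaled_inv_freqs()
```

Computing this table once up front is what avoids the slow-generation problem the log mentions: only the position-dependent rotation remains in the per-token path.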
nicolov
fbe3247772
Add GPT-neox model ( #863 )
2024-07-11 06:13:17 -07:00
Derek Lewis
89b0b75250
GPT2 Support ( #798 )
* GPT-2 model support
* Add test for gpt2 model
* Fix weight sanitizing for quantization
* use approx gelu
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-06-02 16:33:20 -07:00
Chen Xin
aac98ca6f4
support internlm2 ( #797 )
* support internlm2
* only attention projections
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-05-27 06:22:21 -07:00
Awni Hannun
69181e0058
Support non incremental kv cache growth ( #766 )
2024-05-15 12:56:24 -07:00
Awni Hannun
fad9598372
Fix llama cache check ( #763 )
* fix llama cache check
* add test
2024-05-08 08:35:54 -07:00
Awni Hannun
ee60e2a9d5
Kv cache ( #643 )
* in place kv_cache
* fix
* fix kv cache size
* partially fix kv cache dtype
* step kv cache
* multiple of step size
* more tests + kv cache
* more kv cache
* update all models to use kv cache
2024-05-08 08:18:13 -07:00
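The KV-cache commits above ("step kv cache", "multiple of step size") and the later "non incremental kv cache growth" change describe a cache that preallocates in step-sized chunks and writes new tokens in place. A NumPy sketch of that idea (class and attribute names are illustrative, not the exact mlx-lm implementation):

```python
import numpy as np

class KVCache:
    """Key/value cache whose backing arrays grow in multiples of `step`,
    so reallocation is rare and per-token updates are in-place writes."""

    def __init__(self, step=256):
        self.step = step
        self.offset = 0          # number of tokens cached so far
        self.keys = None
        self.values = None

    def update(self, keys, values):
        # keys/values: (batch, n_heads, new_tokens, head_dim)
        B, H, T, D = keys.shape
        if self.keys is None or self.offset + T > self.keys.shape[2]:
            # Non-incremental growth: round capacity up to the next
            # multiple of `step`, possibly jumping several steps at once
            # (e.g. for a long prompt).
            n_steps = (self.offset + T + self.step - 1) // self.step
            new_k = np.zeros((B, H, n_steps * self.step, D), keys.dtype)
            new_v = np.zeros_like(new_k)
            if self.keys is not None:
                new_k[:, :, :self.offset] = self.keys[:, :, :self.offset]
                new_v[:, :, :self.offset] = self.values[:, :, :self.offset]
            self.keys, self.values = new_k, new_v
        # Write the new tokens in place and return the valid prefix.
        self.keys[:, :, self.offset:self.offset + T] = keys
        self.values[:, :, self.offset:self.offset + T] = values
        self.offset += T
        return (self.keys[:, :, :self.offset],
                self.values[:, :, :self.offset])

cache = KVCache(step=8)
k1, v1 = cache.update(np.ones((1, 2, 3, 4)), np.ones((1, 2, 3, 4)))
k2, v2 = cache.update(np.ones((1, 2, 10, 4)), np.ones((1, 2, 10, 4)))
```

The attention layer only ever sees the `[:offset]` slice, so the padded tail of the preallocated arrays never affects results.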
Prince Canuma
abcd891851
Add support for phi-3 ( #712 )
* Add phi-3 modelling
* fix rope scaling warning
* add tests and update tuner utils
* update name and remove sanitize
* fix lora
2024-04-23 09:20:00 -07:00
Awni Hannun
2146bcd7ee
Quantize embedding / Update quantize API ( #680 )
* more async eval
* quantize embedding / update quantize api
* more updates for quantize
* update for quantize embeddings
* update sd quant API
* update sdxl quants
* error for datasets < batch_size
* async
* fix config loading
* fix quant
* fix tests
* fix req
* remove lm head if tie weights is true
* fix test
2024-04-18 18:16:10 -07:00
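Two of the changes above, quantizing the embedding and "remove lm head if tie weights is true", can be illustrated together. This sketch uses per-row affine quantization as a simplified stand-in for mlx's group-wise scheme, and reuses the embedding matrix as the output projection; every name here is illustrative:

```python
import numpy as np

def quantize_rows(W, bits=8):
    # Per-row affine quantization: store uint8 codes plus a scale and
    # offset per row (mlx quantizes in smaller groups, same idea).
    qmax = 2 ** bits - 1
    lo = W.min(axis=1, keepdims=True)
    scale = (W.max(axis=1, keepdims=True) - lo) / qmax
    q = np.round((W - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize_rows(q, scale, lo):
    return q * scale + lo

vocab, dims = 16, 8
W = np.random.default_rng(0).normal(size=(vocab, dims))
q, scale, lo = quantize_rows(W)
W_deq = dequantize_rows(q, scale, lo)

# With tie_word_embeddings, the separate lm_head weight is dropped and
# the (quantized) embedding matrix doubles as the output projection.
h = W_deq[np.array([3])]    # input embedding lookup, shape (1, dims)
logits = h @ W_deq.T        # tied output projection, shape (1, vocab)
```

Tying halves the memory spent on the two largest matrices in small models, which is why the quantization pass removes the redundant head outright.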
Awni Hannun
c68aa3c7c3
Stable lm 2 ( #666 )
* stable lm 2
* test and lora
* version bump
* merge stable models
2024-04-08 14:18:55 -07:00
Prince Canuma
d661440dbb
Add support for qwen2moe ( #640 )
* add sparsemoe block and update decoder logic
* update file name to match HF
* update name
* Code formatting
* update gates calculation
* add support for Qwen2MoE.
* fix pytest
* code formatting and fix missing comma in utils
* Remove decoder sparse step.
Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
* remove gate layer anti-quantisation
* remove unused argument
---------
Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
2024-04-02 11:33:29 -07:00
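The qwen2moe commits above add a sparse-MoE block and rework the gate calculation. A compact NumPy sketch of that pattern, assuming standard top-k gating with renormalised weights (Qwen2MoE also has a shared expert, omitted here; all names are illustrative):

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=2):
    # Score every expert per token, keep the top-k, and mix their
    # outputs with softmax-renormalised gate weights.
    scores = x @ gate_w                                  # (tokens, n_experts)
    top = np.argsort(-scores, axis=-1, kind="stable")[:, :top_k]
    out = np.zeros_like(x)
    for t in range(x.shape[0]):                          # per-token dispatch
        sel = scores[t, top[t]]
        w = np.exp(sel - sel.max())
        w /= w.sum()                                     # softmax over top-k
        for weight, e in zip(w, top[t]):
            out[t] += weight * experts[e](x[t])
    return out

# Toy experts that just scale their input; zero gate weights make the
# gating uniform, so the stable sort picks experts 0 and 1.
experts = [lambda v: v * 1.0, lambda v: v * 3.0, lambda v: v * 5.0]
out = moe_forward(np.ones((2, 4)), np.zeros((4, 3)), experts, top_k=2)
```

Only `top_k` of the experts run per token, which is what keeps inference cost close to a dense model a fraction of the size.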
Awni Hannun
b8a348c1b8
Switch to fast RMS/LN Norm ( #603 )
* use nn.RMSNorm, use sdpa, cleanup
* bump mlx versions
* minor update
* use fast layer norm
* version bump
* update requirement for whisper
* update requirement for gguf
2024-03-23 07:13:51 -07:00
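The commit above replaces hand-composed norms with MLX's fused fast ops. A reference NumPy version of RMS norm shows the arithmetic that the fused kernel computes in one pass (illustrative only; not the mlx implementation):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-5):
    # Divide by the root-mean-square over the feature axis, then apply
    # the learned per-feature scale. Written naively this is several
    # elementwise ops and a reduction; the fast op fuses them.
    return weight * x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)

r = rms_norm(np.array([[3.0, 4.0]]), np.ones(2), eps=0.0)
```

With `eps=0` the normalised features have unit mean square by construction, which is an easy sanity check for any reimplementation.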
Prince Canuma
76c3244cc5
Add support for Cohere's Command-R ( #565 )
* initial commit for command-R
* update mlp, layernorm, lm_head and model args
* add custom layernorm
* add default to tie_word_embeddings
* add layernorm weight type and refactor
* update layernorm (bias conditional) in model/layers
* fix layer norm use traditional rope
* add test
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-13 07:03:36 -07:00
Anchen
3535408c99
chore(mlx-lm): fix tie_word_embeddings for qwen2 ( #566 )
* chore: fix tie_word_embeddings for qwen2
* chore: default tie_word_embeddings to True
2024-03-12 21:34:32 -07:00
Awni Hannun
7cdd1b69ac
Enable unit testing in Circle and start some MLX LM tests ( #545 )
* add a few tests for mlx lm
* add a few tests for mlx lm
* add a few tests for mlx lm
* more tests / cleanup
2024-03-07 09:31:57 -08:00