Commit Graph

12 Commits

Author SHA1 Message Date
Awni Hannun
ee60e2a9d5
Kv cache (#643)
* in place kv_cache

* fix

* fix kv cache size

* partially fix kv cache dtype

* step kv cache

* multiple of step size

* more teests + kv cache

* more kv cache

* udpate all models to use kv cache
2024-05-08 08:18:13 -07:00
Awni Hannun
2146bcd7ee
Quantize embedding / Update quantize API (#680)
* more async eval

* quantize embedding / update quantize api

* more updates for quantize

* update for quantize embeddings

* update sd quant API

* update sdxl quants

* error for datasets < batch_size

* async

* fix config loading

* fix quant

* fix tests

* fix req

* remove lm head if tie weights is true

* fix test
2024-04-18 18:16:10 -07:00
Awni Hannun
b8a348c1b8
Switch to fast RMS/LN Norm (#603)
* use nn.RMSNorm, use sdpa, cleanup

* bump mlx versions

* minor update

* use fast layer norm

* version bump

* update requirement for whisper

* update requirement for gguf
2024-03-23 07:13:51 -07:00
Anchen
3535408c99
chore(mlx-lm): fix tie_word_embeddings for qwen2 (#566)
* chore: fix tie_word_embeddings for qwen2

* chore: default tie_word_embeddings to True
2024-03-12 21:34:32 -07:00
Awni Hannun
8b05bb6d18
[mlx-lm] Use sdpa in llama / mistral model (#515)
* use sdpa

* update a few more models

* version

* fix stablelm type
2024-03-07 17:41:23 -08:00
Anchen
8a178f8716
chore: enable tie_word_embeddings config for qwen2 (#544) 2024-03-07 06:11:35 -08:00
Awni Hannun
f24edfa9dc
[mlx-lm] Add precompiled normalizations (#451)
* add precompiled normalizations

* nits
2024-02-22 12:40:55 -08:00
Awni Hannun
8fd953ee2b
Support for slerp merging models (#455)
* support for slerp merging models

* docs

* update docs

* format'
2024-02-19 20:37:15 -08:00
Angelos Katharopoulos
f71e965d57
Change gqa to use repeat instead of concatenate (#443) 2024-02-14 17:40:11 -08:00
Awni Hannun
06ddb8414d
Fix Qwen2 and SD (#441)
* fix qwen2

* version bump

* fix list shape
2024-02-14 13:43:12 -08:00
Awni Hannun
d4666615bb
Lazy import + refactor Lora layer addition (#426)
* lazy model import in mlx_lm

* change lora loading

* fix olmo lora

* remove a bunch of unused stuff from plamo

* move phixtral to mlx-lm and out of llms/
2024-02-12 10:51:02 -08:00
Junyang Lin
9d0dd34403
add qwen2 (#411) 2024-02-04 08:31:38 -08:00