Awni Hannun
69181e0058
Support non-incremental kv cache growth ( #766 )
2024-05-15 12:56:24 -07:00
Awni Hannun
fad9598372
Fix llama cache check ( #763 )
...
* fix llama cache check
* add test
2024-05-08 08:35:54 -07:00
Awni Hannun
ee60e2a9d5
KV cache ( #643 )
...
* in place kv_cache
* fix
* fix kv cache size
* partially fix kv cache dtype
* step kv cache
* multiple of step size
* more tests + kv cache
* more kv cache
* update all models to use kv cache
2024-05-08 08:18:13 -07:00
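The two cache commits above (#643, #766) describe a KV cache that is updated in place and grows its buffers in multiples of a step size instead of reallocating on every token. A minimal sketch of that idea, assuming a (batch, n_kv_heads, seq, head_dim) layout; the class and method names are illustrative, not the exact mlx-lm API:

```python
import mlx.core as mx

class StepKVCache:
    """Preallocate in multiples of `step`; write new keys/values in place."""

    def __init__(self, n_kv_heads, head_dim, step=256):
        self.step = step
        self.offset = 0  # number of valid cached positions
        self.keys = None
        self.values = None
        self.n_kv_heads = n_kv_heads
        self.head_dim = head_dim

    def update_and_fetch(self, keys, values):
        # keys/values: (batch, n_kv_heads, new_len, head_dim)
        prev, new_len = self.offset, keys.shape[2]
        if self.keys is None or prev + new_len > self.keys.shape[2]:
            # Grow capacity to the next multiple of `step`; this also covers
            # non-incremental growth, e.g. a long prompt arriving in one shot.
            n_steps = (prev + new_len + self.step - 1) // self.step
            shape = (keys.shape[0], self.n_kv_heads,
                     n_steps * self.step, self.head_dim)
            new_k = mx.zeros(shape, keys.dtype)
            new_v = mx.zeros(shape, values.dtype)
            if self.keys is not None:
                new_k[..., :prev, :] = self.keys[..., :prev, :]
                new_v[..., :prev, :] = self.values[..., :prev, :]
            self.keys, self.values = new_k, new_v
        self.offset = prev + new_len
        self.keys[..., prev:self.offset, :] = keys
        self.values[..., prev:self.offset, :] = values
        return (self.keys[..., :self.offset, :],
                self.values[..., :self.offset, :])
```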
Anchen
f30413b63c
chore(mlx-lm): fix the number of validation batches configuration. ( #752 )
...
* chore: fix number of validation batches
* clean up
* address comment
2024-05-04 06:52:42 -07:00
Prince Canuma
abcd891851
Add support for phi-3 ( #712 )
...
* Add phi-3 modelling
* fix rope scaling warning
* add tests and update tuner utils
* update name and remove sanitize
* fix lora
2024-04-23 09:20:00 -07:00
Awni Hannun
2146bcd7ee
Quantize embedding / Update quantize API ( #680 )
...
* more async eval
* quantize embedding / update quantize api
* more updates for quantize
* update for quantize embeddings
* update sd quant API
* update sdxl quants
* error for datasets < batch_size
* async
* fix config loading
* fix quant
* fix tests
* fix req
* remove lm head if tie weights is true
* fix test
2024-04-18 18:16:10 -07:00
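For context on what "quantize embedding" enables: with the updated API both linear and embedding layers can be converted, and tied weights make a separate lm_head unnecessary (hence the "remove lm head if tie weights is true" commit). A rough sketch using the mlx.nn quantization helpers; the exact calls in the PR may differ:

```python
import mlx.nn as nn

class TinyLM(nn.Module):
    def __init__(self, vocab=32000, dims=512):
        super().__init__()
        self.embed = nn.Embedding(vocab, dims)
        self.proj = nn.Linear(dims, dims)

model = TinyLM()
# Replace Linear/Embedding layers with quantized equivalents in place.
nn.quantize(model, group_size=64, bits=4)
print(type(model.embed), type(model.proj))  # QuantizedEmbedding, QuantizedLinear
```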
Anchen
f5f189e48a
fix(mlx-lm): broken server.py ( #690 )
...
* fix server.py
* fix var referenced before assignment
* add test
* clean up
2024-04-18 14:26:18 -07:00
dmdaksh
7d7e236061
- Removed unused Python imports ( #683 )
...
- bert/model.py:10: tree_unflatten
- bert/model.py:2: dataclass
- bert/model.py:8: numpy
- cifar/resnet.py:6: Any
- clip/model.py:15: tree_flatten
- clip/model.py:9: Union
- gcn/main.py:8: download_cora
- gcn/main.py:9: cross_entropy
- llms/gguf_llm/models.py:12: tree_flatten, tree_unflatten
- llms/gguf_llm/models.py:9: numpy
- llms/mixtral/mixtral.py:12: tree_map
- llms/mlx_lm/models/dbrx.py:2: Dict, Union
- llms/mlx_lm/tuner/trainer.py:5: partial
- llms/speculative_decoding/decoder.py:1: dataclass, field
- llms/speculative_decoding/decoder.py:2: Optional
- llms/speculative_decoding/decoder.py:5: mlx.nn
- llms/speculative_decoding/decoder.py:6: numpy
- llms/speculative_decoding/main.py:2: glob
- llms/speculative_decoding/main.py:3: json
- llms/speculative_decoding/main.py:5: Path
- llms/speculative_decoding/main.py:8: mlx.nn
- llms/speculative_decoding/model.py:6: tree_unflatten
- llms/speculative_decoding/model.py:7: AutoTokenizer
- llms/tests/test_lora.py:13: yaml_loader
- lora/lora.py:14: tree_unflatten
- lora/models.py:11: numpy
- lora/models.py:3: glob
- speechcommands/kwt.py:1: Any
- speechcommands/main.py:7: mlx.data
- stable_diffusion/stable_diffusion/model_io.py:4: partial
- whisper/benchmark.py:5: sys
- whisper/test.py:5: subprocess
- whisper/whisper/audio.py:6: Optional
- whisper/whisper/decoding.py:8: mlx.nn
2024-04-16 07:50:32 -07:00
Awni Hannun
c68aa3c7c3
Stable lm 2 ( #666 )
...
* stable lm 2
* test and lora
* version bump
* merge stable models
2024-04-08 14:18:55 -07:00
Prince Canuma
d661440dbb
Add support for qwen2moe ( #640 )
...
* add sparsemoe block and update decoder logic
* update file name to match HF
* update name
* Code formatting
* update gates calculation
* add support for Qwen2MoE.
* fix pytest
* code formatting and fix missing comma in utils
* Remove decoder sparse step.
Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
* remove gate layer anti-quantisation
* remove unused argument
---------
Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
2024-04-02 11:33:29 -07:00
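The "sparse MoE block" in this PR follows the usual pattern: a gate scores the experts per token and only the top-k are mixed into the output. A schematic version (deliberately naive: it runs every expert densely for clarity); names and shapes are illustrative, not the qwen2moe code:

```python
import mlx.core as mx
import mlx.nn as nn

class SparseMoEBlock(nn.Module):
    def __init__(self, dims, n_experts=8, top_k=2):
        super().__init__()
        self.gate = nn.Linear(dims, n_experts, bias=False)
        self.experts = [
            nn.Sequential(nn.Linear(dims, 4 * dims), nn.SiLU(),
                          nn.Linear(4 * dims, dims))
            for _ in range(n_experts)
        ]
        self.top_k = top_k

    def __call__(self, x):
        scores = mx.softmax(self.gate(x), axis=-1)  # (..., n_experts)
        # Indices of the top-k experts for each token.
        idx = mx.argpartition(-scores, self.top_k - 1, axis=-1)[..., :self.top_k]
        w = mx.take_along_axis(scores, idx, axis=-1)
        w = w / mx.sum(w, axis=-1, keepdims=True)   # renormalize gate weights
        out = mx.zeros(x.shape, x.dtype)
        for e, expert in enumerate(self.experts):
            # Per-token weight for expert e (zero where it was not selected).
            sel = mx.sum(mx.where(idx == e, w, mx.zeros_like(w)),
                         axis=-1, keepdims=True)
            out = out + sel * expert(x)
        return out
```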
Chime Ogbuji
f6283ef7ce
Configurable LR schedulers ( #604 )
...
* Initial config handler and test
* Added means to run from CLI
* Update lora config loading and tests
* Constrain scheduler config (warmup and minimum LR) for each kind
* Update reference to moved schedule_config module
* Minor fix
* Fix typos
* Moved build_schedule and tests
* nits in schedule config
* flake
* fix path
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-29 13:41:10 -07:00
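The scheduler config wires mlx.optimizers schedules (with the warmup and minimum-LR constraints mentioned above) into the trainer. A hedged sketch of a build_schedule of this shape; the config keys here are illustrative and not necessarily the exact mlx-lm schema:

```python
import mlx.optimizers as optim

def build_schedule(cfg):
    """Build an MLX learning-rate schedule from a small config dict."""
    schedule = getattr(optim, cfg["name"])(*cfg.get("arguments", []))
    warmup = cfg.get("warmup", 0)
    if warmup > 0:
        # Linear warmup from a small initial LR up to the schedule's start.
        init = cfg.get("warmup_init", 0.0)
        warmup_sched = optim.linear_schedule(init, cfg["arguments"][0], warmup)
        schedule = optim.join_schedules([warmup_sched, schedule], [warmup])
    return schedule

lr = build_schedule({"name": "cosine_decay", "arguments": [1e-5, 1000], "warmup": 100})
opt = optim.Adam(learning_rate=lr)
```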
Anchen
297a908e3d
fix(mlx-lm): type hints in gguf.py ( #621 )
2024-03-26 07:56:01 -07:00
Anchen
0ab01b4626
fix(mlx-lm): sorted probs in top_p implementation. ( #610 )
...
* fix(mlx-lm): the top p implementation
* chore: address comment
2024-03-25 15:07:55 -07:00
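The bug being fixed here concerns sampling from probabilities that were sorted in the wrong order. For reference, a minimal nucleus (top-p) sampler over descending sorted probabilities; a sketch of the technique, not the exact mlx-lm code:

```python
import mlx.core as mx

def top_p_sample(logits, top_p=0.9, temperature=1.0):
    probs = mx.softmax(logits / temperature, axis=-1)
    # Sort in *descending* probability order (the crux of the fix).
    order = mx.argsort(-probs, axis=-1)
    sorted_probs = mx.take_along_axis(probs, order, axis=-1)
    cum = mx.cumsum(sorted_probs, axis=-1)
    # Keep the smallest prefix whose mass reaches top_p (top token always kept).
    keep = (cum - sorted_probs) < top_p
    masked = mx.where(keep, sorted_probs, mx.zeros_like(sorted_probs))
    choice = mx.random.categorical(mx.log(masked), axis=-1)
    return mx.squeeze(mx.take_along_axis(order, choice[..., None], axis=-1), axis=-1)

token = top_p_sample(mx.random.normal((32000,)), top_p=0.9)
```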
Awni Hannun
b8a348c1b8
Switch to fast RMS/LN Norm ( #603 )
...
* use nn.RMSNorm, use sdpa, cleanup
* bump mlx versions
* minor update
* use fast layer norm
* version bump
* update requirement for whisper
* update requirement for gguf
2024-03-23 07:13:51 -07:00
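The "fast" variants are fused kernels in mx.fast that nn.RMSNorm and nn.LayerNorm dispatch to. A quick sanity-check sketch comparing an unfused reference against the fused op:

```python
import mlx.core as mx

def rms_norm_reference(x, weight, eps=1e-5):
    # Unfused reference: several elementwise kernels.
    return weight * x * mx.rsqrt(mx.mean(x * x, axis=-1, keepdims=True) + eps)

x = mx.random.normal((2, 8, 512))
w = mx.ones((512,))
fused = mx.fast.rms_norm(x, w, 1e-5)  # single fused kernel
assert mx.allclose(fused, rms_norm_reference(x, w), atol=1e-4)
```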
Anchen
fbed720d6f
chore(mlx-lm): fix the top_p implementation. ( #602 )
...
* chore(mlx-lm): clean up the top p implementation
* chore: clean up
* chore: add test
* chore: address comments
* chore: clean up docs string
* chore: clean up test
2024-03-21 12:18:23 -07:00
Anchen
949f63f309
chore(mlx-lm): fix print_trainable_parameters for quant models ( #581 )
...
* chore(mlx-lm): fix print_trainable_parameters for quant models
* chore: clean up
* refactor: use layer type to check quant bits
* chore: address comment
2024-03-20 08:41:03 -07:00
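The fix checks layer types because quantized layers pack several low-bit weights into each stored element, so a naive size sum undercounts them. A sketch of that correction, assuming 32-bit packing as in mlx's QuantizedLinear:

```python
import mlx.nn as nn
from mlx.utils import tree_flatten

def total_params(model):
    total = sum(p.size for _, p in tree_flatten(model.parameters()))
    for _, m in model.named_modules():
        if isinstance(m, nn.QuantizedLinear):
            # weight.size counts packed elements; each holds 32 // bits weights.
            total += m.weight.size * (32 // m.bits) - m.weight.size
    return total
```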
madroid
b0bcd86a40
Support for OpenAI’s fine-tuning dataset format ( #548 )
...
* LoRA: move load_dataset to tuner/datasets.py file
* LoRA: support OpenAI chat format datasets
see https://platform.openai.com/docs/guides/fine-tuning/example-format
* LoRA: support OpenAI completion format datasets
* LoRA: adjust dataset formatting timing to reduce memory footprint
* Refactor dataset item access in PromptCompletionDataset
* Update mlx_lm/LORA.md
* check for unsupported data formats
* add tests, fine-tune doc
* add jinja2 for chat template
* nits in readme
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-19 16:45:46 -07:00
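For reference, one record each of the two OpenAI-style formats this PR accepts (field names per the OpenAI fine-tuning guide linked above); each line of the JSONL file is one such object:

```python
import json

chat_record = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "Paris."},
    ]
}
completion_record = {
    "prompt": "What is the capital of France?",
    "completion": "Paris.",
}

# One JSON object per line (in practice a dataset uses one format throughout).
with open("train.jsonl", "w") as f:
    for rec in (chat_record, completion_record):
        f.write(json.dumps(rec) + "\n")
```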
Prince Canuma
76c3244cc5
Add support for Cohere's Command-R ( #565 )
...
* initial commit for command-R
* update mlp, layernorm, lm_head and model args
* add custom layernorm
* add default to tie_word_embeddings
* add layernorm weight type and refactor
* update layernorm (bias conditional) in model/layers
* fix layer norm use traditional rope
* add test
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-13 07:03:36 -07:00
Anchen
3535408c99
chore(mlx-lm): fix tie_word_embeddings for qwen2 ( #566 )
...
* chore: fix tie_word_embeddings for qwen2
* chore: default tie_word_embeddings to True
2024-03-12 21:34:32 -07:00
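Tying word embeddings means reusing the input embedding matrix as the output projection instead of a separate lm_head; mlx's nn.Embedding.as_linear makes that direct. A sketch of the pattern, not the qwen2 model code:

```python
import mlx.nn as nn

class Head(nn.Module):
    def __init__(self, vocab=32000, dims=512, tie_word_embeddings=True):
        super().__init__()
        self.embed = nn.Embedding(vocab, dims)
        self.tie = tie_word_embeddings
        if not self.tie:
            self.lm_head = nn.Linear(dims, vocab, bias=False)

    def __call__(self, hidden):
        # Tied: project with the (transposed) embedding matrix.
        return self.embed.as_linear(hidden) if self.tie else self.lm_head(hidden)
```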
Chime Ogbuji
e56d9015ef
LoRA on all linear transformer block layers ( #546 )
...
* Add --lora-all-linear option to apply LoRA to all linear transformer block layers
* Moved to YAML config and added specification of rank & alpha
* nits in config, more tests
* nit
* run tests for prs
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-12 07:37:40 -07:00
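Applying LoRA to a linear layer wraps the frozen base weight with a low-rank update scaled by alpha / rank, which is what the YAML-configurable rank and alpha control. A schematic adapter (illustrative, not the exact mlx-lm class):

```python
import math
import mlx.core as mx
import mlx.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, linear, rank=8, alpha=16.0):
        super().__init__()
        self.linear = linear                      # frozen base layer
        out_dims, in_dims = linear.weight.shape
        scale = 1.0 / math.sqrt(in_dims)
        self.lora_a = mx.random.uniform(-scale, scale, (in_dims, rank))
        self.lora_b = mx.zeros((rank, out_dims))  # zero-init: adapter starts as a no-op
        self.scale = alpha / rank

    def __call__(self, x):
        return self.linear(x) + self.scale * ((x @ self.lora_a) @ self.lora_b)
```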
Awni Hannun
7cdd1b69ac
Enable unit testing in Circle and start some MLX LM tests ( #545 )
...
* add a few tests for mlx lm
* more tests / cleanup
2024-03-07 09:31:57 -08:00