Commit Graph

35 Commits

Author SHA1 Message Date
Awni Hannun
7be292c0c9 Handle longer prompt/generation (#931)
* rebase

* nits

* nit

* fix rotating cache with step prefill

* update version
2024-08-16 15:28:39 -07:00
Zai Thottakath
4e01700816 Allow the entire model to be targed for LoRA and DoRA fine tuning: LoRA and DoRA embeddings with small DoRALinear bug fix (#914)
* feature: LoRA adapter for Embeddings

* feature: wire in LoRAEmbedding into the tuner. Allow the embedding and non model.layers Linear layers to be targeted for fine tuning

* feature: DoRA adapter for Embeddings

* feature: wire in DoRAEmbedding

* bugfix: ensure self.m is recalculated when the linear layer is changed in DoRALinear.from_linear

* refactor: prefer from_base over from_linear or from_embedding. prefer fuse over to_linear or to_embedding

* cleanup: remove unused imports in test_dora.py

* refactor: remove unnecessary non_layer_modules

* cleanup: remove wrong comments for lora embedding dropout. remove uncessary parens in dora embedding dropout

* nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-08-16 07:38:36 -07:00
Awni Hannun
9b83004631 Faster sampling with mx.compile (#937)
* faster sampling with compile

* fix test
2024-08-15 11:29:09 -07:00
Awni Hannun
95840f32e2 Fix whipser conversion for safetensors models (#935)
* fix whipser conversion for safetensor only. error in mlx lm for existing paths

* fix tests
2024-08-14 10:22:04 -07:00
tidely
df744c98e6 Predict stop sequence matches during streaming (#541)
* Predict stop sequence matches during streaming

Check for overlap of stop sequences and the tokens array for potential sequence matches after more tokens get generated. Generate tokens until we can confirm that the stop sequence is not met.

* fix typo

* Change sequence_overlap logic

* range isn't inclusive, add 1 to max_overlap

* Add test_server.py

Added a test for the sequence_overlap method

* nits

* eos sequence

* finalize

---------

Co-authored-by: Y4hL <43219534+Y4hL@users.noreply.github.com>
Co-authored-by: Awni Hannun <awni@apple.com>
2024-08-06 15:24:15 -07:00
Khush Gupta
8fa12b0058 Adapters loading (#902)
* Added functionality to load in adapters through post-requests so you do not need to restart the server

* ran pre-commit

* nits

* fix test

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-08-01 16:18:18 -07:00
Anchen
7a3ab1620a support load model by custom get_model_classes (#899)
* feature(mlx_lm): support load model by custom get classes

* rename the param
2024-07-25 11:01:17 -07:00
Alex Cheema
cd8efc7fbc Add support for Llama-3.1 (#907)
* add dynamicNTK scaling rope

* remove unused var

* fix rope base

* llama3.1 fixes

* TODO for rope eval

* vectorise llama3 base freq calculation

* removed the arbitrary 2.0 rope_scale default case

* fix slow llama3.1 generation by evaluating stateless part of DynamicNTKScalingRoPE in init

* nits + format

* use mx.pi

* fix tests and add test for 3.1

---------

Co-authored-by: Prince Canuma <prince.gdt@gmail.com>
Co-authored-by: Awni Hannun <awni@apple.com>
2024-07-23 13:21:32 -07:00
nicolov
fbe3247772 Add GPT-neox model (#863) 2024-07-11 06:13:17 -07:00
Angelos Katharopoulos
f212b770d8 Server loads the model on demand from the request (#851) 2024-06-27 11:37:57 -07:00
Chime Ogbuji
df6bc09d74 Configuration-based use of HF hub-hosted datasets for training (#701)
* Add hf_dataset configuration for using HF hub-hosted datasets for (Q)LoRA training

* Pre-commit formatting

* Fix YAML config example

* Print DS info

* Include name

* Add hf_dataset parameter default

* Remove TextHFDataset and CompletionsHFDataset and use Dataset and CompletionsDataset instead, adding a text_key constructor argument to the former (and changing it to work with a provided data structure instead of just from a JSON file), and prompt_key and completion_key arguments to the latter with defaults for backwards compatibility.

* nits

* update docs

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-06-26 10:20:50 -07:00
Derek Lewis
89b0b75250 GPT2 Support (#798)
* GPT-2 model support

* Add test for gpt2 model

* Fix weight sanitizing for quantization

* use approx gelu

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-06-02 16:33:20 -07:00
madroid
c457a3f88b LoRA: Extract small function (#614)
* LoRA: Extract pre_processing_model  function

* LoRA: Extract small functions(train_model,evaluate_model)

* move test case to test_tuner_utils.py

* nits

* nits

* remove extra param, validate at it 0

* version

* fix test

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-06-02 06:38:42 -07:00
Chen Xin
aac98ca6f4 support internlm2 (#797)
* support internlm2

* only attention projections

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-05-27 06:22:21 -07:00
Awni Hannun
69181e0058 Support non incremental kv cache growth (#766) 2024-05-15 12:56:24 -07:00
Awni Hannun
fad9598372 Fix llama cache check (#763)
* fix llama cache check

* add test
2024-05-08 08:35:54 -07:00
Awni Hannun
ee60e2a9d5 Kv cache (#643)
* in place kv_cache

* fix

* fix kv cache size

* partially fix kv cache dtype

* step kv cache

* multiple of step size

* more teests + kv cache

* more kv cache

* udpate all models to use kv cache
2024-05-08 08:18:13 -07:00
Anchen
f30413b63c chore(mlx-lm): fix the number of validation batches configuration. (#752)
* chore: fix number of validation batches

* clean up

* address comment
2024-05-04 06:52:42 -07:00
Prince Canuma
abcd891851 Add support for phi-3 (#712)
* Add phi-3 modelling

* fix rope scaling warning

* add tests and update tuner utils

* update name and remove sanitize

* fix lora
2024-04-23 09:20:00 -07:00
Awni Hannun
2146bcd7ee Quantize embedding / Update quantize API (#680)
* more async eval

* quantize embedding / update quantize api

* more updates for quantize

* update for quantize embeddings

* update sd quant API

* update sdxl quants

* error for datasets < batch_size

* async

* fix config loading

* fix quant

* fix tests

* fix req

* remove lm head if tie weights is true

* fix test
2024-04-18 18:16:10 -07:00
Anchen
f5f189e48a fix(mlx-lm): broken server.py (#690)
* fix server.py

* fix var referenced before assignment

* add test

* clean up
2024-04-18 14:26:18 -07:00
dmdaksh
7d7e236061 - Removed unused Python imports (#683)
- bert/model.py:10: tree_unflatten
  - bert/model.py:2: dataclass
  - bert/model.py:8: numpy
  - cifar/resnet.py:6: Any
  - clip/model.py:15: tree_flatten
  - clip/model.py:9: Union
  - gcn/main.py:8: download_cora
  - gcn/main.py:9: cross_entropy
  - llms/gguf_llm/models.py:12: tree_flatten, tree_unflatten
  - llms/gguf_llm/models.py:9: numpy
  - llms/mixtral/mixtral.py:12: tree_map
  - llms/mlx_lm/models/dbrx.py:2: Dict, Union
  - llms/mlx_lm/tuner/trainer.py:5: partial
  - llms/speculative_decoding/decoder.py:1: dataclass, field
  - llms/speculative_decoding/decoder.py:2: Optional
  - llms/speculative_decoding/decoder.py:5: mlx.nn
  - llms/speculative_decoding/decoder.py:6: numpy
  - llms/speculative_decoding/main.py:2: glob
  - llms/speculative_decoding/main.py:3: json
  - llms/speculative_decoding/main.py:5: Path
  - llms/speculative_decoding/main.py:8: mlx.nn
  - llms/speculative_decoding/model.py:6: tree_unflatten
  - llms/speculative_decoding/model.py:7: AutoTokenizer
  - llms/tests/test_lora.py:13: yaml_loader
  - lora/lora.py:14: tree_unflatten
  - lora/models.py:11: numpy
  - lora/models.py:3: glob
  - speechcommands/kwt.py:1: Any
  - speechcommands/main.py:7: mlx.data
  - stable_diffusion/stable_diffusion/model_io.py:4: partial
  - whisper/benchmark.py:5: sys
  - whisper/test.py:5: subprocess
  - whisper/whisper/audio.py:6: Optional
  - whisper/whisper/decoding.py:8: mlx.nn
2024-04-16 07:50:32 -07:00
Awni Hannun
c68aa3c7c3 Stable lm 2 (#666)
* stable lm 2

* test and lora

* version bump

* merge stable models
2024-04-08 14:18:55 -07:00
Prince Canuma
d661440dbb Add support for qwen2moe (#640)
* add sparsemoe block and update decoder logic

* update file name to match HF

* update name

* Code formatting

* update gates calculation

* add support for Qwen2MoE.

* fix pytest

* code formatting and fix missing comma in utils

* Remove decoder sparse step.

Co-authored-by: bozheng-hit <dsoul0621@gmail.com>

* remove gate layer anti-quantisation

* remove unused argument

---------

Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
2024-04-02 11:33:29 -07:00
Chime Ogbuji
f6283ef7ce Configurable LR schedulers (#604)
* Initial config handler and test

* Added means to run from CLI

* Update lora config loading and tests

* Constrain scheduler config (warmup and minimum LR) for each kind

* Update reference to moved schedule_config module

* Minor fix

* Fix typos

* Moved build_schedule and tests

* nits in schedule config

* flake

* fix path

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-29 13:41:10 -07:00
Anchen
297a908e3d fix(mlx-lm): type hints in gguf.py (#621) 2024-03-26 07:56:01 -07:00
Anchen
0ab01b4626 fix(mlx-lm): sorted probs in top_p implementation. (#610)
* fix(mlx-lm): the top p imp

* chore: address comment
2024-03-25 15:07:55 -07:00
Awni Hannun
b8a348c1b8 Switch to fast RMS/LN Norm (#603)
* use nn.RMSNorm, use sdpa, cleanup

* bump mlx versions

* minor update

* use fast layer norm

* version bump

* update requirement for whisper

* update requirement for gguf
2024-03-23 07:13:51 -07:00
Anchen
fbed720d6f chore(mlx-lm): fix the top_p implementation. (#602)
* chore(mlx-lm): clean up the top p imp

* chore: clean up

* chore: add test

* chore: address comments

* chore: clean up docs string

* chore: clean up test
2024-03-21 12:18:23 -07:00
Anchen
949f63f309 chore(mlx-lm): fix print_trainable_parameters for quant models (#581)
* chore(mlx-lm): fix print_trainable_parameters for quant models

* chore: clean up

* refactor: use layer type to check quant bits

* chore: address comment
2024-03-20 08:41:03 -07:00
madroid
b0bcd86a40 Support for OpenAI’s fine-tuning dataset format (#548)
* LoRA: move load_dataset to tuner/datasets.py file

* LoRA: support OpenAI chat format datasets

see https://platform.openai.com/docs/guides/fine-tuning/example-format

* LoRA: support OpenAI completion format datasets

* LoRA: formatting dataset timing to reduce memory footprint

* Refactor dataset item access in PromptCompletionDataset

* Update mlx_lm/LORA.md

* Update mlx_lm/LORA.md

* check Unsupported data format

* add tests, fine-tune doc

* add tests, fine-tune doc

* add jinja2 for chat template

* nits in readme

* nits in readme

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-19 16:45:46 -07:00
Prince Canuma
76c3244cc5 Add support for Cohere's Command-R (#565)
* initial commit for command-R

* update mlp, layernorm, lm_head and model args

* add custom layernorm

* add default to tie_word_embeddings

* add layernorm weight type and refactor

* update layernorm (bias conditional) in model/layers

* fix layer norm use traditional rope

* add test

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-13 07:03:36 -07:00
Anchen
3535408c99 chore(mlx-lm): fix tie_word_embeddings for qwen2 (#566)
* chore: fix tie_word_embeddings for qwen2

* chore: default tie_word_embeddings to True
2024-03-12 21:34:32 -07:00
Chime Ogbuji
e56d9015ef LoRA on all linear transformer block layers (#546)
* Add --lora-all-linear option to apply LoRa to all linear transfer block layers

* Moved to YAML config and added specification of rank & alpha

* nits in conifg, more tests

* nit

* run tests for prs

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-12 07:37:40 -07:00
Awni Hannun
7cdd1b69ac Enable unit testing in Circle and start some MLX LM tests (#545)
* add a few tests for mlx lm

* add a few tests for mlx lm

* add a few tests for mlx lm

* more tests / cleanup
2024-03-07 09:31:57 -08:00