Commit Graph

707 Commits

Author SHA1 Message Date
Zai Thottakath
418d9a5511
Feature: QDoRA (#891)
* feat: QDoRA with tests and a small bug fix for recalculation of self.m

* some simplifications and fixes

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-30 08:01:11 -07:00
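
DoRA reparameterizes an adapted linear layer as a magnitude vector applied to a normalized direction, where the direction is the base weight plus a LoRA-style low-rank update. Below is a minimal sketch of that idea (illustrative only, not the repository's DoRALinear); it also shows why `self.m` has to be recomputed from the current base weight, the recalculation bug fix this commit mentions.

```python
# Illustrative DoRA sketch (not the repository's DoRALinear). Bias is omitted.
import mlx.core as mx
import mlx.nn as nn


class DoRALinearSketch(nn.Module):
    """Weight-decomposed low-rank adaptation of a frozen linear layer."""

    def __init__(self, linear: nn.Linear, r: int = 8, scale: float = 1.0):
        super().__init__()
        self.linear = linear
        out_dims, in_dims = linear.weight.shape
        self.scale = scale
        # Low-rank update, as in LoRA.
        self.lora_a = 0.01 * mx.random.normal((in_dims, r))
        self.lora_b = mx.zeros((r, out_dims))
        # Magnitude vector: must be (re)computed from the *current* base weight,
        # which is the "recalculation of self.m" fix referenced above.
        self.m = mx.linalg.norm(linear.weight, axis=1)

    def __call__(self, x):
        # Adapted direction: W + scale * (BA)^T, renormalized per output row.
        delta = (self.lora_a @ self.lora_b).T            # (out_dims, in_dims)
        adapted = self.linear.weight + self.scale * delta
        norm = mx.linalg.norm(adapted, axis=1, keepdims=True)
        w = adapted * (self.m[:, None] / norm)
        return x @ w.T
```
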
madroid
aa1c8abdc6
LoRA: Support HuggingFace dataset via data parameter (#996)
* LoRA: support huggingface dataset via `data` argument

* LoRA: Extract the load_custom_hf_dataset function

* LoRA: split small functions

* fix spelling errors

* handle load hf dataset error

* fix pre-commit lint

* update data argument help

* nits and doc

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-30 07:36:21 -07:00
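
A hedged sketch of the kind of Hugging Face dataset loading this feature adds behind the `data` argument; the dataset id and column names below are placeholders, and the exact configuration keys are those described in the updated LoRA docs, not hard-coded here.

```python
# Illustrative only: pull a Hugging Face dataset and map it to {"text": ...}
# records of the sort the LoRA trainer consumes. Dataset name and columns
# are placeholders, not what the PR hard-codes.
from datasets import load_dataset

ds = load_dataset("yelp_review_full", split="train[:1000]")  # placeholder dataset

def to_text(example):
    # Collapse whatever columns the dataset has into a single training string.
    return {"text": example["text"]}

train = ds.map(to_text, remove_columns=ds.column_names)
print(train[0]["text"][:80])
```
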
Gökdeniz Gülmez
50e5ca81a8
Adding full finetuning (#903)
* Adding full model weights finetuning

* Updating the LORA.md and ACKNOWLEDGMENTS.md files.

* removing --use-dora and --full-training and adding --fine-tune-type

* some clean up

* reformatting and fixing dora training

* updated CONFIG_DEFAULTS

* update config example

* update in the config example file

* Update LORA.md

* merge and commit

* adding argument for dora linear layer

* clean up

* clean up in the example yaml file

* fix

* final fix before sending

* small addition to the md file

* fix for loading the fully trained model by saving all the files and configs correctly

* clean up

* removing the unnecessary files

* changing lora layers back to 16

* removed max file size

* nits

* resolve merge

* some consistency changes

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-29 17:12:47 -07:00
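
At the parameter level, the new `--fine-tune-type` argument boils down to which weights receive gradients: `full` trains everything, while `lora`/`dora` freeze the base model and train only adapter parameters. A rough sketch under those assumptions (not the PR's actual code):

```python
# Sketch of the difference between LoRA/DoRA and full fine-tuning in terms of
# trainable parameters. `model` is any mlx.nn.Module; the flag name mirrors
# the --fine-tune-type argument added in this PR.
import mlx.nn as nn

def prepare_for_training(model: nn.Module, fine_tune_type: str = "lora"):
    if fine_tune_type == "full":
        # Every weight in the model receives gradients.
        model.unfreeze()
    else:
        # LoRA/DoRA: freeze base weights; adapter layers added elsewhere are
        # the only trainable parameters.
        model.freeze()
    return model
```
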
madroid
7ec2021bb9
LoRA: support tools (function calling) format datasets (#995)
* LoRA: support fine-tuning tools datasets

* LoRA: Split small function

* LoRA: add tools format to lora docs

* LoRA: pre-commit fix

* Revert "LoRA: pre-commit fix"

This reverts commit b94b7e0fe7.

* Revert "LoRA: Split small function"

This reverts commit 3f6a5f19fd.

* LoRA: remove ToolsDataset

In a JSONL file, not every record is required to include the tools value.

* nit in readme

* nit in readme

* nit in readme

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-28 10:41:36 -07:00
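
For reference, a function-calling record in a chat-style JSONL dataset typically looks like the dict below. The schema follows the common OpenAI-style convention; the exact fields expected are the ones documented in the LoRA docs this PR updates.

```python
# A hedged example of a tools-style training record: one such JSON object per
# line of the JSONL file. Field names follow the common OpenAI-style schema.
import json

record = {
    "messages": [
        {"role": "user", "content": "What is the weather in Paris?"},
        {
            "role": "assistant",
            "tool_calls": [
                {
                    "type": "function",
                    "function": {"name": "get_weather", "arguments": {"city": "Paris"}},
                }
            ],
        },
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}
print(json.dumps(record))
```
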
nathan
ace2bb5890
Add logits_processor option to generate_step function (#983)
* Add logits_processor option for generation, as in the Hugging Face transformers library

* concatenation correction

* Rename the tokens variable for clarity

* remove the logit_bias argument from generate_step method

* fix the variable name

* nits + test

* test

* add back logit bias + test

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-28 10:08:49 -07:00
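
A sketch of the kind of callable such an option accepts: a function of the previously generated tokens and the current logits that returns adjusted logits (here a simple logit-bias processor). The exact signature expected by `generate_step` may differ slightly; this is illustrative.

```python
# Illustrative logits processor in the spirit of this PR: a callable taking
# the tokens generated so far and the raw logits, returning adjusted logits.
import mlx.core as mx

def make_logit_bias_processor(bias):
    token_ids = mx.array(list(bias.keys()))
    values = mx.array(list(bias.values()))

    def processor(tokens, logits):
        # logits has shape (1, vocab_size); add a fixed bias to chosen tokens.
        logits[:, token_ids] = logits[:, token_ids] + values
        return logits

    return processor

# e.g. pass [make_logit_bias_processor({0: -10.0})] via the new option.
```
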
jamesm131
d812516d3d
Add /v1/models endpoint to mlx_lm.server (#984)
* Add 'models' endpoint to server

* Add test for new 'models' server endpoint

* Check hf_cache for mlx models

* update tests to check hf_cache for models

* simplify test

* doc

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-28 07:21:11 -07:00
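
A quick way to exercise the new endpoint once `mlx_lm.server` is running; host and port below are assumed defaults and may need adjusting to match your server flags.

```python
# Query the /v1/models endpoint of a locally running mlx_lm.server instance.
import json
from urllib.request import urlopen

with urlopen("http://127.0.0.1:8080/v1/models") as resp:
    payload = json.load(resp)

# An OpenAI-style listing is assumed: {"object": "list", "data": [{"id": ...}, ...]}
for model in payload.get("data", []):
    print(model.get("id"))
```
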
Gökdeniz Gülmez
76710f61af
Adding support for mamba (#940)
* initial commit

* initial commit

* Adding first lines

* adding x, and dt projection layers

* adding the clamping mechanism

* First successful inference

* last commit for today - added custom generate function and it works as expected, will try training and then loading a model from the hub

* clean up

* save up

* almost

* update

* update

* fixed cache handling

* fixed loading

* added separate generate_step method in the model and also in the utils to automatically use the generate step method in the model class

* quick update

* still not working

* save

* still not working

* initial commit

* utils.py logits = logits[:, -1, :] TypeError: tuple indices must be integers or slices, not tuple

* update

* update

* Fixing the batching depthwise convolution and multi-token input

* fixing generate and logits outputs

* Done!

* Fixing the cache handling, generating works now trying training

* update ACKNOWLEDGEMENTS

* removing the model_type conditional in the _step loop in generate_step, adding MambaCache in base.py for easier training and generation, and removing mamba in tuner/utils.

* quick clean up

* update trainer/utils for correct initialization of the layers for LoRA, but not working yet.

* clean up

* Further update to trainer/utils for correct layer selection. Successful training

* removing extra mamba-infer.py file

* clean up, reformatting will come later

* reformat and big clean up, final commit

* some speedups and cleanups

* fix test

* nits

* nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-28 07:02:53 -07:00
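
For orientation, the core of a Mamba block is a selective state-space recurrence. A toy, heavily simplified version of that scan (independent of this PR's actual implementation) looks roughly like:

```python
# Toy selective state-space scan illustrating the recurrence at the heart of
# Mamba. Shapes and parameterization are simplified; not the model code above.
import mlx.core as mx

def selective_scan(x, dt, A, B, C):
    """x, dt: (seq_len, d_inner); A: (d_inner, d_state); B, C: (seq_len, d_state)."""
    seq_len, d_inner = x.shape
    d_state = A.shape[1]
    h = mx.zeros((d_inner, d_state))
    ys = []
    for t in range(seq_len):
        dA = mx.exp(dt[t][:, None] * A)                 # discretized state matrix
        dBx = dt[t][:, None] * B[t][None, :] * x[t][:, None]
        h = dA * h + dBx                                # update hidden state
        ys.append(h @ C[t])                             # project state to output
    return mx.stack(ys)                                 # (seq_len, d_inner)
```
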
Cheng
e776c970f7
Fix llava model when using text-only prompt (#998) 2024-09-25 07:19:41 -07:00
Awni Hannun
9bb2dd62f3
Encodec (#991)
* initial encodec

* works

* nits

* use fast group norm

* fix for rnn layer

* fix mlx version

* use custom LSTM kernel

* audio encodec

* fix example, support batched inference

* nits
2024-09-23 11:39:25 -07:00
Angelos Katharopoulos
796d5e40e4
Fix export to gguf (#993) 2024-09-20 13:33:45 -07:00
Awni Hannun
f530f56df2
don't use internal exception (#990) 2024-09-17 16:22:48 -07:00
Awni Hannun
6c2369e4b9
Fix bug in upload + docs nit (#981)
* fix bug in upload + docs nit

* nit
2024-09-07 14:46:57 -07:00
Awni Hannun
c3e3411756
Update LLM generation docs to use chat template (#973)
* fix docs

* add template to model cards as well

* revert

* version
2024-09-07 06:06:15 -07:00
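
The updated docs route generation through the tokenizer's chat template; a minimal usage sketch (the model repo below is a placeholder, any chat-tuned MLX model works):

```python
# Minimal chat-template generation with mlx_lm, in the spirit of the updated docs.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")  # placeholder repo

messages = [{"role": "user", "content": "Write a haiku about the ocean."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(generate(model, tokenizer, prompt=prompt, max_tokens=100))
```
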
Angelos Katharopoulos
324184d670
Fix the cache_prompt (#979) 2024-09-06 20:19:27 -07:00
madroid
bd29aec299
Support HuggingFace model tree (#957)
* Hub: Update quantization configuration fields

* Hub: add base_model metadata

* Hub: add quantization_config for model tree Quantized type

* Hub: update quantization_config value

* Hub: remove config print
2024-09-04 06:19:32 -07:00
Chime Ogbuji
83a209e200
Add prompt piping (#962)
* Initial commit of --prompt-only and prompt from STDIN feature

* Switch to using --verbose instead of --prompt-only

* Fix capitalization typo

* Fix reference to changed option name

* Update exception text
2024-09-03 13:29:10 -07:00
James Zhao
bf921afcbe
Make sure to import the correct "version" module when installing mlx_whisper and mlx_lm from local source code. (#969)
* Make sure to import the correct "version" module when installing the
mlx_whisper package from local source code.

* Make sure to import the correct "version" module when installing the mlx_lm package from local source code

* fix

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-03 13:16:21 -07:00
Awni Hannun
3c6e8b11af
fix (#965) 2024-08-30 05:56:27 -07:00
L
fc93c55723
feat(mlx_lm): Nemotron (#949)
* feat: Nemotron

https://huggingface.co/nvidia/Minitron-4B-Base

This is basically Llama with partial RoPE and LayerNorm instead of
RMSNorm. Also they add 1 to the LayerNorm weight for some reason.

* fixup! feat: Nemotron

* nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-08-29 21:08:57 -07:00
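
The "add 1 to the LayerNorm weight" detail translates to a norm whose learned scale is applied as (1 + weight), so a zero-initialized weight starts as an identity scale. A small sketch of that variant (not the repository's module):

```python
# Sketch of a LayerNorm whose learned scale acts as (1 + weight), matching the
# "add 1 to the LayerNorm weight" note above. Not the repository's module.
import mlx.core as mx
import mlx.nn as nn

class LayerNorm1P(nn.Module):
    def __init__(self, dims: int, eps: float = 1e-5):
        super().__init__()
        self.weight = mx.zeros((dims,))   # stored around zero ...
        self.bias = mx.zeros((dims,))
        self.eps = eps

    def __call__(self, x):
        mu = mx.mean(x, axis=-1, keepdims=True)
        var = mx.var(x, axis=-1, keepdims=True)
        x_hat = (x - mu) * mx.rsqrt(var + self.eps)
        return (1.0 + self.weight) * x_hat + self.bias  # ... applied as 1 + w
```
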
Awni Hannun
b1186e2a81
Docs on prompt scaling (#963)
* docs on prompt scaling

* remove unused var

* nits
2024-08-29 15:05:17 -07:00
Angelos Katharopoulos
1003a8b2dd
Add the ability to load the KV cache from a file (#956) 2024-08-28 22:11:45 -07:00
Angelos Katharopoulos
7f8c961287
Fix setattr for the TokenizerWrapper (#961) 2024-08-28 14:47:33 -07:00
Nripesh Niketan
bf21789b17
chore: update black pre-commit hooks to latest versions (#955) 2024-08-26 07:54:23 -07:00
Prince Canuma
b5e18ef1e3
Add Phi-3.5-MoE (#946)
* add phimoe

* add phimoe to tuner

* add switch_mlp

* fix SuScaled args

* nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-08-24 06:52:33 -07:00
Awni Hannun
6731254e76
Use fast rope (#945)
* use fast rope

* fix llama

* use fast rope for llama3.1

* requires unreleased mlx

* fix su

* fix deepseek v2

* only one of base or freqs

* nit

* fix

* hard code freqs
2024-08-23 13:18:51 -07:00
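
This change swaps hand-rolled rotary embeddings for the fused `mx.fast.rope` kernel. A hedged usage sketch; argument names follow current MLX and may differ from the unreleased version the commit references:

```python
# Hedged example of the fused RoPE kernel this change switches to.
import mlx.core as mx

x = mx.random.normal((1, 8, 16, 64))   # (batch, heads, seq_len, head_dim)
y = mx.fast.rope(
    x,
    dims=64,            # rotate the full head dimension
    traditional=False,
    base=10000.0,       # use either base or explicit freqs, not both
    scale=1.0,
    offset=0,           # position offset, e.g. the current KV cache length
)
print(y.shape)
```
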
Awni Hannun
58591a1b41
fine tune deepseek (#932) 2024-08-22 10:41:21 -07:00
L
0164d2058b
feat: DeepSeek MoE v1 (#942)
* feat: deepseek v1

DeepSeek is still releasing models on the DeepSeek V1 architecture.

```sh
mlx_lm.convert --hf-path deepseek-ai/DeepSeek-Prover-V1.5-RL --mlx-path DeepSeek-Prover-V1.5-RL-8bit --q-bits 8 -q
mlx_lm.generate --model DeepSeek-Prover-V1.5-RL-8bit --ignore-chat-template --max-tokens 512 --prompt 'import Mathlib
import Aesop

set_option maxHeartbeats 0

open BigOperators Real Nat Topology Rat

/-- The second and fourth terms of a geometric sequence are $2$ and $6$. Which of the following is a possible first term?
Show that it is $\frac{2\sqrt{3}}{3}$.-/
theorem amc12b_2003_p6 (a r : ℝ) (u : ℕ → ℝ) (h₀ : ∀ k, u k = a * r ^ k) (h₁ : u 1 = 2)
  (h₂ : u 3 = 6) : u 0 = 2 / Real.sqrt 3 ∨ u 0 = -(2 / Real.sqrt 3) := by'
```

* nits

* nits

* nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-08-17 07:18:09 -07:00
Awni Hannun
7be292c0c9
Handle longer prompt/generation (#931)
* rebase

* nits

* nit

* fix rotating cache with step prefill

* update version
2024-08-16 15:28:39 -07:00
madroid
e196fa3208
Whisper: Support command line (#746)
* Whisper: Add CLI command

* Whisper: Prevent precision loss when converting to words dictionary

* Whisper: disable json ensure_ascii

* Whisper: add cli setup config

* Whisper: pre-commit

* Whisper: Adjust the _ in the command line arguments to -

* nits

* version + readme

* nit

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-08-16 10:35:44 -07:00
Zai Thottakath
4e01700816
Allow the entire model to be targeted for LoRA and DoRA fine-tuning: LoRA and DoRA embeddings with small DoRALinear bug fix (#914)
* feature: LoRA adapter for Embeddings

* feature: wire LoRAEmbedding into the tuner. Allow the embedding and non-model.layers Linear layers to be targeted for fine-tuning

* feature: DoRA adapter for Embeddings

* feature: wire in DoRAEmbedding

* bugfix: ensure self.m is recalculated when the linear layer is changed in DoRALinear.from_linear

* refactor: prefer from_base over from_linear or from_embedding. prefer fuse over to_linear or to_embedding

* cleanup: remove unused imports in test_dora.py

* refactor: remove unnecessary non_layer_modules

* cleanup: remove wrong comments for lora embedding dropout. remove unnecessary parens in dora embedding dropout

* nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-08-16 07:38:36 -07:00
Chime Ogbuji
c50971e860
Min P implementation (#926)
* Min P implementation

* Change default to 0 (no min_p)

* nits

* nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-08-15 15:45:02 -07:00
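
Min-p keeps only tokens whose probability is at least `min_p` times the top token's probability, then renormalizes and samples. A compact sketch of that rule (illustrative, not the merged sampler):

```python
# Compact min-p sampling sketch: drop tokens whose probability falls below
# min_p times the most likely token's probability, then sample from the rest.
import mlx.core as mx

def min_p_sample(logits, min_p=0.1):
    probs = mx.softmax(logits, axis=-1)
    threshold = min_p * mx.max(probs, axis=-1, keepdims=True)
    filtered = mx.where(probs < threshold, mx.zeros_like(probs), probs)
    filtered = filtered / mx.sum(filtered, axis=-1, keepdims=True)
    return mx.random.categorical(mx.log(filtered), axis=-1)
```
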
Awni Hannun
9b83004631
Faster sampling with mx.compile (#937)
* faster sampling with compile

* fix test
2024-08-15 11:29:09 -07:00
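
The speedup comes from compiling the sampling step with `mx.compile`, passing the PRNG state in and out so random draws stay correct across calls. A small sketch of that pattern with a plain temperature sampler (not the sampler used inside mlx_lm):

```python
# Sketch of compiling a sampling function with mx.compile, threading the
# random state through the compiled function.
from functools import partial
import mlx.core as mx

def make_sampler(temp: float = 0.8):
    @partial(mx.compile, inputs=mx.random.state, outputs=mx.random.state)
    def sample(logits):
        return mx.random.categorical(logits * (1.0 / temp))
    return sample

sampler = make_sampler(0.8)
logits = mx.random.normal((1, 32000))
print(sampler(logits))
```
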
Awni Hannun
95840f32e2
Fix whisper conversion for safetensors models (#935)
* fix whisper conversion for safetensors only. error in mlx lm for existing paths

* fix tests
2024-08-14 10:22:04 -07:00
Awni Hannun
33905447f9
Whisper updates to allow HF models (#923)
* simplify conversion and update convert for HF models

* use npz for compat

* fixes

* fixes

* fix gguf

* allow user supplied path
2024-08-09 11:11:58 -07:00
tidely
df744c98e6
Predict stop sequence matches during streaming (#541)
* Predict stop sequence matches during streaming

Check for overlap of stop sequences and the tokens array for potential sequence matches after more tokens get generated. Generate tokens until we can confirm that the stop sequence is not met.

* fix typo

* Change sequence_overlap logic

* range isn't inclusive, add 1 to max_overlap

* Add test_server.py

Added a test for the sequence_overlap method

* nits

* eos sequence

* finalize

---------

Co-authored-by: Y4hL <43219534+Y4hL@users.noreply.github.com>
Co-authored-by: Awni Hannun <awni@apple.com>
2024-08-06 15:24:15 -07:00
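
The core of the change is a check for partial overlap between the end of the generated tokens and the start of any stop sequence, so streaming holds tokens back until a potential match is resolved. A sketch of such a helper, in the spirit of the `sequence_overlap` method this PR adds:

```python
# Does any suffix of `tokens` match a prefix of `stop_sequence`? If so,
# streaming should buffer output until the match is confirmed or broken.
def sequence_overlap(tokens, stop_sequence):
    max_overlap = min(len(tokens), len(stop_sequence))
    # range isn't inclusive, so go up to max_overlap + 1
    return any(tokens[-i:] == stop_sequence[:i] for i in range(1, max_overlap + 1))

assert sequence_overlap([5, 1, 2], [1, 2, 3])        # partial match, keep buffering
assert not sequence_overlap([5, 1, 2], [7, 8])       # safe to emit tokens
```
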
Khush Gupta
8fa12b0058
Adapters loading (#902)
* Added functionality to load adapters through POST requests so you do not need to restart the server

* ran pre-commit

* nits

* fix test

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-08-01 16:18:18 -07:00
Shubbair
a5752be9d9 Code Arrangement 2024-08-01 15:41:21 +03:00
Shubbair
f84b231cf2 Code Arrangement 2024-08-01 15:29:43 +03:00
Shubbair
7e0bdacef3 Code Arrangement 2024-08-01 15:22:19 +03:00
Shubbair
37bbf3ec54 Updating GAN Code... 2024-08-01 01:04:14 +03:00
Shubbair
4d17f80efb Updating GAN Code... 2024-07-31 20:23:57 +03:00
Shubbair
1ef3ad2c6c Updating GAN Code... 2024-07-31 19:59:36 +03:00
Shubbair
a8ffa9cb18 Updating GAN Code... 2024-07-31 11:50:32 +03:00
Shubbair
f70cef9567 Updating GAN Code... 2024-07-31 11:25:39 +03:00
Shubbair
6f7a6609b9 Updating MLX Notebook 2024-07-30 20:01:14 +03:00
Shubbair
0644cc101b Updating MLX Notebook 2024-07-30 19:50:02 +03:00
Shubbair
ad2b6643c0 Updating GAN Code... 2024-07-30 16:59:35 +03:00
Shubbair
3bea855bd2 Updating GAN Code... 2024-07-30 13:45:09 +03:00
Shubbair
c2d731d8a3 Updating GAN Code... 2024-07-30 13:24:53 +03:00
Shubbair
ba52447385 Updating GAN Code... 2024-07-30 13:21:38 +03:00