* fix rotating kv cache for chat use case
* reorg + fixes to caching, unify prompt caching across types and use cases for e.g. caching during a chat
* nit in chat
* fix tests
* fix tests
* fix tests
* docs
* chat command
* comments + docs
* Define meta_state on all Cache implementations
* fixes + trim_prompt_cache api
* fix default model
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
* feat: QDoRA with tests and a small bug fix for recalculation of self.m
* some simplifications and fixes
---------
Co-authored-by: Awni Hannun <awni@apple.com>
* Adding full model weights finetuning
* Updating the LORA.md and ACKNOWLEDGMENTS.md files.
* removing --use-dora and --fulll-training and adding --fine-tune-type
* some clean up
* reformating and fixing dora training
* updated CONFIG_DEFAULTS
* update config example
* update in the config example fie
* Update LORA.md
* merge and commit
* adding argument for dora linear layer
* clean up
* clean up in the example yaml file
* fix
* final fix before sending
* small addition to re md file
* fix for loading the fully trained model by saving all the files and configs correctly
* clean up
* removing the unnesesairy files
* changing lora layers back to 16
* removed max file size
* nits
* resolve merge
* some consistency changes
---------
Co-authored-by: Awni Hannun <awni@apple.com>
* LoRA: support fine-tuning tools datasets
* LoRA: Split small function
* LoRA: add tools format to lora docs
* LoRA: pre-commit fix
* Revert "LoRA: pre-commit fix"
This reverts commit b94b7e0fe7.
* Revert "LoRA: Split small function"
This reverts commit 3f6a5f19fd.
* LoRA: remove ToolsDataset
In a JSONL file, not all data is required to include the tools value.
* nit in readme
* nit in readme
* nit in readme
---------
Co-authored-by: Awni Hannun <awni@apple.com>
* Add logits_processor option for the generation as in huggingface transformers library
* concatenation correction
* Rename the tokens variable for clarity
* remove the logit_bias argument from generate_step method
* fix the variable name
* nits + test
* test
* add back logit bias + test
---------
Co-authored-by: Awni Hannun <awni@apple.com>
* Add 'models' endpoint to server
* Add test for new 'models' server endpoint
* Check hf_cache for mlx models
* update tests to check hf_cache for models
* simplify test
* doc
---------
Co-authored-by: Awni Hannun <awni@apple.com>
* initial commit
* initial commit
* Adding first lines
* adding x, and dt projection layers
* adding the clamping mechanism
* First succesful inference
* last commit for today - added custom geenrate function and it works as expected, will try training and then with loading a model from the hub
* clean up
* save up
* almost
* update
* update
* fixed cache handeling
* fixed loading
* added seperate generat_step method in the model and also in the utils to automaticaly use the generate step mthod in the model class
* quick update
* still not working
* save
* still not working
* initial commit
* utils.py logits = logits[:, -1, :] TypeError: tuple indices must be integers or slices, not tuple
* update
* update
* Fixing the Batching Depfwise Comnvolution and multi token input
* fixing generate and logits outputs
* Done!
* Fixing the cache handling, generating works now trying training
* update ACKNOWLEDGEMENTS
* removing the model_type if stuff in the _step loop in generate_step and adding MambaCache in base.py for training easier generations and removing mamba in tuner/utils.
* quick clean up
* update trainer/utils for right initialisation of the layers for LoRA, but not working.
* clean up
* Forther update to trainer/utils for correct layer selection. Successfull training
* removing extra mamba-infer.py file
* clean up, reformating will come later
* reformat and big clean up, final commit
* some speedups and cleanups
* fix test
* nits
* nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
* Initial commit of --prompt-only and prompt from STDIN feature
* Switch to using --verbose instead of --prompt-only
* Fix capitalization typo
* Fix reference to changed option name
* Update exception text
* Make sure to import the correct "version" module when installing the
mlx_whisper package from local source code.
* Make sure to import the correct "version" module when installing the mlx_lm package from local source code
* fix
---------
Co-authored-by: Awni Hannun <awni@apple.com>
* feat: Nemotron
https://huggingface.co/nvidia/Minitron-4B-Base
This is basically Llama with partial RoPE and LayerNorm instead of
BatchNorm. Also they add 1 to the LayerNorm weight for some reason.
* fixup! feat: Nemotron
* nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
* use fast rope
* fix llama
* use fast rope for llama3.1
* requires unreleased mlx
* fix su
* fix deepseek v2
* only one of base or freqs
* nit
* fix
* hard code freqs
* feat: deepseek v1
DeepSeek is still releasing models on the DeepSeek V1 architecture.
```sh
mlx_lm.convert --hf-path deepseek-ai/DeepSeek-Prover-V1.5-RL --mlx-path DeepSeek-Prover-V1.5-RL-8bit --q-bits 8 -q
mlx_lm.generate --model DeepSeek-Prover-V1.5-RL-8bit --ignore-chat-template --max-tokens 512 --prompt 'import Mathlib
import Aesop
set_option maxHeartbeats 0
open BigOperators Real Nat Topology Rat
/-- The second and fourth terms of a geometric sequence are $2$ and $6$. Which of the following is a possible first term?
Show that it is $\frac{2\sqrt{3}}{3}$.-/
theorem amc12b_2003_p6 (a r : ℝ) (u : ℕ → ℝ) (h₀ : ∀ k, u k = a * r ^ k) (h₁ : u 1 = 2)
(h₂ : u 3 = 6) : u 0 = 2 / Real.sqrt 3 ∨ u 0 = -(2 / Real.sqrt 3) := by'
```
* nits
* nits
* nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>