Commit Graph

61 Commits

Author SHA1 Message Date
Gökdeniz Gülmez
50e5ca81a8 Adding full finetuning (#903)
* Adding full model weights finetuning

* Updating the LORA.md and ACKNOWLEDGMENTS.md files.

* removing --use-dora and --fulll-training and adding --fine-tune-type

* some clean up

* reformating and fixing dora training

* updated CONFIG_DEFAULTS

* update config example

* update in the config example fie

* Update LORA.md

* merge and commit

* adding argument for dora linear layer

* clean up

* clean up in the example yaml file

* fix

* final fix before sending

* small addition to re md file

* fix for loading the fully trained model by saving all the files and configs correctly

* clean up

* removing the unnesesairy files

* changing lora layers back to 16

* removed max file size

* nits

* resolve merge

* some consistency changes

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-29 17:12:47 -07:00
madroid
7ec2021bb9 LoRA: support tools(function calling) format datasets (#995)
* LoRA: support fine-tuning tools datasets

* LoRA: Split small function

* LoRA: add tools format to lora docs

* LoRA: pre-commit fix

* Revert "LoRA: pre-commit fix"

This reverts commit b94b7e0fe7.

* Revert "LoRA: Split small function"

This reverts commit 3f6a5f19fd.

* LoRA: remove ToolsDataset

In a JSONL file, not all data is required to include the tools value.

* nit in readme

* nit in readme

* nit in readme

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-28 10:41:36 -07:00
Gökdeniz Gülmez
76710f61af Adding support for mamba (#940)
* initial commit

* initial commit

* Adding first lines

* adding x, and dt projection layers

* adding the clamping mechanism

* First succesful inference

* last commit for today - added custom geenrate function and it works as expected, will try training and then with loading a model from the hub

* clean up

* save up

* almost

* update

* update

* fixed cache handeling

* fixed loading

* added seperate generat_step method in the model and also in the utils to automaticaly use the generate step mthod in the model class

* quick update

* still not working

* save

* still not working

* initial commit

* utils.py logits = logits[:, -1, :] TypeError: tuple indices must be integers or slices, not tuple

* update

* update

* Fixing the Batching Depfwise Comnvolution and multi token input

* fixing generate and logits outputs

* Done!

* Fixing the cache handling, generating works now trying training

* update ACKNOWLEDGEMENTS

* removing the model_type if stuff in the _step loop in generate_step and adding MambaCache in base.py for training easier generations and removing mamba in tuner/utils.

* quick clean up

* update trainer/utils for right initialisation of the layers for LoRA, but not working.

* clean up

* Forther update to trainer/utils for correct layer selection. Successfull training

* removing extra mamba-infer.py file

* clean up, reformating will come later

* reformat and big clean up, final commit

* some speedups and cleanups

* fix test

* nits

* nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-28 07:02:53 -07:00
L
fc93c55723 feat(mlx_lm): Nemotron (#949)
* feat: Nemotron

https://huggingface.co/nvidia/Minitron-4B-Base

This is basically Llama with partial RoPE and LayerNorm instead of
BatchNorm. Also they add 1 to the LayerNorm weight for some reason.

* fixup! feat: Nemotron

* nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-08-29 21:08:57 -07:00
Prince Canuma
b5e18ef1e3 Add Phi-3.5-MoE (#946)
* add phimoe

* add phimoe to tunner

* add switch_mlp

* fix SuScaled args

* nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-08-24 06:52:33 -07:00
Awni Hannun
58591a1b41 fine tune deepseek (#932) 2024-08-22 10:41:21 -07:00
L
0164d2058b feat: DeepSeek MoE v1 (#942)
* feat: deepseek v1

DeepSeek is still releasing models on the DeepSeek V1 architecture.

```sh
mlx_lm.convert --hf-path deepseek-ai/DeepSeek-Prover-V1.5-RL --mlx-path DeepSeek-Prover-V1.5-RL-8bit --q-bits 8 -q
mlx_lm.generate --model DeepSeek-Prover-V1.5-RL-8bit --ignore-chat-template --max-tokens 512 --prompt 'import Mathlib
import Aesop

set_option maxHeartbeats 0

open BigOperators Real Nat Topology Rat

/-- The second and fourth terms of a geometric sequence are $2$ and $6$. Which of the following is a possible first term?
Show that it is $\frac{2\sqrt{3}}{3}$.-/
theorem amc12b_2003_p6 (a r : ℝ) (u : ℕ → ℝ) (h₀ : ∀ k, u k = a * r ^ k) (h₁ : u 1 = 2)
  (h₂ : u 3 = 6) : u 0 = 2 / Real.sqrt 3 ∨ u 0 = -(2 / Real.sqrt 3) := by'
```

* nits

* nits

* nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-08-17 07:18:09 -07:00
Zai Thottakath
4e01700816 Allow the entire model to be targed for LoRA and DoRA fine tuning: LoRA and DoRA embeddings with small DoRALinear bug fix (#914)
* feature: LoRA adapter for Embeddings

* feature: wire in LoRAEmbedding into the tuner. Allow the embedding and non model.layers Linear layers to be targeted for fine tuning

* feature: DoRA adapter for Embeddings

* feature: wire in DoRAEmbedding

* bugfix: ensure self.m is recalculated when the linear layer is changed in DoRALinear.from_linear

* refactor: prefer from_base over from_linear or from_embedding. prefer fuse over to_linear or to_embedding

* cleanup: remove unused imports in test_dora.py

* refactor: remove unnecessary non_layer_modules

* cleanup: remove wrong comments for lora embedding dropout. remove uncessary parens in dora embedding dropout

* nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-08-16 07:38:36 -07:00
nicolov
fbe3247772 Add GPT-neox model (#863) 2024-07-11 06:13:17 -07:00
Awni Hannun
538339b599 gemma2 (#855) 2024-06-27 10:06:28 -07:00
Chime Ogbuji
df6bc09d74 Configuration-based use of HF hub-hosted datasets for training (#701)
* Add hf_dataset configuration for using HF hub-hosted datasets for (Q)LoRA training

* Pre-commit formatting

* Fix YAML config example

* Print DS info

* Include name

* Add hf_dataset parameter default

* Remove TextHFDataset and CompletionsHFDataset and use Dataset and CompletionsDataset instead, adding a text_key constructor argument to the former (and changing it to work with a provided data structure instead of just from a JSON file), and prompt_key and completion_key arguments to the latter with defaults for backwards compatibility.

* nits

* update docs

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-06-26 10:20:50 -07:00
Awni Hannun
d8b073e3a7 Add eos token to lora fine-tunes (#818)
* add eos token to lora fine-tunes

* Comment
2024-06-12 07:44:21 -07:00
Derek Lewis
89b0b75250 GPT2 Support (#798)
* GPT-2 model support

* Add test for gpt2 model

* Fix weight sanitizing for quantization

* use approx gelu

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-06-02 16:33:20 -07:00
madroid
c457a3f88b LoRA: Extract small function (#614)
* LoRA: Extract pre_processing_model  function

* LoRA: Extract small functions(train_model,evaluate_model)

* move test case to test_tuner_utils.py

* nits

* nits

* remove extra param, validate at it 0

* version

* fix test

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-06-02 06:38:42 -07:00
Chen Xin
aac98ca6f4 support internlm2 (#797)
* support internlm2

* only attention projections

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-05-27 06:22:21 -07:00
Prince Canuma
b044ce2acf Add support for ibm granite (#758)
* add support for granite 3-8B config

* add gpt_bigcode

* add positional embedding condition.

* add support for granite 3-8B config

* add gpt_bigcode

* add positional embedding condition.

* remove unused function

* rebase fix

* move position emebedding to mask creation

* add to tuner and format

* add support for granite 3-8B config

* add gpt_bigcode

* add positional embedding condition.

* add support for granite 3-8B config

* add gpt_bigcode

* add positional embedding condition.

* rebase fix

* move position emebedding to mask creation

* add to tuner and format

* refactor mask

* remove dropout layers
2024-05-21 20:16:31 -07:00
Awni Hannun
9fc6efbd90 version bump + some fixes (#792) 2024-05-21 20:09:35 -07:00
Angelos Katharopoulos
9f671228cd Block sparse MM MoEs (#782)
- Adds SwitchLinear
- Adds QuantizedSwitchLinear
2024-05-21 15:58:08 -07:00
alexC-nonsense4k
42458914c8 support dora finetune in mlx-examples/llms/mlx_lm (#779)
* support dora finetune

* solve problems in lora.py and tuner.utils.py

* add use_dora (bool) in functions of load adapters

* delete all unsupported quantization code and fix all the calculate problems in mlx_lm/tuner/dora.py

* Using stop_gradient to prevent gradients from flowing through ‘norm’ during backpropagation

* set DEFAULT_USE_DORA in mlx_lm/generate.py

* add annotation for all the use_dora

* mlx_lm/fuse.py support fuse dora layers and fix a bug of to_linear() in mlx_lm/tuner/dora.py

* simplify code of juding type of a fused layer in mlx_lm/fuse.py

* add use_dora in mlx_lm/fuse.py when apply_lora_layers()

* style + nits

* style + nits

* more updates

---------

Co-authored-by: chenyifei08 <chenyifei08@baidu.com>
Co-authored-by: Awni Hannun <awni@apple.com>
2024-05-16 08:21:26 -07:00
Awni Hannun
6f0a69e682 fix lora for openelm (#773) 2024-05-10 09:51:41 -07:00
Anchen
f30413b63c chore(mlx-lm): fix the number of validation batches configuration. (#752)
* chore: fix number of validation batches

* clean up

* address comment
2024-05-04 06:52:42 -07:00
Prince Canuma
c012eb173f Add support for OpenELM (#719)
* add openELM

* update splitting logic

* update qkv logic and, transformer and MLP block

* code formatting and fix args

* fix array slicing and remove unused var :)

* add to tuner

* use mx.split for slicing qkv

* merge with phi3

* remove rope scaling logic

* code formatting
2024-04-25 16:49:28 -07:00
Gökdeniz Gülmez
2c1c9e9024 MiniCPM implementation (#685)
* Added support for the MiniCPM architecture

* Added support for the MiniCPM architecture

* Updated utils.py and LORA.md

* Updated utils.py and LORA.md

* Update implementation details for MiniCPM architecture

* Cleaning up

* fixed the missing lm.head layer problem

* Refactor Model class to dynamically handle tied and untied word embeddings

* Quick update

* added a dynamic rope scaling base calucaltion

* Added support for the MiniCPM architecture

* Added support for the MiniCPM architecture

* Updated utils.py and LORA.md

* Updated utils.py and LORA.md

* Update implementation details for MiniCPM architecture

* Cleaning up

* fixed the missing lm.head layer problem

* Refactor Model class to dynamically handle tied and untied word embeddings

* added a dynamic rope scaling base calucaltion

* quick fix and clean up

* clean up again

* removed the MiniCPMNorm class as its not used

* forgot something, sorry

* format

* version bump

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-04-25 15:29:28 -07:00
Prince Canuma
abcd891851 Add support for phi-3 (#712)
* Add phi-3 modelling

* fix rope scaling warning

* add tests and update tuner utils

* update name and remove sanitize

* fix lora
2024-04-23 09:20:00 -07:00
Awni Hannun
574ad7f6fe fix dequantization (#693) 2024-04-19 10:46:59 -07:00
Awni Hannun
2146bcd7ee Quantize embedding / Update quantize API (#680)
* more async eval

* quantize embedding / update quantize api

* more updates for quantize

* update for quantize embeddings

* update sd quant API

* update sdxl quants

* error for datasets < batch_size

* async

* fix config loading

* fix quant

* fix tests

* fix req

* remove lm head if tie weights is true

* fix test
2024-04-18 18:16:10 -07:00
dmdaksh
7d7e236061 - Removed unused Python imports (#683)
- bert/model.py:10: tree_unflatten
  - bert/model.py:2: dataclass
  - bert/model.py:8: numpy
  - cifar/resnet.py:6: Any
  - clip/model.py:15: tree_flatten
  - clip/model.py:9: Union
  - gcn/main.py:8: download_cora
  - gcn/main.py:9: cross_entropy
  - llms/gguf_llm/models.py:12: tree_flatten, tree_unflatten
  - llms/gguf_llm/models.py:9: numpy
  - llms/mixtral/mixtral.py:12: tree_map
  - llms/mlx_lm/models/dbrx.py:2: Dict, Union
  - llms/mlx_lm/tuner/trainer.py:5: partial
  - llms/speculative_decoding/decoder.py:1: dataclass, field
  - llms/speculative_decoding/decoder.py:2: Optional
  - llms/speculative_decoding/decoder.py:5: mlx.nn
  - llms/speculative_decoding/decoder.py:6: numpy
  - llms/speculative_decoding/main.py:2: glob
  - llms/speculative_decoding/main.py:3: json
  - llms/speculative_decoding/main.py:5: Path
  - llms/speculative_decoding/main.py:8: mlx.nn
  - llms/speculative_decoding/model.py:6: tree_unflatten
  - llms/speculative_decoding/model.py:7: AutoTokenizer
  - llms/tests/test_lora.py:13: yaml_loader
  - lora/lora.py:14: tree_unflatten
  - lora/models.py:11: numpy
  - lora/models.py:3: glob
  - speechcommands/kwt.py:1: Any
  - speechcommands/main.py:7: mlx.data
  - stable_diffusion/stable_diffusion/model_io.py:4: partial
  - whisper/benchmark.py:5: sys
  - whisper/test.py:5: subprocess
  - whisper/whisper/audio.py:6: Optional
  - whisper/whisper/decoding.py:8: mlx.nn
2024-04-16 07:50:32 -07:00
Awni Hannun
2bd64b78cf Save lora config (#636)
* lora config

* comments

* version bump
2024-04-02 13:52:53 -07:00
Prince Canuma
d661440dbb Add support for qwen2moe (#640)
* add sparsemoe block and update decoder logic

* update file name to match HF

* update name

* Code formatting

* update gates calculation

* add support for Qwen2MoE.

* fix pytest

* code formatting and fix missing comma in utils

* Remove decoder sparse step.

Co-authored-by: bozheng-hit <dsoul0621@gmail.com>

* remove gate layer anti-quantisation

* remove unused argument

---------

Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
2024-04-02 11:33:29 -07:00
Chime Ogbuji
f6283ef7ce Configurable LR schedulers (#604)
* Initial config handler and test

* Added means to run from CLI

* Update lora config loading and tests

* Constrain scheduler config (warmup and minimum LR) for each kind

* Update reference to moved schedule_config module

* Minor fix

* Fix typos

* Moved build_schedule and tests

* nits in schedule config

* flake

* fix path

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-29 13:41:10 -07:00
Awni Hannun
b80adbcc3e DBRX (#628)
* dbrx

* format

* format

* comments

* change scores slightly

* remove inadvertant import
2024-03-28 21:03:53 -07:00
Awni Hannun
bbfcc103d7 cast around lora adapters (#613) 2024-03-24 19:34:51 -07:00
Anchen
494cdf8e96 chore: fix loar for moe model (#608) 2024-03-23 07:22:11 -07:00
Ivan Fioravanti
d2a99172a6 Add dropout parameter to lora configuration (#599)
* Add dropout parameter to lora configuration

A dropout parameter has been added to the lora configuration settings in lora_config.yaml. The LoRALinear class in utils.py has been updated to take this new parameter. Additionally, a AttributeError: 'types.SimpleNamespace' object has no attribute 'prompt' related to `args.prompt` has been removed from lora.py.

* Update lora_config.yaml

Set dropout to 0.0 in the sample config file

* format

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-20 08:44:40 -07:00
madroid
39d5ca6427 LoRA: report last train info (#595) 2024-03-19 17:29:50 -07:00
madroid
b0bcd86a40 Support for OpenAI’s fine-tuning dataset format (#548)
* LoRA: move load_dataset to tuner/datasets.py file

* LoRA: support OpenAI chat format datasets

see https://platform.openai.com/docs/guides/fine-tuning/example-format

* LoRA: support OpenAI completion format datasets

* LoRA: formatting dataset timing to reduce memory footprint

* Refactor dataset item access in PromptCompletionDataset

* Update mlx_lm/LORA.md

* Update mlx_lm/LORA.md

* check Unsupported data format

* add tests, fine-tune doc

* add tests, fine-tune doc

* add jinja2 for chat template

* nits in readme

* nits in readme

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-19 16:45:46 -07:00
madroid
d4e1de1d5b add peak_memory info to training callback (#572) 2024-03-13 20:17:10 -07:00
Awni Hannun
14fe868825 version (#570) 2024-03-13 10:09:36 -07:00
Awni Hannun
39084e81c2 Some improvements to LoRA (#528)
* set cache_limit

* remove set cache_limit

* cleanup

* add gradient checkpointing

* fix sort

* mokey patch call for checkpoint

* fix example config
2024-03-12 20:02:03 -07:00
Chime Ogbuji
e56d9015ef LoRA on all linear transformer block layers (#546)
* Add --lora-all-linear option to apply LoRa to all linear transfer block layers

* Moved to YAML config and added specification of rank & alpha

* nits in conifg, more tests

* nit

* run tests for prs

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-12 07:37:40 -07:00
Awni Hannun
ad3cf5ed98 dropout 0 as default (#549) 2024-03-08 13:07:10 -08:00
Muhtasham Oblokulov
81e2a80026 Add Starcoder 2 (#502)
* Add Starcoder2 model and update utils.py

* Refactor model arguments and modules in starcoder2.py

* Refactor FeedForward class to MLP in starcoder2.py

* Fix typo

* pre-commit

* Refactor starcoder2.py: Update model arguments and modules

* Fix LM head and MLP layers

* Rename  input layer norm

* Update bias in linear layers

* Refactor token embeddings in Starcoder2Model

* Rename to standard HF attention layer name

* Add LayerNorm

* Add transposed token embeddings (like in Gemma)

* Refactor MLP and TransformerBlock classes

* Add tie_word_embeddings option to ModelArgs and update Model implementation

* Add conditional check for tying word embeddings in Starcoder2Model

* Fix bias in lm_head linear layer

* Remove unused LayerNorm in stablelm

* Update transformers dependency to use GitHub repository

* fix lm head bug, revert transformer req

* Update RoPE initialization in Attention class

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-02 19:39:23 -08:00
Ashish
261f1280f6 Update to StableLM code (#514)
* StableLM now part of Transformers as stablelm rather than stablelm_epoch; changed config to match new changes

* removing old file

* reference new stablelm
2024-03-01 09:53:38 -08:00
Madroid Ma
f03c8a7b44 LoRA: adapter file Support path information (#505)
* LoRA: adapter file Support path information

* fix pre-commit lint

* from os.path to pathlib.Path

* Update llms/mlx_lm/tuner/trainer.py

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>

* rename check_checkpoints_path to checkpoints_path

* fix pre-commit lint

---------

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2024-02-29 22:20:49 -08:00
Awni Hannun
ab9172baac Gemma support (#474)
* gemma support

* format

* lora support for gemma
2024-02-21 08:47:13 -08:00
Madroid Ma
8eee4399f4 LoRA: Add printing and callbacks for learning rate during training (#457)
* LoRA:Refactor TrainingCallback to enhance flexibility and extensibility

This commit refactors the TrainingCallback class to accept a dictionary parameter for both on_train_loss_report and on_val_loss_report methods. By switching from multiple parameters to a single dict parameter, this change significantly improves the class's flexibility and makes it easier to extend with new training or validation metrics in the future without altering the method signatures. This approach simplifies the addition of new information to be logged or processed and aligns with best practices for scalable and maintainable code design.

* LoRA: Add printing and callbacks for learning rate during training
2024-02-20 13:07:21 -08:00
Awni Hannun
e4d5630698 Basic CircleCI (#449)
* basic style checks for circleci

* format

* fix config
2024-02-16 22:13:55 -08:00
Madroid Ma
0ba466369f LoRA: add training callbacks (#414)
* LoRA: add training callbacks

* LoRA: add trained tokens print & callback
2024-02-16 06:04:57 -08:00
Madroid Ma
726b1ddec0 fix: check LoRA layers number error (#446) 2024-02-16 06:03:33 -08:00
Chime Ogbuji
e446598f62 Passing parameterized loss and batching to trainer (#391) 2024-02-13 07:03:25 -08:00