Commit Graph

537 Commits

Author SHA1 Message Date
Awni Hannun
e92de216fd rid warning (#789) 2024-05-20 06:05:33 -07:00
alexC-nonsense4k
42458914c8 support dora finetune in mlx-examples/llms/mlx_lm (#779)
* support dora finetune

* solve problems in lora.py and tuner/utils.py

* add use_dora (bool) to the adapter-loading functions

* delete all unsupported quantization code and fix all the calculation problems in mlx_lm/tuner/dora.py

* Using stop_gradient to prevent gradients from flowing through ‘norm’ during backpropagation

* set DEFAULT_USE_DORA in mlx_lm/generate.py

* add annotation for all the use_dora

* mlx_lm/fuse.py support fuse dora layers and fix a bug of to_linear() in mlx_lm/tuner/dora.py

* simplify code for judging the type of a fused layer in mlx_lm/fuse.py

* add use_dora in mlx_lm/fuse.py when calling apply_lora_layers()

* style + nits

* style + nits

* more updates

---------

Co-authored-by: chenyifei08 <chenyifei08@baidu.com>
Co-authored-by: Awni Hannun <awni@apple.com>
2024-05-16 08:21:26 -07:00
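The DoRA commit rescales a LoRA-updated weight by a learned per-output magnitude and treats the weight norm as a constant; this is what the `stop_gradient` bullet refers to. The following is a minimal forward-only sketch in plain Python. It assumes weights are stored as (out_features, in_features) rows and that the norm is taken over each row. `dora_weight` and `scale` are hypothetical names, not the mlx_lm API. Plain Python has no autograd, so the `stop_gradient` step appears only as a comment.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a_ik * b[k][j] for k, a_ik in enumerate(row)) for j in range(len(b[0]))] for row in a]

def dora_weight(w0, lora_b, lora_a, magnitude, scale=1.0):
    """Hypothetical helper: m * (W0 + s*B@A) / ||W0 + s*B@A||, normalized per output row.

    Assumed shapes: w0 (out, in), lora_b (out, r), lora_a (r, in), magnitude (out,).
    """
    delta = matmul(lora_b, lora_a)
    fused = []
    for i, row in enumerate(w0):
        direction = [w + scale * d for w, d in zip(row, delta[i])]
        # In an autograd framework this norm would be wrapped in stop_gradient,
        # so no gradient flows through it during backpropagation.
        norm = math.sqrt(sum(v * v for v in direction))
        fused.append([magnitude[i] * v / norm for v in direction])
    return fused
```

If `magnitude` is initialized to the row norms of `w0` and the LoRA update is zero, the fused weight equals `w0`. This matches the usual starting point for DoRA training.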
Awni Hannun
69181e0058 Support non incremental kv cache growth (#766) 2024-05-15 12:56:24 -07:00
Jinwu Zhan
1a86d985d9 Support --add_eos_token argument within LoRA training (#760)
* Support `--add_eos_token` argument to empower users to control the addition of the eos token during LoRA training, addressing issues like incomplete text generation.

* Support `--add_eos_token`, code format

---------

Co-authored-by: Zhan ChengLong <zhanchenglong@bytedance.com>
2024-05-13 17:17:42 -07:00
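At data-preparation time, an `--add_eos_token` switch comes down to whether each tokenized example ends with the EOS id. The following is a minimal sketch that assumes tokenized examples are plain lists of ids. `prepare_example` is a hypothetical name, not taken from mlx_lm.

```python
from typing import List

def prepare_example(token_ids: List[int], eos_token_id: int, add_eos_token: bool = True) -> List[int]:
    """Hypothetical helper: copy token_ids and append EOS when requested and not already present."""
    tokens = list(token_ids)
    if add_eos_token and (not tokens or tokens[-1] != eos_token_id):
        # Ending each training example with EOS gives the model an explicit stop signal.
        tokens.append(eos_token_id)
    return tokens
```

Checking for an existing trailing EOS avoids doubling it when the tokenizer already appends one.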
JosefAlbers
10853b57d9 Add model_config parameter to load() and load_model() (#770)
* Add `model_config` parameter to `load()` and `load_model()`

For easy editing of the loaded model configuration (e.g., changing the RoPE theta or scaling of the Phi-3 model)

Example:

```python
from mlx_lm import load, generate

# Override rope_theta from the checkpoint's config at load time
model, tokenizer = load("mlx-community/Phi-3-mini-4k-instruct-4bit-no-q-embed", model_config={"rope_theta": 50000.0})
prompt = "Write a haiku about the ocean."
response = generate(model, tokenizer, prompt, max_tokens=100)
```

* Possible bug (default_loss)

* Revert "Possible bug (default_loss)"

This reverts commit 70a55ace18.

* Fix default_loss for lora

* (1) move load_model's new optional `model_config` arg to the end, so that fetch_from_hub()'s positional call `model = load_model(model_path, lazy)` still works; (2) fix indentation (`black` hook)
2024-05-10 10:13:34 -07:00
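One way to picture the `model_config` override is as a dictionary merge. It happens after the checkpoint's `config.json` is read and before the model is built. The following is a minimal sketch under that assumption. `apply_model_config` is a hypothetical helper, not the mlx_lm function.

```python
import json
from typing import Any, Dict, Optional

def apply_model_config(config_text: str, model_config: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Hypothetical helper: parse a model's config JSON and let caller-supplied keys override it."""
    config = json.loads(config_text)
    # Keys in model_config win over the values stored with the checkpoint.
    config.update(model_config or {})
    return config
```

With `model_config={"rope_theta": 50000.0}`, only `rope_theta` changes. Every other stored value is kept.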
Awni Hannun
6f0a69e682 fix lora for openelm (#773) 2024-05-10 09:51:41 -07:00
Awni Hannun
fad9598372 Fix llama cache check (#763)
* fix llama cache check

* add test
2024-05-08 08:35:54 -07:00
Awni Hannun
ee60e2a9d5 Kv cache (#643)
* in place kv_cache

* fix

* fix kv cache size

* partially fix kv cache dtype

* step kv cache

* multiple of step size

* more tests + kv cache

* more kv cache

* update all models to use kv cache
2024-05-08 08:18:13 -07:00
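The `step kv cache` and `multiple of step size` bullets describe a particular storage strategy. Key/value storage is preallocated in fixed-size chunks, and new entries are written in place instead of being concatenated on every token. The following is a minimal pure-Python sketch of that growth policy. It assumes a default step of 256 slots and treats each cached entry as an opaque object. The class and method names are hypothetical.

```python
class StepKVCache:
    """Hypothetical toy KV cache whose storage grows in multiples of `step` slots."""

    def __init__(self, step: int = 256):
        self.step = step
        self.keys = []    # preallocated slots; None marks unused capacity
        self.values = []
        self.offset = 0   # number of slots actually filled

    @property
    def capacity(self) -> int:
        return len(self.keys)

    def update(self, new_keys, new_values):
        """Write new entries in place and return the filled prefix."""
        needed = self.offset + len(new_keys)
        if needed > self.capacity:
            # Round the required size up to the next multiple of `step`.
            new_capacity = ((needed + self.step - 1) // self.step) * self.step
            extra = new_capacity - self.capacity
            self.keys.extend([None] * extra)
            self.values.extend([None] * extra)
        self.keys[self.offset:needed] = new_keys
        self.values[self.offset:needed] = new_values
        self.offset = needed
        return self.keys[:self.offset], self.values[:self.offset]
```

Because the required size is rounded up, a single update larger than `step` (such as a long prompt) is handled in one allocation.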
Albert Avetisian
bfbc0e434a Add optional EOS token for llava example (#753)
* add optional EOS token

* add tokenizer config to align with MLX LM example

* formatting fixes
2024-05-08 06:04:36 -07:00
Kevin Wang
c0019c4908 Pad mask with zeros for non-square attention matrices (#715)
* Pad mask with zeros for non-square attention matrices

The current implementation of the mask assumes the attention matrix is square, which is true if there is no cache. However, if one wishes to produce multiple tokens at a time, such as in speculative decoding implementations, a rectangular mask is necessary.

This change pads the bottom of the mask with zeros so multi-token decoding with a cache works correctly.

* Directly create mask instead of padding

* Update llama.py
2024-05-04 16:32:25 -07:00
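The rectangular mask described in this commit can be built directly. Suppose `offset` tokens are already cached and `num_queries` new tokens arrive. Query row `i` sits at absolute position `offset + i` and may attend to every key up to that position. The following is a minimal pure-Python sketch of an additive mask, with 0.0 where attention is allowed and negative infinity where it is blocked. The function name is hypothetical, and model code would build the same pattern as an array.

```python
import math
from typing import List

def causal_mask(num_queries: int, offset: int = 0) -> List[List[float]]:
    """Hypothetical helper: additive causal mask of shape (num_queries, offset + num_queries)."""
    num_keys = offset + num_queries
    mask = []
    for i in range(num_queries):
        query_pos = offset + i
        # Cached keys (j < offset) are always visible; new keys only up to the diagonal.
        mask.append([0.0 if j <= query_pos else -math.inf for j in range(num_keys)])
    return mask
```

With `offset=0` this reduces to the usual square causal mask. A positive offset adds a block of zeros for the cached positions.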
Anchen
f30413b63c chore(mlx-lm): fix the number of validation batches configuration. (#752)
* chore: fix number of validation batches

* clean up

* address comment
2024-05-04 06:52:42 -07:00
Awni Hannun
2bf11c4633 Use stable url for MNIST (#749)
* use stable url

* remove deprecated flag
2024-05-03 17:13:05 -07:00
Konstantin Kerekovski
d1c35fa684 Add MLX Cache Limit setting for mlx_lm.generate and mlx_lm.server CLI (#744)
* Add support for setting MLX cache limit in GB

* Add support for setting MLX cache limit in GB in mlx_lm.server

* format

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-05-03 12:42:48 -07:00
Ivan Fioravanti
b468091f7f Add model management functionality for local caches (#736)
* Add model management functionality for local caches

This commit introduces a set of command-line utilities for managing MLX models downloaded and saved locally in the Hugging Face cache. The functionalities include scanning existing models, retrieving detailed information about a specific model, and deleting a model by its name.

* Added mlx_lm.model to setup.py

* nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-05-03 12:20:13 -07:00
Awni Hannun
92430df0a0 Fix lora for qwen moe (#743)
* fix lora for qwen moe

* use max seq length in test as well
2024-05-02 21:55:09 -07:00
madroid
5079af62db Update model card description (#654)
* Update model card description

- Use full links
- Add the address of the model uploader's Hugging Face homepage

* Add user_info to reduce whoami calls

* Remove the -U argument

* remove HF user info

* run pre-commit
2024-05-02 21:22:04 -07:00
madroid
6775d6cb3f Whisper: Add pip distribution configuration to support pip installations. (#739)
* Whisper: rename whisper to mlx_whisper

* Whisper: add setup.py config for publish

* Whisper: add assets data to setup config

* Whisper: pre-commit for setup.py

* Whisper: Update README.md

* Whisper: Update README.md

* nits

* fix package data

* nit in readme

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-05-01 09:00:02 -07:00
Karim Elmaaroufi
4bf2eb17f2 Validate server params & fix logit bias bug (#731)
* Bug fix in logit bias

* Add parameter validations

* Fix typo

* Update docstrings to match MLX styling

* Black style + fix a validation bug
2024-04-30 07:27:40 -07:00
Jaward Sesay
7c0962f4e2 Add support for quantized Phi-3-mini-4k-instruct GGUF weights (#717)
* support for phi-3 4-bit quantized gguf weights

* Added link to 4-bit quantized model

* removed some prints

* Added correct comment

* Added correct comment

* removed print

The last condition already prints a warning when quantization is None.
2024-04-29 20:11:32 -07:00
Thomas Lazarus
5513c4e57d Fixes Typo in Starcoder2 (#740) 2024-04-29 13:14:45 -07:00
Javier de la Rosa
510d2bde49 Force multi_commits when uploading to HF (#729) 2024-04-28 19:07:17 -07:00
锦此
699de35b03 Update lora_config.yaml (#735)
Update LoRA config YAML, replacing the adapter file argument with the adapter path argument.
2024-04-28 10:24:34 -07:00
Prince Canuma
c012eb173f Add support for OpenELM (#719)
* add openELM

* update splitting logic

* update qkv logic, transformer, and MLP block

* code formatting and fix args

* fix array slicing and remove unused var :)

* add to tuner

* use mx.split for slicing qkv

* merge with phi3

* remove rope scaling logic

* code formatting
2024-04-25 16:49:28 -07:00
Gökdeniz Gülmez
2c1c9e9024 MiniCPM implementation (#685)
* Added support for the MiniCPM architecture

* Added support for the MiniCPM architecture

* Updated utils.py and LORA.md

* Updated utils.py and LORA.md

* Update implementation details for MiniCPM architecture

* Cleaning up

* fixed the missing lm.head layer problem

* Refactor Model class to dynamically handle tied and untied word embeddings

* Quick update

* added a dynamic rope scaling base calculation

* Added support for the MiniCPM architecture

* Added support for the MiniCPM architecture

* Updated utils.py and LORA.md

* Updated utils.py and LORA.md

* Update implementation details for MiniCPM architecture

* Cleaning up

* fixed the missing lm.head layer problem

* Refactor Model class to dynamically handle tied and untied word embeddings

* added a dynamic rope scaling base calculation

* quick fix and clean up

* clean up again

* removed the MiniCPMNorm class as it's not used

* forgot something, sorry

* format

* version bump

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-04-25 15:29:28 -07:00
Awni Hannun
685012c2ad Couple fixes for LoRA (#711)
* don't overwrite in test only mode

* only load model specific safetensors
2024-04-25 14:16:13 -07:00
Kristian Muñiz
109ee2f2f8 Use CORS headers for streaming for MLX Server (#716) 2024-04-25 07:26:04 -07:00
Kevin Wang
8a265f0d54 Fix incorrect type annotation (#720)
A `Tuple` is missing in this type annotation.
2024-04-24 15:52:43 -07:00
Prince Canuma
abcd891851 Add support for phi-3 (#712)
* Add phi-3 modelling

* fix rope scaling warning

* add tests and update tuner utils

* update name and remove sanitize

* fix lora
2024-04-23 09:20:00 -07:00
Awni Hannun
ecbc6ff1e3 one more quant fix (#708) 2024-04-22 18:12:52 -07:00
Aaron Ng
8d5cf5b0c8 use logging in mlx server (#705) 2024-04-22 07:50:06 -07:00
AlexandrosChrtn
f20e68fcc0 Load fused model with transformers (#703)
* save format for transformers compatibility

* save format for transformers compatibility arg

* hardcode mlx

* hardcode mlx format
2024-04-21 09:04:44 -07:00
Anchen
749cabf299 fix: unicode decoding (#702) 2024-04-21 08:58:23 -07:00
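The commit title gives no details. A common source of unicode errors when streaming generated text is a multi-byte UTF-8 character split across token boundaries. The following is a minimal sketch of one standard-library remedy, an incremental decoder that buffers incomplete byte sequences. It illustrates the general problem and is not claimed to be the change made in #702. `stream_decode` is a hypothetical name.

```python
import codecs
from typing import Iterable, List

def stream_decode(byte_chunks: Iterable[bytes]) -> List[str]:
    """Hypothetical helper: decode UTF-8 chunks as they arrive, holding back incomplete characters."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    pieces = []
    for chunk in byte_chunks:
        # Returns "" while a multi-byte character is still incomplete.
        pieces.append(decoder.decode(chunk))
    # final=True raises UnicodeDecodeError if the stream ended mid-character.
    pieces.append(decoder.decode(b"", final=True))
    return pieces
```

Decoding each chunk on its own with `bytes.decode` would fail on a lone lead byte such as `b"\xc3"`. The incremental decoder instead waits for the continuation byte.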
Karim Elmaaroufi
1484598de1 Add support for logit bias (#697) 2024-04-21 06:53:56 -07:00
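Logit bias, as exposed by OpenAI-style APIs, maps token ids to values added to the model's logits before sampling. The following is a minimal pure-Python sketch. It assumes a dict of `{token_id: bias}` with integer ids and a single row of logits. `apply_logit_bias` is a hypothetical name rather than the mlx_lm function.

```python
from typing import Dict, List

def apply_logit_bias(logits: List[float], logit_bias: Dict[int, float]) -> List[float]:
    """Hypothetical helper: return a copy of logits with each biased token's value shifted."""
    biased = list(logits)
    for token_id, bias in logit_bias.items():
        # Reject ids outside the vocabulary instead of silently ignoring them.
        if not 0 <= token_id < len(biased):
            raise ValueError(f"token id {token_id} is out of range")
        biased[token_id] += bias
    return biased
```

A large negative bias effectively bans a token, and a large positive one strongly favors it.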
Awni Hannun
6abdbe3be8 Fix quant in gguf (#698)
* fix quant in gguf

* fix whisper
2024-04-19 20:07:11 -07:00
Awni Hannun
574ad7f6fe fix dequantization (#693) 2024-04-19 10:46:59 -07:00
Awni Hannun
2146bcd7ee Quantize embedding / Update quantize API (#680)
* more async eval

* quantize embedding / update quantize api

* more updates for quantize

* update for quantize embeddings

* update sd quant API

* update sdxl quants

* error for datasets < batch_size

* async

* fix config loading

* fix quant

* fix tests

* fix req

* remove lm head if tie weights is true

* fix test
2024-04-18 18:16:10 -07:00
Anchen
f5f189e48a fix(mlx-lm): broken server.py (#690)
* fix server.py

* fix var referenced before assignment

* add test

* clean up
2024-04-18 14:26:18 -07:00
Phúc H. Lê Khắc
35206806ac Create executables for generate, lora, server, merge, convert (#682)
* feat: create executables mlx_lm.<cmd>

* nits in docs

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-04-16 16:08:49 -07:00
dmdaksh
7d7e236061 - Removed unused Python imports (#683)
  - bert/model.py:10: tree_unflatten
  - bert/model.py:2: dataclass
  - bert/model.py:8: numpy
  - cifar/resnet.py:6: Any
  - clip/model.py:15: tree_flatten
  - clip/model.py:9: Union
  - gcn/main.py:8: download_cora
  - gcn/main.py:9: cross_entropy
  - llms/gguf_llm/models.py:12: tree_flatten, tree_unflatten
  - llms/gguf_llm/models.py:9: numpy
  - llms/mixtral/mixtral.py:12: tree_map
  - llms/mlx_lm/models/dbrx.py:2: Dict, Union
  - llms/mlx_lm/tuner/trainer.py:5: partial
  - llms/speculative_decoding/decoder.py:1: dataclass, field
  - llms/speculative_decoding/decoder.py:2: Optional
  - llms/speculative_decoding/decoder.py:5: mlx.nn
  - llms/speculative_decoding/decoder.py:6: numpy
  - llms/speculative_decoding/main.py:2: glob
  - llms/speculative_decoding/main.py:3: json
  - llms/speculative_decoding/main.py:5: Path
  - llms/speculative_decoding/main.py:8: mlx.nn
  - llms/speculative_decoding/model.py:6: tree_unflatten
  - llms/speculative_decoding/model.py:7: AutoTokenizer
  - llms/tests/test_lora.py:13: yaml_loader
  - lora/lora.py:14: tree_unflatten
  - lora/models.py:11: numpy
  - lora/models.py:3: glob
  - speechcommands/kwt.py:1: Any
  - speechcommands/main.py:7: mlx.data
  - stable_diffusion/stable_diffusion/model_io.py:4: partial
  - whisper/benchmark.py:5: sys
  - whisper/test.py:5: subprocess
  - whisper/whisper/audio.py:6: Optional
  - whisper/whisper/decoding.py:8: mlx.nn
2024-04-16 07:50:32 -07:00
Angelos Katharopoulos
e55a9e8cb4 Add an SPM detokenizer that doesn't trim initial space (#681) 2024-04-15 14:15:25 -07:00
Awni Hannun
d3f8e4aee9 Fix argpartition call in Mixtral and other MOES (#676)
* Update mixtral.py

* fix all moes

---------

Co-authored-by: yuhai-china <yuhai.china@gmail.com>
2024-04-12 11:00:56 -07:00
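Mixture-of-experts layers route each token to the experts with the k highest gate scores; according to the commit title, the models compute this selection with argpartition. The following sketch shows the result that selection should produce. It uses `heapq.nlargest` instead of argpartition and plain lists instead of arrays. `top_k_experts` is a hypothetical name.

```python
import heapq
from typing import List

def top_k_experts(gate_logits: List[float], k: int) -> List[int]:
    """Hypothetical helper: indices of the k largest gate logits, highest first (ties keep lower index first)."""
    if not 1 <= k <= len(gate_logits):
        raise ValueError("k must be between 1 and the number of experts")
    return heapq.nlargest(k, range(len(gate_logits)), key=gate_logits.__getitem__)
```

Argpartition leaves the k selected indices in no particular order. This version returns them sorted by score, so compare the sets when checking one against the other.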
Awni Hannun
9c5554d8ee Use async eval (#670)
* Use async eval

* bump

* bump

* remove workaround for bfloat cumsum
2024-04-11 13:18:23 -07:00
Nripesh Niketan
0250f6f38e feat: Update black-pre-commit-mirror to version 24.3.0 (#675) 2024-04-11 07:28:26 -07:00
devonthomas35
9f472dc985 Update transformers for ⌘-R+ (#668) 2024-04-11 07:28:12 -07:00
da-z
5a4cad34ef Always resume downloads (#674)
* Always resume downloads

* format

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-04-11 06:52:32 -07:00
Angelos Katharopoulos
eff6690952 Fix CFG for SDXL (#667) 2024-04-09 06:06:41 -07:00
Angelos Katharopoulos
1278994b56 Add streaming detokenizers (#651) 2024-04-08 22:36:01 -07:00
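A streaming detokenizer turns tokens into text incrementally, so a client sees output as it is generated. The following is a minimal sketch of the simplest strategy: re-decode the full token list at each step and emit only the new suffix. `decode` stands in for any tokenizer's decode function; the toy vocabulary is an assumption, and the class name is hypothetical. The detokenizers added in #651 may use a more efficient approach.

```python
from typing import Callable, Dict, List

class NaiveStreamingDetokenizer:
    """Hypothetical detokenizer that emits only the text added since the previous token."""

    def __init__(self, decode: Callable[[List[int]], str]):
        self.decode = decode
        self.tokens: List[int] = []
        self.emitted = ""

    def add_token(self, token_id: int) -> str:
        self.tokens.append(token_id)
        # Re-decoding everything costs O(n) per step; this assumes each
        # decode result extends the previous one as a prefix.
        text = self.decode(self.tokens)
        segment = text[len(self.emitted):]
        self.emitted = text
        return segment

VOCAB: Dict[int, str] = {0: "Hello", 1: ",", 2: " world"}
detok = NaiveStreamingDetokenizer(lambda ids: "".join(VOCAB[i] for i in ids))
pieces = [detok.add_token(t) for t in [0, 1, 2]]
```

Re-decoding the whole sequence lets the tokenizer apply context-dependent rules, such as how leading spaces are rendered, at the cost of repeated work on long outputs.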
Awni Hannun
c68aa3c7c3 Stable lm 2 (#666)
* stable lm 2

* test and lora

* version bump

* merge stable models
2024-04-08 14:18:55 -07:00
Awni Hannun
1e2f7f50b6 fix for empty initial string (#665) 2024-04-08 10:40:05 -07:00
Awni Hannun
c386dd5f5a Fix for cohere plus (#650)
* fix for cohere plus

* version bump
2024-04-05 14:11:24 -07:00