Commit Graph

86 Commits

Author SHA1 Message Date
ilyasch2
3b526f0aa1
Add support for falcon-mamba (#1074)
* Add support for falcon-mamba

* nits

* nit

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-11-04 12:23:30 -08:00
Awni Hannun
e510987870
Clear cache every now and then (#1081)
* clear cache every now and then

* don't need user arg anymore
2024-11-01 14:15:32 -07:00
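A minimal sketch of the idea behind this change, assuming an Apple-silicon Metal device; the cadence and exact placement inside mlx-lm's generation loop may differ:

```python
import mlx.core as mx

# Periodically release cached Metal buffers so peak memory stays bounded
# during a long-running loop (illustrative cadence, not mlx-lm's exact one).
for step in range(1024):
    a = mx.random.normal((512, 512))
    mx.eval(a @ a)
    if step % 256 == 0:
        mx.metal.clear_cache()
```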
Alex Barron
85ffd2c96a
Quantized KV Cache (#1075)
* add QuantizedKVCache

* simplify

* add tests

* single sdpa function

* fix sed

* in place

* fix tests

* support different k and v head dims
2024-10-31 16:59:52 -07:00
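A hedged sketch of using the quantized KV cache from the high-level API. The keyword names (`kv_bits`, `kv_group_size`) follow this PR's description; that `generate` forwards them to `generate_step`, and the model name, are assumptions:

```python
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")
# Quantize keys/values as they are cached, trading a little accuracy
# for much lower memory on long contexts.
text = generate(
    model,
    tokenizer,
    prompt="Summarize the plot of Hamlet.",
    max_tokens=256,
    kv_bits=4,         # assumed keyword, per this PR
    kv_group_size=64,  # assumed keyword, per this PR
)
print(text)
```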
Awni Hannun
9f34fdbda4
Wire models in MLX LM (#1069)
* wired in MLX LM

* fix synch

* comment + nit

* version

* mlx lm version

* bump to 0.19.2
2024-10-31 08:17:14 -07:00
Awni Hannun
66e7bcb886
override dtype with quant (#1062) 2024-10-22 09:56:45 -07:00
Awni Hannun
c799133998
Make llm async eval less brittle (#1040)
* Make llm async eval less brittle

* nit
2024-10-14 10:25:24 -07:00
Awni Hannun
4360e7ccec
clear cache during prompt processing (#1027) 2024-10-09 16:48:32 -07:00
Awni Hannun
b7373cb44f
fix long prompt generations (#1023) 2024-10-09 11:09:36 -07:00
Awni Hannun
fca087be49
More cache improvements (#1015)
* fix rotating kv cache for chat use case

* reorg + fixes to caching; unify prompt caching across cache types and use cases (e.g. caching during a chat)

* nit in chat

* fix tests

* fix tests

* fix tests

* docs

* chat command

* comments + docs

* Define meta_state on all Cache implementations

* fixes + trim_prompt_cache api

* fix default model

---------

Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2024-10-07 20:45:51 -07:00
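A sketch of the unified prompt-caching flow this PR describes, assuming `make_prompt_cache` and `trim_prompt_cache` live in `mlx_lm.models.cache` and that `generate` accepts a `prompt_cache` argument (model name illustrative):

```python
from mlx_lm import load, generate
from mlx_lm.models.cache import make_prompt_cache, trim_prompt_cache

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")

# One cache reused across turns of a chat, so earlier turns are not re-processed.
cache = make_prompt_cache(model)
generate(model, tokenizer, prompt="Hello!", prompt_cache=cache, max_tokens=64)
generate(model, tokenizer, prompt="Tell me more.", prompt_cache=cache, max_tokens=64)

# Roll back the most recent cached tokens, e.g. to retry a turn.
trim_prompt_cache(cache, 64)
```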
nathan
0866e23a67
repetition_penalty and logits_bias just using logits_processors (#1004)
* refactor of repetition_penalty and logits_bias to use logits_processor

* nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-30 08:49:03 -07:00
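After this refactor, repetition penalty and logit bias are just instances of one interface: a callable taking the tokens generated so far and the current logits, returning modified logits. A hedged sketch of a custom processor, assuming `generate` accepts a `logits_processors` list as this PR describes:

```python
import mlx.core as mx
from mlx_lm import load, generate

def ban_token(token_id: int):
    # A logits processor: (generated_tokens, logits) -> logits.
    def processor(tokens: mx.array, logits: mx.array) -> mx.array:
        mask = mx.arange(logits.shape[-1]) == token_id
        return mx.where(mask, -float("inf"), logits)  # never sample token_id
    return processor

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")
text = generate(
    model,
    tokenizer,
    prompt="Hello!",
    max_tokens=32,
    logits_processors=[ban_token(tokenizer.eos_token_id)],
)
```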
Gökdeniz Gülmez
50e5ca81a8
Adding full finetuning (#903)
* Adding full model weights finetuning

* Updating the LORA.md and ACKNOWLEDGMENTS.md files.

* removing --use-dora and --full-training and adding --fine-tune-type

* some clean up

* reformatting and fixing dora training

* updated CONFIG_DEFAULTS

* update config example

* update in the config example file

* Update LORA.md

* merge and commit

* adding argument for dora linear layer

* clean up

* clean up in the example yaml file

* fix

* final fix before sending

* small addition to the md file

* fix for loading the fully trained model by saving all the files and configs correctly

* clean up

* removing the unnecessary files

* changing lora layers back to 16

* removed max file size

* nits

* resolve merge

* some consistency changes

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-29 17:12:47 -07:00
nathan
ace2bb5890
Add logits_processor option to generate_step function (#983)
* Add logits_processor option for generation as in the Hugging Face transformers library

* concatenation correction

* Rename the tokens variable for clarity

* remove the logit_bias argument from generate_step method

* fix the variable name

* nits + test

* test

* add back logit bias + test

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-28 10:08:49 -07:00
Awni Hannun
f530f56df2
don't use internal exception (#990) 2024-09-17 16:22:48 -07:00
Awni Hannun
6c2369e4b9
Fix bug in upload + docs nit (#981)
* fix bug in upload + docs nit

* nit
2024-09-07 14:46:57 -07:00
Awni Hannun
c3e3411756
Update LLM generation docs to use chat template (#973)
* fix docs

* add template to model cards as well

* revert

* version
2024-09-07 06:06:15 -07:00
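The documented pattern after this change looks roughly like this (model name illustrative):

```python
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")
messages = [{"role": "user", "content": "Write a haiku about the sea."}]
# Render the conversation with the model's own chat template.
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
print(generate(model, tokenizer, prompt=prompt, max_tokens=100))
```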
madroid
bd29aec299
Support HuggingFace model tree (#957)
* Hub: Update quantization configuration fields

* Hub: add base_model metadata

* Hub: add quantization_config for model tree Quantized type

* Hub: update quantization_config value

* Hub: remove config print
2024-09-04 06:19:32 -07:00
Angelos Katharopoulos
1003a8b2dd
Add the ability to load the KV cache from a file (#956) 2024-08-28 22:11:45 -07:00
Awni Hannun
7be292c0c9
Handle longer prompt/generation (#931)
* rebase

* nits

* nit

* fix rotating cache with step prefill

* update version
2024-08-16 15:28:39 -07:00
Chime Ogbuji
c50971e860
Min P implementation (#926)
* Min P implementation

* Change default to 0 (no min_p)

* nits

* nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-08-15 15:45:02 -07:00
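Min-p keeps only tokens whose probability is at least `min_p` times that of the most likely token. A simplified sketch of the idea for a batch of logits (the PR's actual implementation works on the sorted distribution):

```python
import mlx.core as mx

def min_p_sample(logits: mx.array, min_p: float = 0.05) -> mx.array:
    # Keep tokens with prob >= min_p * (max prob); mask out the rest.
    probs = mx.softmax(logits, axis=-1)
    threshold = min_p * mx.max(probs, axis=-1, keepdims=True)
    masked = mx.where(probs >= threshold, logits, -float("inf"))
    return mx.random.categorical(masked)

token = min_p_sample(mx.random.normal((1, 32000)))
```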
Awni Hannun
9b83004631
Faster sampling with mx.compile (#937)
* faster sampling with compile

* fix test
2024-08-15 11:29:09 -07:00
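The speedup comes from compiling the sampling step with `mx.compile` and threading the PRNG state through as compile inputs/outputs, so repeated calls keep advancing the random state. A sketch of that pattern:

```python
from functools import partial

import mlx.core as mx

@partial(mx.compile, inputs=mx.random.state, outputs=mx.random.state)
def sample(logits: mx.array, temp: float = 0.7) -> mx.array:
    # Compiled temperature sampling; declaring mx.random.state as
    # input/output keeps randomness correct across calls.
    return mx.random.categorical(logits * (1 / temp))

token = sample(mx.random.normal((1, 32000)))
```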
Awni Hannun
95840f32e2
Fix whisper conversion for safetensors models (#935)
* fix whisper conversion for safetensors-only models; fix error in mlx lm for existing paths

* fix tests
2024-08-14 10:22:04 -07:00
Anchen
7a3ab1620a
support loading models by custom get_model_classes (#899)
* feature(mlx_lm): support load model by custom get classes

* rename the param
2024-07-25 11:01:17 -07:00
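A hedged sketch of the hook this adds. The `get_model_classes` parameter name follows the PR title; `my_models` is a hypothetical module supplying `Model`/`ModelArgs` classes:

```python
from pathlib import Path

from mlx_lm.utils import load_model

def get_classes(config: dict):
    # Map a model config to its (Model, ModelArgs) classes.
    from my_models import custom_arch  # hypothetical custom architecture
    return custom_arch.Model, custom_arch.ModelArgs

model = load_model(Path("path/to/model"), get_model_classes=get_classes)
```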
Awni Hannun
20e221f7f7
Add recurrent gemma (#856)
* add recurrent gemma

* fix window cache
2024-07-07 12:10:04 -07:00
Chime Ogbuji
1d701a1831
Logprobs info to completion API (#806)
* Initial implementation

* Fix handling of return_step_logits in return

* Fixed OpenAI parameter expectations and logprob structure and datatypes

* pre-commit black formatting

* Remove unused parameter

* fix log probs

* fix colorize

* nits in server

* nits in server

* Fix top_logprobs structure (a dict) and include tokens in logprobs response

* nits

* fix types

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-06-23 10:35:13 -07:00
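A hedged example of requesting log probabilities from the OpenAI-compatible completions endpoint, assuming `mlx_lm.server` is running locally on its default port:

```python
import requests

resp = requests.post(
    "http://localhost:8080/v1/completions",
    json={
        "prompt": "The capital of France is",
        "max_tokens": 8,
        "temperature": 0.0,
        "logprobs": 4,  # top-4 log probabilities per generated token
    },
)
print(resp.json()["choices"][0]["logprobs"])
```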
Michał Kurc
43d6deb3c1
mlx_lm: Add Streaming Capability to Generate Function (#807)
* Add streaming feature to text generation function

* separate stream and regular functions

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-06-03 09:04:39 -07:00
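Usage after this change looks roughly like the following; in this early version `stream_generate` yields decoded text segments (later versions yield response objects):

```python
from mlx_lm import load, stream_generate

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")
for segment in stream_generate(model, tokenizer, prompt="Hello!", max_tokens=64):
    print(segment, end="", flush=True)  # each yield is a chunk of decoded text
print()
```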
Awni Hannun
ca7ce60c91
Rename block sparse to gather (#793)
* rename block sparse to gather

* pin mlx version
2024-05-23 19:47:35 -07:00
Angelos Katharopoulos
9f671228cd
Block sparse MM MoEs (#782)
- Adds SwitchLinear
- Adds QuantizedSwitchLinear
2024-05-21 15:58:08 -07:00
AtakanTekparmak
199df9e110
fix: Added dedicated error handling to load and get_model_path (#775)
* fix: Added dedicated error handling to load and get_model_path

Added proper error handling to load and get_model_path by adding a dedicated exception class, because when the local path was invalid they still raised Hugging Face's RepositoryNotFoundError

* fix: Changed error message and resolved lack of import

* fix: Removed redundant try-catch block

* nits in message

* nits in message

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-05-20 06:39:05 -07:00
JosefAlbers
10853b57d9
Add model_config parameter to load() and load_model() (#770)
* Add `model_config` parameter to `load()` and `load_model()`

For easy editing of the loaded model configuration (e.g., for changing RoPE theta or scaling of Phi-3 model)

Example:

```python
from mlx_lm import load, generate

prompt = "Why is the sky blue?"  # example prompt
model, tokenizer = load("mlx-community/Phi-3-mini-4k-instruct-4bit-no-q-embed", model_config={"rope_theta": 50000.0})
response = generate(model, tokenizer, prompt, max_tokens=256)
```

* Possible bug (default_loss)

* Revert "Possible bug (default_loss)"

This reverts commit 70a55ace18.

* Fix default_loss for lora

* 1. Move `load_model`'s new optional `model_config` arg to the end so that `fetch_from_hub()`'s `model = load_model(model_path, lazy)` keeps working. 2. Fix indentation (`black` hook)
2024-05-10 10:13:34 -07:00
Awni Hannun
ee60e2a9d5
Kv cache (#643)
* in place kv_cache

* fix

* fix kv cache size

* partially fix kv cache dtype

* step kv cache

* multiple of step size

* more tests + kv cache

* more kv cache

* update all models to use kv cache
2024-05-08 08:18:13 -07:00
Awni Hannun
2bf11c4633
Use stable url for MNIST (#749)
* use stable url

* remove deprecated flag
2024-05-03 17:13:05 -07:00
madroid
5079af62db
Update model card description (#654)
* Update model card description

- Add full links
- Add the address of the model uploader's Hugging Face homepage

* Add user_info to reduce whoami calls

* Remove the -U argument

* remove HF user info

* run pre-commit
2024-05-02 21:22:04 -07:00
Karim Elmaaroufi
4bf2eb17f2
Validate server params & fix logit bias bug (#731)
* Bug fix in logit bias

* Add parameter validations

* Fix typo

* Update docstrings to match MLX styling

* Black style + fix a validation bug
2024-04-30 07:27:40 -07:00
Javier de la Rosa
510d2bde49
Force multi_commits when uploading to HF (#729) 2024-04-28 19:07:17 -07:00
Awni Hannun
685012c2ad
Couple fixes for LoRA (#711)
* don't overwrite in test only mode

* only load model specific safetensors
2024-04-25 14:16:13 -07:00
Karim Elmaaroufi
1484598de1
Add support for logit bias (#697) 2024-04-21 06:53:56 -07:00
Awni Hannun
2146bcd7ee
Quantize embedding / Update quantize API (#680)
* more async eval

* quantize embedding / update quantize api

* more updates for quantize

* update for quantize embeddings

* update sd quant API

* update sdxl quants

* error for datasets < batch_size

* async

* fix config loading

* fix quant

* fix tests

* fix req

* remove lm head if tie weights is true

* fix test
2024-04-18 18:16:10 -07:00
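A sketch of the updated quantization path via the Python `convert` API (repo names illustrative); with tied embeddings the `lm_head` is dropped and the embedding itself can be quantized:

```python
from mlx_lm import convert

convert(
    hf_path="mistralai/Mistral-7B-Instruct-v0.3",  # source Hugging Face repo
    mlx_path="mlx_model_q4",                       # local output directory
    quantize=True,
    q_bits=4,         # bits per weight
    q_group_size=64,  # quantization group size
)
```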
Anchen
f5f189e48a
fix(mlx-lm): broken server.py (#690)
* fix server.py

* fix var referenced before assignment

* add test

* clean up
2024-04-18 14:26:18 -07:00
Awni Hannun
9c5554d8ee
Use async eval (#670)
* Use async eval

* bump

* bump

* remove workaround for bfloat cumsum
2024-04-11 13:18:23 -07:00
da-z
5a4cad34ef
Always resume downloads (#674)
* Always resume downloads

* format

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-04-11 06:52:32 -07:00
Angelos Katharopoulos
1278994b56
Add streaming detokenizers (#651) 2024-04-08 22:36:01 -07:00
Awni Hannun
1e2f7f50b6
fix for empty initial string (#665) 2024-04-08 10:40:05 -07:00
Awni Hannun
2bd64b78cf
Save lora config (#636)
* lora config

* comments

* version bump
2024-04-02 13:52:53 -07:00
Awni Hannun
78c431dc25
cleanup whisper a little (#639) 2024-03-30 13:13:58 -07:00
Awni Hannun
5a52899405
Partially stream de-tokenization (#609)
* partially stream de-tokenization

* don't break full response
2024-03-23 15:32:33 -07:00
Anchen
fbed720d6f
chore(mlx-lm): fix the top_p implementation. (#602)
* chore(mlx-lm): clean up the top p imp

* chore: clean up

* chore: add test

* chore: address comments

* chore: clean up docs string

* chore: clean up test
2024-03-21 12:18:23 -07:00
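Top-p (nucleus) sampling keeps the smallest set of high-probability tokens whose cumulative mass reaches `top_p`. A simplified 1-D sketch of the cleaned-up logic:

```python
import mlx.core as mx

def top_p_sample(logits: mx.array, top_p: float = 0.9) -> mx.array:
    # Sort ascending, keep the tail whose cumulative probability exceeds
    # 1 - top_p, zero out the rest, then sample from what remains.
    probs = mx.softmax(logits, axis=-1)
    order = mx.argsort(probs)
    sorted_probs = probs[order]
    cumulative = mx.cumsum(sorted_probs)
    kept = mx.where(cumulative > 1 - top_p, sorted_probs, 0)
    choice = mx.random.categorical(mx.log(kept))  # log(0) -> -inf masks dropped tokens
    return order[choice.item()]

token = top_p_sample(mx.random.normal((32000,)))
```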
Alwin Arrasyid
6c3d4c8ba2
add dequantize option to mlx_lm/convert.py (#547) 2024-03-19 19:50:08 -07:00
Chime Ogbuji
6f2fd5daea
Add mlx-lm version information to HF model card (#596)
* Add mlx-lm version information to HF model card

* Update llms/mlx_lm/utils.py

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>

* Reverted indentation

* Pre-commit formatting

---------

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2024-03-19 19:42:03 -07:00
Awni Hannun
e4b19bb9e1
Make attention faster for some models (#574)
* make attention faster for a couple models

* remove unused generation flags

* add comment on lora

* include text files as well
2024-03-14 21:35:54 -07:00
Sugato Ray
2cd793dd69
feat: add update_config functionality (#531)
* feat: add `update_config` functionality

- sorts the config for better readability
- updates "_name_or_path" key in config with upload_repo
- sets indentation of 4 spaces
- allows adding other key-value pairs via kwargs
- reduces code duplication
- standardizes config-update across mlx-lm

* feat: standardize updating config

Impacts:
- fuse.py
- merge.py

* update formatting

* remove commented out code

* update func: update_config to save_config

- drop kwargs
- rename func as save_config
- incorporate review suggestions

* update func: save_config

- ensure only config-saving functionality
- function does not return config as a dict anymore
- added review suggestions

* fixed formatting

* update formatting instruction in contribution guide

* nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-14 06:36:05 -07:00
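A hedged sketch of the resulting helper, assuming it lives in `mlx_lm.utils` and writes the sorted config with 4-space indentation as described above:

```python
from pathlib import Path

from mlx_lm.utils import save_config

config = {"model_type": "llama", "_name_or_path": "my-user/my-model"}  # illustrative
save_config(config, Path("mlx_model") / "config.json")
```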