Commit Graph

95 Commits

Author SHA1 Message Date
madroid
12083c4b7e
Support for multiple EOS tokens (#1141)
* Support for multiple EOS tokens

* Change _eos_token_ids type from list to set

* Remove model_config & add eos_token_id

* nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-12-09 08:53:58 -08:00
Alex Barron
2211b27388
Mixed Quantizations (#1132)
* saving/loading mixed quantizations

* comment

* add bits per weight

* more concise bpw

* count bias too
2024-12-08 14:21:50 -08:00
Awni Hannun
1963df8565
Allow prompt callback to generate_step (#1133)
* allow prompt callback and use in cache_prompt

* nit

* comments

* bump version
2024-12-03 16:17:14 -08:00
Neil Mehta
cefe793ae0
Accept mx.array type for prompt argument for stream_generate (#1125)
* Accept mx.array type for prompt argument for stream_generate

* Fix formatting
2024-11-26 16:51:55 -08:00
Awni Hannun
cfc29c29f4
Put prompt processing in same stream (#1122)
* put prompt processing in same stream

* patch
2024-11-25 09:47:00 -08:00
madroid
a5e173802e
docs: update stream_generate return type annotation (#1121)
Improve documentation clarity by:
1. Fix return type annotation to correctly reflect GenerationResponse
2. Simplify docstring by referencing GenerationResponse class
3. Remove redundant field descriptions
2024-11-25 08:10:14 -08:00
Awni Hannun
0f135396ae
Generation refactor: part 2 (#1099)
* unify with stream_generate

* fixes

* nit

* some cleanup, warnings, tests

* fix test + faster min p + test

* version
2024-11-23 11:47:06 -08:00
Alban Lecocq
bd6d910ca3
[MLX LM] Fix f-string formatting in memory warning message (#1105)
* Fix missing f-prefix for string interpolation in model size warning
* Ensures proper display of memory values in MB for model and max size
2024-11-13 06:14:03 -08:00
Awni Hannun
657b4cc0aa
[MLX LM] Sampler refactor + a few improvements (#1094)
* starting

* refactor sampler/processor and a few improvements

* fix stream

* fix stream generate

* fix eos handling in stream generate
2024-11-07 16:15:24 -08:00
ilyasch2
3b526f0aa1
Add support for falcon-mamba (#1074)
* Add support for falcon-mamba

* nits

* nit

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-11-04 12:23:30 -08:00
Awni Hannun
e510987870
Clear cache every now and then (#1081)
* clear cache every now and then

* don't need user arg anymore
2024-11-01 14:15:32 -07:00
Alex Barron
85ffd2c96a
Quantized KV Cache (#1075)
* add QuantizedKVCache

* simplify

* add tests

* single sdpa function

* fix sed

* in place

* fix tests

* support different k and v head dims
2024-10-31 16:59:52 -07:00
Awni Hannun
9f34fdbda4
Wire models in MLX LM (#1069)
* wired in MLX LM

* fix synch

* comment + nit

* version

* mlx lm version

* bump to 0.19.2
2024-10-31 08:17:14 -07:00
Awni Hannun
66e7bcb886
override dtype with quant (#1062) 2024-10-22 09:56:45 -07:00
Awni Hannun
c799133998
Make llm async eval less brittle (#1040)
* Make llm async eval less brittle

* nit
2024-10-14 10:25:24 -07:00
Awni Hannun
4360e7ccec
clear cache during prompt processing (#1027) 2024-10-09 16:48:32 -07:00
Awni Hannun
b7373cb44f
fix long prompt generations (#1023) 2024-10-09 11:09:36 -07:00
Awni Hannun
fca087be49
More cache improvements (#1015)
* fix rotating kv cache for chat use case

* reorg + fixes to caching, unify prompt caching across types and use cases for e.g. caching during a chat

* nit in chat

* fix tests

* fix tests

* fix tests

* docs

* chat command

* comments + docs

* Define meta_state on all Cache implementations

* fixes + trim_prompt_cache api

* fix default model

---------

Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2024-10-07 20:45:51 -07:00
nathan
0866e23a67
repetiton_penalty and logits_bias just using logits_processors (#1004)
* refactor of repetition_penalty and logits_bias to use logits_processor

* nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-30 08:49:03 -07:00
Gökdeniz Gülmez
50e5ca81a8
Adding full finetuning (#903)
* Adding full model weights finetuning

* Updating the LORA.md and ACKNOWLEDGMENTS.md files.

* removing --use-dora and --fulll-training and adding --fine-tune-type

* some clean up

* reformating and fixing dora training

* updated CONFIG_DEFAULTS

* update config example

* update in the config example fie

* Update LORA.md

* merge and commit

* adding argument for dora linear layer

* clean up

* clean up in the example yaml file

* fix

* final fix before sending

* small addition to re md file

* fix for loading the fully trained model by saving all the files and configs correctly

* clean up

* removing the unnesesairy files

* changing lora layers back to 16

* removed max file size

* nits

* resolve merge

* some consistency changes

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-29 17:12:47 -07:00
nathan
ace2bb5890
Add logits_processor option to generate_step function (#983)
* Add logits_processor option for the generation as in huggingface transformers library

* concatenation correction

* Rename the tokens variable for clarity

* remove the logit_bias argument from generate_step method

* fix the variable name

* nits + test

* test

* add back logit bias + test

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-28 10:08:49 -07:00
Awni Hannun
f530f56df2
don't use internal exception (#990) 2024-09-17 16:22:48 -07:00
Awni Hannun
6c2369e4b9
Fix bug in upload + docs nit (#981)
* fix bug in upload + docs nit

* nit
2024-09-07 14:46:57 -07:00
Awni Hannun
c3e3411756
Update LLM generation docs to use chat template (#973)
* fix docs

* add template to model cards as well

* revert

* version
2024-09-07 06:06:15 -07:00
madroid
bd29aec299
Support HuggingFace model tree (#957)
* Hub: Update quantization configuration fields

* Hub: add base_model metadata

* Hub: add quantization_config for model tree Quantized type

* Hub: update quantization_config value

* Hub: remove config print
2024-09-04 06:19:32 -07:00
Angelos Katharopoulos
1003a8b2dd
Add the ability to load the KV cache from a file (#956) 2024-08-28 22:11:45 -07:00
Awni Hannun
7be292c0c9
Handle longer prompt/generation (#931)
* rebase

* nits

* nit

* fix rotating cache with step prefill

* update version
2024-08-16 15:28:39 -07:00
Chime Ogbuji
c50971e860
Min P implementation (#926)
* Min P implementation

* Change default to 0 (no min_p)

* nits

* nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-08-15 15:45:02 -07:00
Awni Hannun
9b83004631
Faster sampling with mx.compile (#937)
* faster sampling with compile

* fix test
2024-08-15 11:29:09 -07:00
Awni Hannun
95840f32e2
Fix whipser conversion for safetensors models (#935)
* fix whipser conversion for safetensor only. error in mlx lm for existing paths

* fix tests
2024-08-14 10:22:04 -07:00
Anchen
7a3ab1620a
support load model by custom get_model_classes (#899)
* feature(mlx_lm): support load model by custom get classes

* rename the param
2024-07-25 11:01:17 -07:00
Awni Hannun
20e221f7f7
Add recurrent gemma (#856)
* add recurrent gemma

* fix window cache
2024-07-07 12:10:04 -07:00
Chime Ogbuji
1d701a1831
Logprobs info to completion API (#806)
* Initial implementation

* Fix handling of return_step_logits in return

* Fixed OpenAI parameter expectations and logprob structure and datatypes

* pre-commit black formatting

* Remove unused parameter

* fix log probs

* fix colorize

* nits in server

* nits in server

* Fix top_logprobs structure (a dict) and include tokens in logprobs response

* nits

* fix types

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-06-23 10:35:13 -07:00
Michał Kurc
43d6deb3c1
mlx_lm: Add Streaming Capability to Generate Function (#807)
* Add streaming feature to text generation function

* separate stream and regular functions

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-06-03 09:04:39 -07:00
Awni Hannun
ca7ce60c91
Rename block sparse to gather (#793)
* rename block sparse to gather

* pin mlx version
2024-05-23 19:47:35 -07:00
Angelos Katharopoulos
9f671228cd
Block sparse MM MoEs (#782)
- Adds SwitchLinear
- Adds QuantizedSwitchLinear
2024-05-21 15:58:08 -07:00
AtakanTekparmak
199df9e110
fix: Added dedicated error handling to load and get_model_path (#775)
* fix: Added dedicated error handling to load and get_model_path

Added proper error handling to load and get_model_path by adding a dedicated exception class, because when the local path is not right, it still throws the huggingface RepositoryNotFoundError

* fix: Changed error message and resolved lack of import

* fix: Removed redundant try-catch block

* nits in message

* nits in message

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-05-20 06:39:05 -07:00
JosefAlbers
10853b57d9
Add model_config parameter to load() and load_model() (#770)
* Add `model_config` parameter to `load()` and `load_model()`

For easy editing of the loaded model configuration (e.g., for changing RoPE theta or scaling of Phi-3 model)

Example:

```python
from mlx_lm import load, generate
model, tokenizer = load("mlx-community/Phi-3-mini-4k-instruct-4bit-no-q-embed", model_config={"rope_theta":50000.0})
response = generate(model, tokenizer, prompt, max_tokens=MAX_TOKENS)
```

* Possible bug (default_loss)

* Revert "Possible bug (default_loss)"

This reverts commit 70a55ace18.

* Fix default_loss for lora

* 1. move load_model's new optional `model_config` arg to the end (fetch_from_hub()'s `model = load_model(model_path, lazy)`) 2. fix indentations (`black` hook)
2024-05-10 10:13:34 -07:00
Awni Hannun
ee60e2a9d5
Kv cache (#643)
* in place kv_cache

* fix

* fix kv cache size

* partially fix kv cache dtype

* step kv cache

* multiple of step size

* more teests + kv cache

* more kv cache

* udpate all models to use kv cache
2024-05-08 08:18:13 -07:00
Awni Hannun
2bf11c4633
Use stable url for MNIST (#749)
* use stable url

* remove deprecated flag
2024-05-03 17:13:05 -07:00
madroid
5079af62db
Update model card describe (#654)
* Update model card describe

- Add full link jump
- Add the address of the model uploader's Hugging Face homepage

* Add user_info to reduce whoami calls

* Remove the -U argument

* remove HF user info

* run pre-commit
2024-05-02 21:22:04 -07:00
Karim Elmaaroufi
4bf2eb17f2
Validate server params & fix logit bias bug (#731)
* Bug fix in logit bias

* Add parameter validations

* Fix typo

* Update docstrings to match MLX styling

* Black style + fix a validation bug
2024-04-30 07:27:40 -07:00
Javier de la Rosa
510d2bde49
Force multi_commits when uploading to HF (#729) 2024-04-28 19:07:17 -07:00
Awni Hannun
685012c2ad
Couple fixes for LoRA (#711)
* don't overwrite in test only mode

* only load model specific safetensors
2024-04-25 14:16:13 -07:00
Karim Elmaaroufi
1484598de1
Add support for logit bias (#697) 2024-04-21 06:53:56 -07:00
Awni Hannun
2146bcd7ee
Quantize embedding / Update quantize API (#680)
* more async eval

* quantize embedding / update quantize api

* more updates for quantize

* update for quantize embeddings

* update sd quant API

* update sdxl quants

* error for datasets < batch_size

* async

* fix config loading

* fix quant

* fix tests

* fix req

* remove lm head if tie weights is true

* fix test
2024-04-18 18:16:10 -07:00
Anchen
f5f189e48a
fix(mlx-lm): broken server.py (#690)
* fix server.py

* fix var referenced before assignment

* add test

* clean up
2024-04-18 14:26:18 -07:00
Awni Hannun
9c5554d8ee
Use async eval (#670)
* Use async eval

* bump

* bump

* remove workaround for bfloat cumsum
2024-04-11 13:18:23 -07:00
da-z
5a4cad34ef
Always resume downloads (#674)
* Always resume downloads

* format

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-04-11 06:52:32 -07:00
Angelos Katharopoulos
1278994b56
Add streaming detokenizers (#651) 2024-04-08 22:36:01 -07:00