Commit Graph

564 Commits

Author SHA1 Message Date
Kevin Conner
ec494a97ec Fix object property value in mlx_lm.server chat completions response to match OpenAI spec
These were "chat.completions" and "chat.completions.chunk"
but should be "chat.completion" and "chat.completion.chunk"
for compatibility with clients expecting an OpenAI API.

In particular, this solves a problem in which aider 0.64.1 reports
hitting a token limit on any completion request, no matter how small,
despite apparently correct counts in the usage property.

Refer to:

https://platform.openai.com/docs/api-reference/chat/object

> object string
> The object type, which is always chat.completion.

https://platform.openai.com/docs/api-reference/chat/streaming

> object string
> The object type, which is always chat.completion.chunk.
2024-11-24 15:03:57 -08:00
Awni Hannun
0f135396ae
Generation refactor: part 2 (#1099)
* unify with stream_generate

* fixes

* nit

* some cleanup, warnings, tests

* fix test + faster min p + test

* version
2024-11-23 11:47:06 -08:00
Awni Hannun
004eb4cc9d
Tencent HunYuan MOE model (#1100)
* hunyuan

* fix

* format str

* default trust remote code for tokenizer, allow system prompt to be configurable
2024-11-23 11:06:26 -08:00
Angelos Katharopoulos
042280ce50
Fix format (#1115) 2024-11-20 16:15:53 -08:00
Valentin Roussellet
60c7b80350
Pass seed to sd img2img (#1114) 2024-11-20 15:21:52 -08:00
Alban Lecocq
bd6d910ca3
[MLX LM] Fix f-string formatting in memory warning message (#1105)
* Fix missing f-prefix for string interpolation in model size warning
* Ensures proper display of memory values in MB for model and max size
2024-11-13 06:14:03 -08:00
madroid
1e07660184
FLUX: save train config (#1049) 2024-11-08 17:15:19 -08:00
Awni Hannun
657b4cc0aa
[MLX LM] Sampler refactor + a few improvements (#1094)
* starting

* refactor sampler/processor and a few improvements

* fix stream

* fix stream generate

* fix eos handling in stream generate
2024-11-07 16:15:24 -08:00
Angelos Katharopoulos
ed9e81dd58
Fix rotating kv cache size (#1093) 2024-11-05 10:24:24 -08:00
Awni Hannun
6fd1f70f73
fix spm decoder multi-byte (#1092) 2024-11-05 06:06:26 -08:00
Anthony Wu
4394633ce0
mlx_whisper: add support for audio input from stdin (#1012)
* add support for audio and input name from stdin

* refactored to stdin - arg, and output-name template

* fix bugs, add test coverage

* fix doc to match arg rename

* some nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-11-04 14:02:13 -08:00
ilyasch2
3b526f0aa1
Add support for falcon-mamba (#1074)
* Add support for falcon-mamba

* nits

* nit

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-11-04 12:23:30 -08:00
Anchen
82e3338987
chore(mlx-lm): add max token arg for mlx_lm.chat (#1089)
* chore(mlx-lm): add max token arg for mlx_lm.chat

* chore: update the default max token value
2024-11-04 06:06:34 -08:00
Angelos Katharopoulos
331148d8ec
Enable distributed LoRA training (#821) 2024-11-02 18:02:31 -07:00
Awni Hannun
29c954f4cb
fix (#1082) 2024-11-02 13:51:38 -07:00
Awni Hannun
0f799947d0
fix (#1079) 2024-11-01 16:30:32 -07:00
Awni Hannun
e510987870
Clear cache every now and then (#1081)
* clear cache every now and then

* don't need user arg anymore
2024-11-01 14:15:32 -07:00
Awni Hannun
8160e0c4e5
Whisper improvements (#1080)
* use safetensors in whisper

* speed up decoder

* version
2024-11-01 10:52:28 -07:00
Alex Barron
85ffd2c96a
Quantized KV Cache (#1075)
* add QuantizedKVCache

* simplify

* add tests

* single sdpa function

* fix sed

* in place

* fix tests

* support different k and v head dims
2024-10-31 16:59:52 -07:00
Awni Hannun
9f34fdbda4
Wire models in MLX LM (#1069)
* wired in MLX LM

* fix synch

* comment + nit

* version

* mlx lm version

* bump to 0.19.2
2024-10-31 08:17:14 -07:00
Awni Hannun
8fe9539af7
Fix detokenizer space match for quote (#1072)
* fix + test

* remove transformer flax/torch warning

* format
2024-10-27 15:06:07 -07:00
hschaeufler
ab4bf05c6e
Update lora_config.yaml with new param: num_layers (#1068) 2024-10-26 09:34:46 -07:00
Saurav Maheshkar
4971462bf0
feat(clip): add linear probe evaluation script (#960) 2024-10-24 21:56:17 -07:00
Awni Hannun
9000e280ae
fix mamba models conversion (#1065) 2024-10-22 15:44:08 -07:00
madroid
d1d480867b
LoRA: update tools datasets docs (#1063)
* LoRA: update tools datasets docs

* nits

* nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-10-22 12:19:11 -07:00
Awni Hannun
66e7bcb886
override dtype with quant (#1062) 2024-10-22 09:56:45 -07:00
aronson
743763bc2e
Handle empty string case in maybe_trim_space (#1055)
* Handle empty string case in maybe_trim_space

* nit

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-10-20 20:46:43 -07:00
madroid
f491d473a3
FLUX: Optimize dataset loading logic (#1038) 2024-10-15 10:37:45 -07:00
Zak B. Elep
3d62b058a4
fix: typo on flux model preloading (#1050) 2024-10-15 09:13:01 -07:00
madroid
bbd2003047
FLUX: update README.md (#1036) 2024-10-14 11:21:41 -07:00
Awni Hannun
605c4854f1
Prompt caching in mlx_lm.server (#1026)
* caching in server

* nits

* fix tests

* don't throw if no metal

* comments
2024-10-14 10:57:22 -07:00
Awni Hannun
8dca1a2f60
Tokenizer updates + tests (#1024)
* tokenizer updates + tests

* nit

* add can_trim_prompt_cache

* nits
2024-10-14 10:48:46 -07:00
Awni Hannun
6c368f2124
bump mac tests to use py39 (#1047) 2024-10-14 10:40:36 -07:00
Awni Hannun
c799133998
Make llm async eval less brittle (#1040)
* Make llm async eval less brittle

* nit
2024-10-14 10:25:24 -07:00
Seitaro Sugawara
1e0cda68c6
Update README.md (#1045)
* Update README.md

A small typo was fixed in the musicgen README.md.

* Update musicgen/README.md

---------

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2024-10-14 06:21:25 -07:00
Shunta Saito
7612c646f3
Fix PLaMo model to support Grouped Query Attention (#1037) 2024-10-12 15:26:50 -07:00
Ivan Fioravanti
d8611dd69f
Small typo fixed in flux README.md (#1035) 2024-10-12 06:14:01 -07:00
Angelos Katharopoulos
a5f2bab070
Add FLUX finetuning (#1028) 2024-10-11 21:17:41 -07:00
Alex Barron
d72fdeb4ee
MusicGen (#1020)
* Add MusicGen model

* add benchmarks

* change to from_pretrained

* symlinks

* add readme and requirements

* fix readme

* readme
2024-10-11 10:16:20 -07:00
Awni Hannun
4360e7ccec
clear cache during prompt processing (#1027) 2024-10-09 16:48:32 -07:00
Awni Hannun
b7373cb44f
fix long prompt generations (#1023) 2024-10-09 11:09:36 -07:00
Awni Hannun
fca087be49
More cache improvements (#1015)
* fix rotating kv cache for chat use case

* reorg + fixes to caching, unify prompt caching across types and use cases for e.g. caching during a chat

* nit in chat

* fix tests

* fix tests

* fix tests

* docs

* chat command

* comments + docs

* Define meta_state on all Cache implementations

* fixes + trim_prompt_cache api

* fix default model

---------

Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2024-10-07 20:45:51 -07:00
Awni Hannun
9bc53fc210
convert (#1006) 2024-10-02 13:13:33 -07:00
madroid
36c1d8e8dc
Server: support function calling (#1003) 2024-10-02 12:36:07 -07:00
nathan
0866e23a67
repetiton_penalty and logits_bias just using logits_processors (#1004)
* refactor of repetition_penalty and logits_bias to use logits_processor

* nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-30 08:49:03 -07:00
Zai Thottakath
418d9a5511
Feature: QDoRA (#891)
* feat: QDoRA with tests and a small bug fix for recalculation of self.m

* some simplifications and fixes

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-30 08:01:11 -07:00
madroid
aa1c8abdc6
LoRA: Support HuggingFace dataset via data parameter (#996)
* LoRA: support huggingface dataset via `data` argument

* LoRA: Extract the load_custom_hf_dataset function

* LoRA: split small functions

* fix spelling errors

* handle load hf dataset error

* fix pre-commit lint

* update data argument help

* nits and doc

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-30 07:36:21 -07:00
Gökdeniz Gülmez
50e5ca81a8
Adding full finetuning (#903)
* Adding full model weights finetuning

* Updating the LORA.md and ACKNOWLEDGMENTS.md files.

* removing --use-dora and --fulll-training and adding --fine-tune-type

* some clean up

* reformating and fixing dora training

* updated CONFIG_DEFAULTS

* update config example

* update in the config example fie

* Update LORA.md

* merge and commit

* adding argument for dora linear layer

* clean up

* clean up in the example yaml file

* fix

* final fix before sending

* small addition to re md file

* fix for loading the fully trained model by saving all the files and configs correctly

* clean up

* removing the unnesesairy files

* changing lora layers back to 16

* removed max file size

* nits

* resolve merge

* some consistency changes

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-29 17:12:47 -07:00
madroid
7ec2021bb9
LoRA: support tools(function calling) format datasets (#995)
* LoRA: support fine-tuning tools datasets

* LoRA: Split small function

* LoRA: add tools format to lora docs

* LoRA: pre-commit fix

* Revert "LoRA: pre-commit fix"

This reverts commit b94b7e0fe7.

* Revert "LoRA: Split small function"

This reverts commit 3f6a5f19fd.

* LoRA: remove ToolsDataset

In a JSONL file, not all data is required to include the tools value.

* nit in readme

* nit in readme

* nit in readme

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-28 10:41:36 -07:00
nathan
ace2bb5890
Add logits_processor option to generate_step function (#983)
* Add logits_processor option for the generation as in huggingface transformers library

* concatenation correction

* Rename the tokens variable for clarity

* remove the logit_bias argument from generate_step method

* fix the variable name

* nits + test

* test

* add back logit bias + test

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-28 10:08:49 -07:00