Prince Canuma
dfa4dd6c93
Add support for cohere2 ( #1157 )
...
* add support for cohere2
* revert to act_fn to silu
* fix tests and sliding window attention
* add tests
* add to tuner
* fix sliding window
* add coauthor :)
Co-authored-by: n8programs <43304488+N8python@users.noreply.github.com>
* Add rotating kvcache to save space
* some nits
* style
* nits
---------
Co-authored-by: n8programs <43304488+N8python@users.noreply.github.com>
Co-authored-by: N8 <n8@n8programs.com>
Co-authored-by: Awni Hannun <awni@apple.com>
2024-12-16 08:01:03 -08:00
Ikko Eltociear Ashimine
fc0674d2d8
chore: update evaluate.py ( #1159 )
...
occurence -> occurrence
2024-12-15 06:06:29 -08:00
Goekdeniz-Guelmez
dff4e52910
adding the modelnames in the LORA.md file and removing unused functions from mamba2.py
2024-12-12 22:52:00 +01:00
Awni Hannun
9f2ea5892e
Bpe stream without space ( #1154 )
...
* bpe streaming detokenization without space
* version bump
2024-12-12 13:13:50 -08:00
Goekdeniz-Guelmez
a883e39f41
optimizing the code for faster inference but still generates gibberish
2024-12-12 21:08:33 +01:00
Awni Hannun
2ba0e36683
[mlx-lm] Use top p in server ( #1144 )
...
* use top p in server
* couple other fixes
2024-12-12 11:12:21 -08:00
Angelos Katharopoulos
19abf3dcaa
Replace unicode errors instead of raising exception ( #1146 )
2024-12-12 11:10:41 -08:00
madroid
06af3c9b0e
Add finish_reason in GenerationResponse ( #1153 )
2024-12-12 10:37:40 -08:00
Awni Hannun
77b42b7c8b
fix llava ( #1149 )
2024-12-12 10:37:26 -08:00
Gökdeniz Gülmez
c1d9ec329c
Merge branch 'ml-explore:main' into adding-support-for-mamba2
2024-12-10 20:43:11 +01:00
Alex Barron
135c5818c1
Fix max_tokens ( #1148 )
2024-12-10 11:26:04 -08:00
Goekdeniz-Guelmez
184d3d3267
clean up
2024-12-10 18:20:13 +01:00
Goekdeniz-Guelmez
80e88b4f4d
nits
2024-12-10 18:18:59 +01:00
Goekdeniz-Guelmez
b10afe3662
nits
2024-12-10 18:15:12 +01:00
Goekdeniz-Guelmez
9f8a6a3509
inference on codestral works but is gibberish
2024-12-10 17:34:44 +01:00
Gökdeniz Gülmez
ddad2105ef
Merge branch 'main' into adding-support-for-mamba2
2024-12-10 14:32:44 +01:00
madroid
12083c4b7e
Support for multiple EOS tokens ( #1141 )
...
* Support for multiple EOS tokens
* Change _eos_token_ids type from list to set
* Remove model_config & add eos_token_id
* nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-12-09 08:53:58 -08:00
n8programs
5687d5b99b
Adds EXAONE architecture. ( #1145 )
...
* Adds EXAONE architecture.
* nits + format
* format
* clean up and fix rope
* clean up and fix rope
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-12-09 07:58:25 -08:00
Alex Barron
2211b27388
Mixed Quantizations ( #1132 )
...
* saving/loading mixed quantizations
* comment
* add bits per weight
* more concise bpw
* count bias too
2024-12-08 14:21:50 -08:00
Alex Barron
cd8cf28c39
mlx_lm.evaluate ( #1140 )
...
* Add evaluation script
* only write top level results
* add lm eval version
* typo
* create output dir
* relative import
* comment
---------
Co-authored-by: David Grangier <dgrangier@users.noreply.github.com>
2024-12-08 12:20:10 -08:00
Awni Hannun
1963df8565
Allow prompt callback to generate_step ( #1133 )
...
* allow prompt callback and use in cache_prompt
* nit
* comments
* bump version
2024-12-03 16:17:14 -08:00
Awni Hannun
8801beb66f
Add olmo2 ( #1128 )
...
* add olmo2
* add olmo2
2024-12-02 11:42:58 -08:00
Neil Mehta
cefe793ae0
Accept mx.array type for prompt argument for stream_generate ( #1125 )
...
* Accept mx.array type for prompt argument for stream_generate
* Fix formatting
2024-11-26 16:51:55 -08:00
Awni Hannun
cfc29c29f4
Put prompt processing in same stream ( #1122 )
...
* put prompt processing in same stream
* patch
2024-11-25 09:47:00 -08:00
madroid
a5e173802e
docs: update stream_generate return type annotation ( #1121 )
...
Improve documentation clarity by:
1. Fix return type annotation to correctly reflect GenerationResponse
2. Simplify docstring by referencing GenerationResponse class
3. Remove redundant field descriptions
2024-11-25 08:10:14 -08:00
Kevin Conner
0ffdb6dd20
Fix object property value in mlx_lm.server chat completions response to match OpenAI spec ( #1119 )
...
These were "chat.completions" and "chat.completions.chunk"
but should be "chat.completion" and "chat.completion.chunk"
for compatibility with clients expecting an OpenAI API.
In particular, this solves a problem in which aider 0.64.1 reports
hitting a token limit on any completion request, no matter how small,
despite apparently correct counts in the usage property.
Refer to:
https://platform.openai.com/docs/api-reference/chat/object
> object string
> The object type, which is always chat.completion.
https://platform.openai.com/docs/api-reference/chat/streaming
> object string
> The object type, which is always chat.completion.chunk.
2024-11-24 16:37:37 -08:00
Goekdeniz-Guelmez
38e5801edb
loading codestral works but no inference
2024-11-24 16:26:45 +01:00
Awni Hannun
0f135396ae
Generation refactor: part 2 ( #1099 )
...
* unify with stream_generate
* fixes
* nit
* some cleanup, warnings, tests
* fix test + faster min p + test
* version
2024-11-23 11:47:06 -08:00
Awni Hannun
004eb4cc9d
Tencent HunYuan MOE model ( #1100 )
...
* hunyuan
* fix
* format str
* default trust remote code for tokenizer, allow system prompt to be configurable
2024-11-23 11:06:26 -08:00
Goekdeniz-Guelmez
a6ddc27a4e
removing last checkpoint file
2024-11-21 22:33:56 +01:00
Goekdeniz-Guelmez
57b1717cf5
inference fixed
2024-11-21 22:25:58 +01:00
Goekdeniz-Guelmez
117ffd3909
removing some files
2024-11-21 22:05:42 +01:00
Goekdeniz-Guelmez
e22b2dbf27
Fixed streaming generation and got rid of generating gibberish, but is still a little slow: 0.222 tokens-per-sec
2024-11-21 22:01:28 +01:00
Gökdeniz Gülmez
e4eae973e8
Merge branch 'ml-explore:main' into adding-support-for-mamba2
2024-11-21 21:06:45 +01:00
Alban Lecocq
bd6d910ca3
[MLX LM] Fix f-string formatting in memory warning message ( #1105 )
...
* Fix missing f-prefix for string interpolation in model size warning
* Ensures proper display of memory values in MB for model and max size
2024-11-13 06:14:03 -08:00
Goekdeniz-Guelmez
1d851069ea
nits
2024-11-10 17:21:18 +01:00
Goekdeniz-Guelmez
1a6688384d
implemented multi Token inputs, but still generating Gibberish
2024-11-10 17:19:00 +01:00
Goekdeniz-Guelmez
2f95b361a8
removed the custom Mamba2Cache and updated the existing MambaCache but still only one input Token and outputs gibberish
2024-11-10 16:57:03 +01:00
Gökdeniz Gülmez
49d3f188f8
Merge branch 'ml-explore:main' into adding-support-for-mamba2
2024-11-10 16:36:02 +01:00
Goekdeniz-Guelmez
3a499f9735
fixed inference slowness but it can't handle multiple Token inputs and is generating gibberish
2024-11-10 16:35:07 +01:00
Goekdeniz-Guelmez
800b60239c
save checkpoint
2024-11-10 14:36:26 +01:00
Awni Hannun
657b4cc0aa
[MLX LM] Sampler refactor + a few improvements ( #1094 )
...
* starting
* refactor sampler/processor and a few improvements
* fix stream
* fix stream generate
* fix eos handling in stream generate
2024-11-07 16:15:24 -08:00
Goekdeniz-Guelmez
906f972d36
save push
2024-11-06 16:35:46 +01:00
Angelos Katharopoulos
ed9e81dd58
Fix rotating kv cache size ( #1093 )
2024-11-05 10:24:24 -08:00
Awni Hannun
6fd1f70f73
fix spm decoder multi-byte ( #1092 )
2024-11-05 06:06:26 -08:00
ilyasch2
3b526f0aa1
Add support for falcon-mamba ( #1074 )
...
* Add support for falcon-mamba
* nits
* nit
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-11-04 12:23:30 -08:00
Anchen
82e3338987
chore(mlx-lm): add max token arg for mlx_lm.chat ( #1089 )
...
* chore(mlx-lm): add max token arg for mlx_lm.chat
* chore: update the default max token value
2024-11-04 06:06:34 -08:00
Angelos Katharopoulos
331148d8ec
Enable distributed LoRA training ( #821 )
2024-11-02 18:02:31 -07:00
Awni Hannun
0f799947d0
fix ( #1079 )
2024-11-01 16:30:32 -07:00
Awni Hannun
e510987870
Clear cache every now and then ( #1081 )
...
* clear cache every now and then
* don't need user arg anymore
2024-11-01 14:15:32 -07:00
Alex Barron
85ffd2c96a
Quantized KV Cache ( #1075 )
...
* add QuantizedKVCache
* simplify
* add tests
* single sdpa function
* fix sed
* in place
* fix tests
* support different k and v head dims
2024-10-31 16:59:52 -07:00
Awni Hannun
9f34fdbda4
Wire models in MLX LM ( #1069 )
...
* wired in MLX LM
* fix synch
* comment + nit
* version
* mlx lm version
* bump to 0.19.2
2024-10-31 08:17:14 -07:00
Goekdeniz-Guelmez
58b448dc0b
updates
2024-10-30 21:23:13 +01:00
Gökdeniz Gülmez
ffc7ab06a0
Merge branch 'ml-explore:main' into adding-support-for-mamba2
2024-10-30 17:04:38 +01:00
Awni Hannun
8fe9539af7
Fix detokenizer space match for quote ( #1072 )
...
* fix + test
* remove transformer flax/torch warning
* format
2024-10-27 15:06:07 -07:00
hschaeufler
ab4bf05c6e
Update lora_config.yaml with new param: num_layers ( #1068 )
2024-10-26 09:34:46 -07:00
Gökdeniz Gülmez
3b70708201
Merge branch 'ml-explore:main' into adding-support-for-mamba2
2024-10-25 08:57:37 +02:00
Goekdeniz-Guelmez
7c8849e795
update
2024-10-24 16:16:42 +02:00
Awni Hannun
9000e280ae
fix mamba models conversion ( #1065 )
2024-10-22 15:44:08 -07:00
Goekdeniz-Guelmez
a677638c4b
inference works but is hella slow
2024-10-22 23:06:06 +02:00
Goekdeniz-Guelmez
9ab581d678
notes
2024-10-22 22:10:53 +02:00
Goekdeniz-Guelmez
e43a2ab229
not working, incorrect handling with cache probably
2024-10-22 22:04:25 +02:00
Goekdeniz-Guelmez
55485b98e8
update
2024-10-22 21:23:47 +02:00
madroid
d1d480867b
LoRA: update tools datasets docs ( #1063 )
...
* LoRA: update tools datasets docs
* nits
* nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-10-22 12:19:11 -07:00
Goekdeniz-Guelmez
758597eaa8
adding multi token input and correct cache handling in ssm step
2024-10-22 20:44:23 +02:00
Awni Hannun
66e7bcb886
override dtype with quant ( #1062 )
2024-10-22 09:56:45 -07:00
Goekdeniz-Guelmez
5326d9373a
Merge branch 'adding-support-for-mamba2' of https://github.com/Goekdeniz-Guelmez/mlx-examples into adding-support-for-mamba2
2024-10-22 18:26:05 +02:00
Goekdeniz-Guelmez
b9c57cd429
generation works! trying training now
2024-10-22 18:25:59 +02:00
Gökdeniz Gülmez
0ef73f3a2d
Merge branch 'ml-explore:main' into adding-support-for-mamba2
2024-10-21 15:14:19 +02:00
aronson
743763bc2e
Handle empty string case in maybe_trim_space ( #1055 )
...
* Handle empty string case in maybe_trim_space
* nit
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-10-20 20:46:43 -07:00
Goekdeniz-Guelmez
c1634ce81b
still generating gibberish
2024-10-20 18:41:28 +02:00
Goekdeniz-Guelmez
ab4cf1d1cf
generation works but outputs gibberish
2024-10-20 18:04:34 +02:00
Goekdeniz-Guelmez
4ab5139c05
quick save
2024-10-20 16:11:39 +02:00
Goekdeniz-Guelmez
cd036ccfb5
fix generation works too (almost)
2024-10-16 21:13:36 +02:00
Goekdeniz-Guelmez
181d6abedc
Merge branch 'adding-support-for-mamba2' of https://github.com/Goekdeniz-Guelmez/mlx-examples into adding-support-for-mamba2
2024-10-16 21:09:42 +02:00
Goekdeniz-Guelmez
8073cb486c
adding debug statements (somehow generating only goes through the first MambaMixer block pass)
2024-10-16 21:09:30 +02:00
Gökdeniz Gülmez
855fcc4327
Merge branch 'ml-explore:main' into adding-support-for-mamba2
2024-10-16 18:57:55 +02:00
Awni Hannun
605c4854f1
Prompt caching in mlx_lm.server ( #1026 )
...
* caching in server
* nits
* fix tests
* don't throw if no metal
* comments
2024-10-14 10:57:22 -07:00
Awni Hannun
8dca1a2f60
Tokenizer updates + tests ( #1024 )
...
* tokenizer updates + tests
* nit
* add can_trim_prompt_cache
* nits
2024-10-14 10:48:46 -07:00
Awni Hannun
c799133998
Make llm async eval less brittle ( #1040 )
...
* Make llm async eval less brittle
* nit
2024-10-14 10:25:24 -07:00
Gökdeniz Gülmez
3f1c1dde6a
Merge branch 'ml-explore:main' into adding-support-for-mamba2
2024-10-14 16:32:00 +02:00
Shunta Saito
7612c646f3
Fix PLaMo model to support Grouped Query Attention ( #1037 )
2024-10-12 15:26:50 -07:00
Goekdeniz-Guelmez
00ba27fe6c
adding debug statements
2024-10-11 21:36:41 +02:00
Goekdeniz-Guelmez
6f88dd59d7
quick clean up and fix
2024-10-11 21:08:13 +02:00
Goekdeniz-Guelmez
9c075a71f8
Merge branch 'adding-support-for-mamba2' of https://github.com/Goekdeniz-Guelmez/mlx-examples into adding-support-for-mamba2
2024-10-11 20:54:35 +02:00
Goekdeniz-Guelmez
4e1236cbf6
fixing loading the model
2024-10-11 20:53:29 +02:00
Awni Hannun
4360e7ccec
clear cache during prompt processing ( #1027 )
2024-10-09 16:48:32 -07:00
Awni Hannun
b7373cb44f
fix long prompt generations ( #1023 )
2024-10-09 11:09:36 -07:00
Awni Hannun
fca087be49
More cache improvements ( #1015 )
...
* fix rotating kv cache for chat use case
* reorg + fixes to caching, unify prompt caching across types and use cases for e.g. caching during a chat
* nit in chat
* fix tests
* fix tests
* fix tests
* docs
* chat command
* comments + docs
* Define meta_state on all Cache implementations
* fixes + trim_prompt_cache api
* fix default model
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2024-10-07 20:45:51 -07:00
Gökdeniz Gülmez
52d6ca0ad0
Merge branch 'ml-explore:main' into adding-support-for-mamba2
2024-10-04 22:25:31 +02:00
madroid
36c1d8e8dc
Server: support function calling ( #1003 )
2024-10-02 12:36:07 -07:00
Goekdeniz-Guelmez
264ba43707
update trainer/lora.py and adding DepthWiseConv1d because mlx 0.18.0 doesn't accept groups parameter
2024-10-02 19:19:32 +02:00
Gökdeniz Gülmez
49b9fc1a4c
Create mamba2.py
2024-10-02 12:48:15 +02:00
nathan
0866e23a67
repetition_penalty and logits_bias just using logits_processors ( #1004 )
...
* refactor of repetition_penalty and logits_bias to use logits_processor
* nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-30 08:49:03 -07:00
Zai Thottakath
418d9a5511
Feature: QDoRA ( #891 )
...
* feat: QDoRA with tests and a small bug fix for recalculation of self.m
* some simplifications and fixes
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-30 08:01:11 -07:00
madroid
aa1c8abdc6
LoRA: Support HuggingFace dataset via data parameter ( #996 )
...
* LoRA: support huggingface dataset via `data` argument
* LoRA: Extract the load_custom_hf_dataset function
* LoRA: split small functions
* fix spelling errors
* handle load hf dataset error
* fix pre-commit lint
* update data argument help
* nits and doc
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-30 07:36:21 -07:00
Gökdeniz Gülmez
50e5ca81a8
Adding full finetuning ( #903 )
...
* Adding full model weights finetuning
* Updating the LORA.md and ACKNOWLEDGMENTS.md files.
* removing --use-dora and --fulll-training and adding --fine-tune-type
* some clean up
* reformatting and fixing dora training
* updated CONFIG_DEFAULTS
* update config example
* update in the config example file
* Update LORA.md
* merge and commit
* adding argument for dora linear layer
* clean up
* clean up in the example yaml file
* fix
* final fix before sending
* small addition to re md file
* fix for loading the fully trained model by saving all the files and configs correctly
* clean up
* removing the unnecessary files
* changing lora layers back to 16
* removed max file size
* nits
* resolve merge
* some consistency changes
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-29 17:12:47 -07:00
madroid
7ec2021bb9
LoRA: support tools(function calling) format datasets ( #995 )
...
* LoRA: support fine-tuning tools datasets
* LoRA: Split small function
* LoRA: add tools format to lora docs
* LoRA: pre-commit fix
* Revert "LoRA: pre-commit fix"
This reverts commit b94b7e0fe7.
* Revert "LoRA: Split small function"
This reverts commit 3f6a5f19fd.
* LoRA: remove ToolsDataset
In a JSONL file, not all data is required to include the tools value.
* nit in readme
* nit in readme
* nit in readme
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-28 10:41:36 -07:00
nathan
ace2bb5890
Add logits_processor option to generate_step function ( #983 )
...
* Add logits_processor option for the generation as in huggingface transformers library
* concatenation correction
* Rename the tokens variable for clarity
* remove the logit_bias argument from generate_step method
* fix the variable name
* nits + test
* test
* add back logit bias + test
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-28 10:08:49 -07:00
jamesm131
d812516d3d
Add /v1/models endpoint to mlx_lm.server ( #984 )
...
* Add 'models' endpoint to server
* Add test for new 'models' server endpoint
* Check hf_cache for mlx models
* update tests to check hf_cache for models
* simplify test
* doc
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-28 07:21:11 -07:00