Gökdeniz Gülmez
0989c073b0
Optimizations for mamba1 ( #1213 )
...
* added mx.einsum() operations: before: 41.293 tokens-per-sec, after: 57.822 tokens-per-sec
* Fused operations in `delta, B, C = ...`. Before: 57.822 tokens-per-sec, after: 83.890 tokens-per-sec
* Pre-computing A_log. Before: 83.890 tokens-per-sec, after: 85.848 tokens-per-sec
* Update MambaBlock, Batched Input Processing, Improved Cache Handling, Pre-computed Constants, Cleaner State Management, Explicit Return Values. Before: 82.442 tokens-per-sec, after: 129.130 tokens-per-sec (see the sketch below).
* cleaning up and adding apple copyright to helium modelfile
* update Copyright to this year
* nits + even faster
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2025-02-03 13:36:08 -08:00
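To make the first two bullets concrete, here is a minimal sketch of the fused-projection-plus-einsum pattern, with made-up shapes and names rather than the actual mlx_lm MambaBlock:

```python
# Hypothetical sketch of the fused-projection + einsum pattern from the
# commit above; shapes and names are illustrative, not the real MambaBlock.
import mlx.core as mx
import mlx.nn as nn

B_, L, D = 2, 16, 64          # batch, sequence length, model dim
dt_rank, N = 4, 8             # delta rank, state size

x = mx.random.normal((B_, L, D))
x_proj = nn.Linear(D, dt_rank + 2 * N, bias=False)

# One matmul instead of three: delta, B, C = split(x_proj(x))
dBC = x_proj(x)
delta, Bmat, Cmat = mx.split(dBC, [dt_rank, dt_rank + N], axis=-1)

# einsum replaces an explicit broadcast-multiply-sum, e.g. the
# (B, L, D, N) outer product used in the state update.
outer = mx.einsum("bld,bln->bldn", x, Bmat)
print(delta.shape, Bmat.shape, Cmat.shape, outer.shape)
```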
Awni Hannun
d9924d08d1
Fix no validation in lora ( #1241 )
2025-02-03 09:55:24 -08:00
Awni Hannun
9c2ef38d4d
only download local shard ( #1240 )
2025-02-02 13:58:44 -08:00
Goekdeniz-Guelmez
fbb51f651a
small fix
2025-02-01 16:08:52 +01:00
Goekdeniz-Guelmez
a03d434bb9
clean up
2025-01-31 21:37:15 +01:00
Goekdeniz-Guelmez
5998272ec2
cleaning up some namings
2025-01-31 21:27:59 +01:00
Goekdeniz-Guelmez
b379359385
small fix
2025-01-31 17:19:55 +01:00
Goekdeniz-Guelmez
b31d9cbb65
removing is-reference-free argument
2025-01-31 00:01:49 +01:00
Gökdeniz Gülmez
b3d6fc38cd
Merge branch 'ml-explore:main' into adding-dpo-training
2025-01-29 15:07:37 +01:00
Awni Hannun
e8afb59de4
better overflow correction ( #1229 )
2025-01-28 14:37:30 -08:00
Anchen
7a83077cd7
chore(mlx-lm): support text type content in messages ( #1225 )
...
* chore(mlx-lm): support text type content
* chore: optimize the message content processing
* nits + format
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2025-01-27 17:13:50 -08:00
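A sketch of what supporting "text type content" means in practice: OpenAI-style messages may carry content either as a plain string or as a list of typed parts, so the server has to flatten the latter. The helper name here is hypothetical:

```python
# Hedged sketch of flattening OpenAI-style "text" content parts into a
# plain string; the exact helper in mlx_lm.server may differ.
def normalize_content(content):
    """Accept either a plain string or a list of content parts."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        # Keep only "text" parts and join their text fields.
        return "".join(
            part["text"] for part in content if part.get("type") == "text"
        )
    raise ValueError(f"Unsupported content type: {type(content)}")

print(normalize_content([{"type": "text", "text": "hello"}]))  # "hello"
```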
Awni Hannun
f44a52e2dc
batched min p and fix spec gen sampling ( #1222 )
2025-01-27 15:40:31 -08:00
Gökdeniz Gülmez
9e5482ee74
Merge branch 'main' into adding-dpo-training
2025-01-26 17:01:37 +01:00
Gökdeniz Gülmez
77faa14ba4
adding support for kyutai's helium ( #1208 )
...
* initial commit
* adding helium into training
* Update ACKNOWLEDGMENTS.md
* nits
* nits
* fixes / nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2025-01-26 07:19:07 -08:00
Goekdeniz-Guelmez
557649d8da
removing tokenizer and updates
2025-01-26 15:25:27 +01:00
Goekdeniz-Guelmez
4d0e52f7c8
more metrics
2025-01-26 15:09:55 +01:00
Goekdeniz-Guelmez
0ff1289bd9
updates
2025-01-25 22:03:32 +01:00
Goekdeniz-Guelmez
86b315fdf9
nits and quality of life improvements
2025-01-24 22:40:27 +01:00
Goekdeniz-Guelmez
531c3345c6
nits
2025-01-24 18:13:05 +01:00
Goekdeniz-Guelmez
54fcd8ed63
update DPODataset and added in system field too
2025-01-24 18:11:56 +01:00
Goekdeniz-Guelmez
aefe4ba160
nits
2025-01-22 21:36:56 +01:00
Goekdeniz-Guelmez
e1d549bcd3
nits
2025-01-22 21:03:21 +01:00
Goekdeniz-Guelmez
b0ece88909
nits
2025-01-22 20:54:31 +01:00
Gökdeniz Gülmez
69a8f11f7b
Merge branch 'ml-explore:main' into adding-dpo-training
2025-01-22 14:18:24 +01:00
Awni Hannun
9a3ddc3e65
some fixes for pipeline parallel deep seek r1 ( #1216 )
2025-01-21 19:40:29 -08:00
Victor Nogueira
df1406735b
Fix dataset variable name in datasets.py ( #1212 )
2025-01-21 14:12:43 -08:00
Goekdeniz-Guelmez
477000ec9d
removing unneeded functions
2025-01-19 01:13:17 +01:00
Goekdeniz-Guelmez
06a9f5d106
update lora_config.yaml
2025-01-19 00:53:41 +01:00
Goekdeniz-Guelmez
1b4e19675d
update LORA.md
2025-01-19 00:48:45 +01:00
Goekdeniz-Guelmez
582f979dfd
fixing reference model loading and freezing
2025-01-19 00:41:27 +01:00
Goekdeniz-Guelmez
1ff788821c
initial commit
2025-01-19 00:19:36 +01:00
Jarrett
07f88f8057
fix(lora): add back store_true default args ( #1205 )
2025-01-16 11:15:42 -08:00
Awni Hannun
50f0a7f6d9
add internlm3 ( #1206 )
2025-01-15 14:55:41 -08:00
Ivan Fioravanti
6ae6c72c2e
reduction moved to CPU in case of distributed training ( #1200 )
2025-01-14 17:20:42 -08:00
Awni Hannun
c117af83b8
fix gpt bigcode ( #1204 )
2025-01-13 10:22:32 -08:00
Chime Ogbuji
0228c46434
Custom local dataset features ( #1085 )
...
* Generalize prompt_feature and completion_feature for use in local datasets to facilitate compatibility with many other training dataset formats.
* Persist configured prompt/completion key
* rebase + nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2025-01-13 10:01:18 -08:00
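In practice the generalization means a local record can use arbitrary field names, mapped through configured prompt/completion keys; a tiny illustration with hypothetical names:

```python
# Illustrative record + key mapping enabled by the change above; the
# field names and config keys here are made up for the example.
record = {"question": "What is MLX?", "answer": "An array framework for Apple silicon."}
prompt_feature, completion_feature = "question", "answer"

prompt, completion = record[prompt_feature], record[completion_feature]
print(prompt, "->", completion)
```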
Prince Canuma
bf2da36fc6
Fix Cohere2: mask shape error (long context) ( #1202 )
...
* fix mask shape error (long context)
* Update llms/mlx_lm/models/cohere2.py
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* revert layer_idx
* black formatting
* Update cohere2.py
* format
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
Co-authored-by: Awni Hannun <awni@apple.com>
2025-01-12 12:58:08 -08:00
Xingjun.Wang
514502da22
Support snapshot_download for ModelScope ( #1194 )
...
* add MLX_USE_MODELSCOPE env
* update
* update snapshot_download
* update
* remove modelscope dependency and add import check
* update
* nits
* fix
---------
Co-authored-by: wangxingjun778 <jason@U-C7X6TX5G-2239.local>
Co-authored-by: Awni Hannun <awni@apple.com>
2025-01-10 15:29:34 -08:00
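Putting the bullets together, the env-gated backend switch with an import check looks roughly like this (the repo id is hypothetical):

```python
# Sketch of the MLX_USE_MODELSCOPE switch described above: fetch weights
# via ModelScope's snapshot_download when the env var is set, verifying
# the optional dependency is installed; otherwise fall back to HF Hub.
import os

if os.getenv("MLX_USE_MODELSCOPE", "False").lower() == "true":
    try:
        from modelscope import snapshot_download
    except ImportError:
        raise ImportError("Run `pip install modelscope` to use ModelScope models.")
else:
    from huggingface_hub import snapshot_download

path = snapshot_download("some-org/some-model")  # hypothetical repo id
```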
Awni Hannun
93c5cfd781
Add a speculative decoding generator ( #1155 )
...
* add a speculative decoding generator
* fix
* fixes
* optional kwarg pop
2025-01-10 15:27:08 -08:00
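For intuition, a toy version of the draft-then-verify loop at the core of speculative decoding, with stand-in deterministic "models"; the real generator verifies draft tokens in a single batched forward pass on-device:

```python
# Toy illustration of speculative decoding's accept/verify loop.
def draft_model(seq):   # fast, sometimes wrong
    return (seq[-1] + 1) % 10

def target_model(seq):  # slow, authoritative
    return (seq[-1] + 1) % 10 if seq[-1] != 7 else 0

def speculative_step(seq, k=4):
    # 1) the draft model proposes k tokens cheaply
    draft = list(seq)
    for _ in range(k):
        draft.append(draft_model(draft))
    proposed = draft[len(seq):]
    # 2) the target verifies each proposal; keep the matching prefix,
    #    and on the first mismatch keep the target's token instead
    out = list(seq)
    for tok in proposed:
        expected = target_model(out)
        out.append(expected)
        if expected != tok:
            break
    return out

print(speculative_step([5]))  # accepts 6, 7, then corrects to 0
```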
Awni Hannun
5cae0a60e6
deepseek v3 model with pipeline parallelism ( #1191 )
...
* deepseekv3
* use upload_large_file instead of deprecated multi commit
* add pipeline generation and example
* comment
* get fp16 working
* use mlx==0.22
2025-01-09 15:55:53 -08:00
Jarrett
40b88eff48
fix(lora): config yaml & arg default merge bug ( #1196 )
2025-01-09 11:33:54 -08:00
Pedro Cuenca
b8f0cacfa8
Use upload_large_folder ( #1193 )
2025-01-07 09:18:31 -08:00
Awni Hannun
9183fe8b6d
fix ( #1192 )
2025-01-06 10:12:07 -08:00
Chime Ogbuji
f2619f507c
Add support for fewshot and apply chat template lm_eval functionality ( #1180 )
...
* Add support for multiturn fewshot examples and chat templates
Added two new arguments to the evaluation script: `--fewshot-as-multiturn` and `--apply-chat-template`, which correspond to lm_eval options of similar names and are often used to ensure apples-to-apples comparisons of lm_eval results.
* Add HF overrides for methods needed by added options
* don't add duplicate bos
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2025-01-06 07:58:43 -08:00
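Conceptually, `--fewshot-as-multiturn` turns each fewshot pair into user/assistant turns before the chat template is applied; a toy sketch (the real wiring lives in lm_eval and mlx_lm.evaluate, and the template call is shown only as a comment):

```python
# Illustrative only: hypothetical fewshot pairs and query.
fewshots = [("2+2?", "4"), ("3+5?", "8")]
query = "7+6?"

messages = []
for q, a in fewshots:
    messages.append({"role": "user", "content": q})
    messages.append({"role": "assistant", "content": a})
messages.append({"role": "user", "content": query})

# With --apply-chat-template, something like the following then runs:
# prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
print(messages)
```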
Angelos Katharopoulos
25ec2d8c44
Change the eos-token argument for mlx_lm.generate ( #1176 )
2025-01-05 22:26:05 -08:00
Awni Hannun
c4833a2f55
fix encoding with special tokens + chat template ( #1189 )
2025-01-03 10:50:59 -08:00
Ivan Fioravanti
3a58c36109
Improvements to mlx_lm.manage ( #1178 )
...
* improvements to manage. Default value is N, and size is added to the deletion confirmation.
* Fixing case for no case
* nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2025-01-01 07:25:57 -08:00
Alex Barron
d4ef909d4a
Length masking for batch inputs ( #1173 )
...
* length masking
* add mask to mlx_lm model interface
* remove lengths
* fix test:
* comment + fix
2024-12-18 19:43:52 -08:00
Awni Hannun
db109184b7
Fix no template prompt + top_k sampling ( #1166 )
...
* fix no template prompt
* add top_k sampling
* fix chinese
2024-12-18 18:46:50 -08:00
Billel Mokeddem
845efddc8c
Fix decoding manually added tokens ( #1164 )
...
* Fix decoding manually added tokens
* fix + test
* nit
* nit
* no lag bpe
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-12-17 09:54:29 -08:00
Prince Canuma
dfa4dd6c93
Add support for cohere2 ( #1157 )
...
* add support for cohere2
* revert act_fn to silu
* fix tests and sliding window attention
* add tests
* add to tuner
* fix sliding window
* add coauthor :)
Co-authored-by: n8programs <43304488+N8python@users.noreply.github.com>
* Add rotating kvcache to save space
* some nits
* style
* nits
---------
Co-authored-by: n8programs <43304488+N8python@users.noreply.github.com>
Co-authored-by: N8 <n8@n8programs.com>
Co-authored-by: Awni Hannun <awni@apple.com>
2024-12-16 08:01:03 -08:00
Ikko Eltociear Ashimine
fc0674d2d8
chore: update evaluate.py ( #1159 )
...
occurence -> occurrence
2024-12-15 06:06:29 -08:00
Awni Hannun
9f2ea5892e
Bpe stream without space ( #1154 )
...
* bpe streaming detokenization without space
* version bump
2024-12-12 13:13:50 -08:00
Awni Hannun
2ba0e36683
[mlx-lm] Use top p in server ( #1144 )
...
* use top p in server
* couple other fixes
2024-12-12 11:12:21 -08:00
Angelos Katharopoulos
19abf3dcaa
Replace unicode errors instead of raising exception ( #1146 )
2024-12-12 11:10:41 -08:00
madroid
06af3c9b0e
Add finish_reason in GenerationResponse ( #1153 )
2024-12-12 10:37:40 -08:00
Awni Hannun
77b42b7c8b
fix llava ( #1149 )
2024-12-12 10:37:26 -08:00
Alex Barron
135c5818c1
Fix max_tokens ( #1148 )
2024-12-10 11:26:04 -08:00
madroid
12083c4b7e
Support for multiple EOS tokens ( #1141 )
...
* Support for multiple EOS tokens
* Change _eos_token_ids type from list to set
* Remove model_config & add eos_token_id
* nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-12-09 08:53:58 -08:00
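The data-structure change in the second bullet is small but worth spelling out; a minimal illustration (ids here are made up, real ones come from the tokenizer/model config):

```python
# Treat EOS as a set, not a single id: O(1) membership test per token.
eos_token_ids = {2, 106, 107}   # illustrative ids
for token in (5, 9, 106):
    if token in eos_token_ids:
        print(f"token {token} ends generation")
        break
```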
n8programs
5687d5b99b
Adds EXAONE architecture. ( #1145 )
...
* Adds EXAONE architecture.
* nits + format
* format
* clean up and fix rope
* clean up and fix rope
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-12-09 07:58:25 -08:00
Alex Barron
2211b27388
Mixed Quantizations ( #1132 )
...
* saving/loading mixed quantizations
* comment
* add bits per weight
* more concise bpw
* count bias too
2024-12-08 14:21:50 -08:00
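A back-of-the-envelope bits-per-weight computation in the spirit of the "add bits per weight" bullet; the layer sizes are made up, and a real count also includes the scales and biases stored per quantization group (the "count bias too" bullet):

```python
# Hypothetical mixed-quantization model: some layers at higher precision.
layers = {
    "embed":   (32_000 * 4096, 4),   # (num_params, bits)
    "attn":    (4 * 4096 * 4096, 4),
    "lm_head": (32_000 * 4096, 8),   # kept at 8 bits
}
total_bits = sum(n * b for n, b in layers.values())
total_params = sum(n for n, _ in layers.values())
print(f"bits per weight: {total_bits / total_params:.3f}")
```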
Alex Barron
cd8cf28c39
mlx_lm.evaluate ( #1140 )
...
* Add evaluation script
* only write top level results
* add lm eval version
* typo
* create output dir
* relative import
* comment
---------
Co-authored-by: David Grangier <dgrangier@users.noreply.github.com>
2024-12-08 12:20:10 -08:00
Awni Hannun
1963df8565
Allow prompt callback to generate_step ( #1133 )
...
* allow prompt callback and use in cache_prompt
* nit
* comments
* bump version
2024-12-03 16:17:14 -08:00
Awni Hannun
8801beb66f
Add olmo2 ( #1128 )
...
* add olmo2
* add olmo2
2024-12-02 11:42:58 -08:00
Neil Mehta
cefe793ae0
Accept mx.array type for prompt argument for stream_generate ( #1125 )
...
* Accept mx.array type for prompt argument for stream_generate
* Fix formatting
2024-11-26 16:51:55 -08:00
Awni Hannun
cfc29c29f4
Put prompt processing in same stream ( #1122 )
...
* put prompt processing in same stream
* patch
2024-11-25 09:47:00 -08:00
madroid
a5e173802e
docs: update stream_generate return type annotation ( #1121 )
...
Improve documentation clarity by:
1. Fix return type annotation to correctly reflect GenerationResponse
2. Simplify docstring by referencing GenerationResponse class
3. Remove redundant field descriptions
2024-11-25 08:10:14 -08:00
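A short usage sketch consistent with the documented behavior: stream_generate yields GenerationResponse objects, not plain strings. The model repo here is just an example; adjust for your setup.

```python
from mlx_lm import load, stream_generate

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")
for response in stream_generate(model, tokenizer, "Hello", max_tokens=32):
    # response is a GenerationResponse; .text holds the newly decoded text
    print(response.text, end="", flush=True)
```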
Kevin Conner
0ffdb6dd20
Fix object property value in mlx_lm.server chat completions response to match OpenAI spec ( #1119 )
...
These were "chat.completions" and "chat.completions.chunk"
but should be "chat.completion" and "chat.completion.chunk"
for compatibility with clients expecting an OpenAI API.
In particular, this solves a problem in which aider 0.64.1 reports
hitting a token limit on any completion request, no matter how small,
despite apparently correct counts in the usage property.
Refer to:
https://platform.openai.com/docs/api-reference/chat/object
> object string
> The object type, which is always chat.completion.
https://platform.openai.com/docs/api-reference/chat/streaming
> object string
> The object type, which is always chat.completion.chunk.
2024-11-24 16:37:37 -08:00
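For reference, the corrected shape of a non-streaming response (fields abridged, values illustrative):

```python
response = {
    "id": "chatcmpl-123",
    "object": "chat.completion",        # was "chat.completions" (wrong)
    "created": 1732500000,
    "model": "local-model",
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "Hi!"},
        "finish_reason": "stop",
    }],
}
# Streaming chunks use "chat.completion.chunk" (was "chat.completions.chunk").
```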
Awni Hannun
0f135396ae
Generation refactor: part 2 ( #1099 )
...
* unify with stream_generate
* fixes
* nit
* some cleanup, warnings, tests
* fix test + faster min p + test
* version
2024-11-23 11:47:06 -08:00
Awni Hannun
004eb4cc9d
Tencent HunYuan MOE model ( #1100 )
...
* hunyuan
* fix
* format str
* default trust remote code for tokenizer, allow system prompt to be configurable
2024-11-23 11:06:26 -08:00
Alban Lecocq
bd6d910ca3
[MLX LM] Fix f-string formatting in memory warning message ( #1105 )
...
* Fix missing f-prefix for string interpolation in model size warning
* Ensures proper display of memory values in MB for model and max size
2024-11-13 06:14:03 -08:00
Awni Hannun
657b4cc0aa
[MLX LM] Sampler refactor + a few improvements ( #1094 )
...
* starting
* refactor sampler/processor and a few improvements
* fix stream
* fix stream generate
* fix eos handling in stream generate
2024-11-07 16:15:24 -08:00
Angelos Katharopoulos
ed9e81dd58
Fix rotating kv cache size ( #1093 )
2024-11-05 10:24:24 -08:00
Awni Hannun
6fd1f70f73
fix spm decoder multi-byte ( #1092 )
2024-11-05 06:06:26 -08:00
ilyasch2
3b526f0aa1
Add support for falcon-mamba ( #1074 )
...
* Add support for falcon-mamba
* nits
* nit
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-11-04 12:23:30 -08:00
Anchen
82e3338987
chore(mlx-lm): add max token arg for mlx_lm.chat ( #1089 )
...
* chore(mlx-lm): add max token arg for mlx_lm.chat
* chore: update the default max token value
2024-11-04 06:06:34 -08:00
Angelos Katharopoulos
331148d8ec
Enable distributed LoRA training ( #821 )
2024-11-02 18:02:31 -07:00
Awni Hannun
0f799947d0
fix ( #1079 )
2024-11-01 16:30:32 -07:00
Awni Hannun
e510987870
Clear cache every now and then ( #1081 )
...
* clear cache every now and then
* don't need user arg anymore
2024-11-01 14:15:32 -07:00
Alex Barron
85ffd2c96a
Quantized KV Cache ( #1075 )
...
* add QuantizedKVCache
* simplify
* add tests
* single sdpa function
* fix sed
* in place
* fix tests
* support different k and v head dims
2024-10-31 16:59:52 -07:00
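A conceptual sketch of where the space saving comes from, using MLX's built-in quantize/dequantize; the actual QuantizedKVCache may differ in layout and in how reads happen:

```python
import mlx.core as mx

keys = mx.random.normal((8, 128, 64))      # heads, positions, head_dim
flat = keys.reshape(-1, keys.shape[-1])    # quantize expects a matrix

# Store (quantized weights, scales, biases) instead of full precision.
w_q, scales, biases = mx.quantize(flat, group_size=64, bits=8)
restored = mx.dequantize(w_q, scales, biases, group_size=64, bits=8)
print(mx.abs(restored - flat).max())       # small quantization error
```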
Awni Hannun
9f34fdbda4
Wire models in MLX LM ( #1069 )
...
* wired in MLX LM
* fix synch
* comment + nit
* version
* mlx lm version
* bump to 0.19.2
2024-10-31 08:17:14 -07:00
Awni Hannun
8fe9539af7
Fix detokenizer space match for quote ( #1072 )
...
* fix + test
* remove transformer flax/torch warning
* format
2024-10-27 15:06:07 -07:00
hschaeufler
ab4bf05c6e
Update lora_config.yaml with new param: num_layers ( #1068 )
2024-10-26 09:34:46 -07:00
Awni Hannun
9000e280ae
fix mamba models conversion ( #1065 )
2024-10-22 15:44:08 -07:00
madroid
d1d480867b
LoRA: update tools datasets docs ( #1063 )
...
* LoRA: update tools datasets docs
* nits
* nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-10-22 12:19:11 -07:00
Awni Hannun
66e7bcb886
override dtype with quant ( #1062 )
2024-10-22 09:56:45 -07:00
aronson
743763bc2e
Handle empty string case in maybe_trim_space ( #1055 )
...
* Handle empty string case in maybe_trim_space
* nit
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-10-20 20:46:43 -07:00
Awni Hannun
605c4854f1
Prompt caching in mlx_lm.server ( #1026 )
...
* caching in server
* nits
* fix tests
* don't throw if no metal
* comments
2024-10-14 10:57:22 -07:00
Awni Hannun
8dca1a2f60
Tokenizer updates + tests ( #1024 )
...
* tokenizer updates + tests
* nit
* add can_trim_prompt_cache
* nits
2024-10-14 10:48:46 -07:00
Awni Hannun
c799133998
Make llm async eval less brittle ( #1040 )
...
* Make llm async eval less brittle
* nit
2024-10-14 10:25:24 -07:00
Shunta Saito
7612c646f3
Fix PLaMo model to support Grouped Query Attention ( #1037 )
2024-10-12 15:26:50 -07:00
Awni Hannun
4360e7ccec
clear cache during prompt processing ( #1027 )
2024-10-09 16:48:32 -07:00
Awni Hannun
b7373cb44f
fix long prompt generations ( #1023 )
2024-10-09 11:09:36 -07:00
Awni Hannun
fca087be49
More cache improvements ( #1015 )
...
* fix rotating kv cache for chat use case
* reorg + fixes to caching, unify prompt caching across types and use cases for e.g. caching during a chat
* nit in chat
* fix tests
* fix tests
* fix tests
* docs
* chat command
* comments + docs
* Define meta_state on all Cache implementations
* fixes + trim_prompt_cache api
* fix default model
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2024-10-07 20:45:51 -07:00
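A hedged usage sketch of the unified prompt-cache API this commit describes; the helper names follow mlx_lm.models.cache in versions around this change, so check your installed version:

```python
from mlx_lm import load, generate
from mlx_lm.models.cache import make_prompt_cache, trim_prompt_cache

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")
cache = make_prompt_cache(model)

# Reuse the same cache across turns of a chat...
generate(model, tokenizer, prompt="Hello!", prompt_cache=cache)
# ...and trim the last n cached tokens if the conversation is edited.
trim_prompt_cache(cache, num_tokens=4)
```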
madroid
36c1d8e8dc
Server: support function calling ( #1003 )
2024-10-02 12:36:07 -07:00
nathan
0866e23a67
repetition_penalty and logits_bias just using logits_processors ( #1004 )
...
* refactor of repetition_penalty and logits_bias to use logits_processor
* nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-30 08:49:03 -07:00
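The pattern the refactor adopts: a logits processor is a callable from (generated tokens, logits) to logits, applied inside the sampling loop before a token is picked. A sketch close in spirit to the mlx_lm version:

```python
import mlx.core as mx

def make_repetition_penalty(penalty: float = 1.1):
    def processor(tokens: mx.array, logits: mx.array) -> mx.array:
        if tokens.size == 0:
            return logits
        selected = logits[:, tokens]
        # Scale seen-token logits toward "less likely": down if positive,
        # further down if negative.
        selected = mx.where(selected < 0, selected * penalty, selected / penalty)
        logits[:, tokens] = selected
        return logits
    return processor

proc = make_repetition_penalty(1.3)
print(proc(mx.array([3, 5]), mx.random.normal((1, 10))))
```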
Zai Thottakath
418d9a5511
Feature: QDoRA ( #891 )
...
* feat: QDoRA with tests and a small bug fix for recalculation of self.m
* some simplifications and fixes
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-30 08:01:11 -07:00
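A hedged sketch of the DoRA decomposition behind QDoRA: the frozen base weight (quantized in QDoRA) plus the LoRA update gives a direction, which is renormalized and rescaled by a learned magnitude vector (the `self.m` the commit mentions recalculating). Shapes and names are illustrative.

```python
import mlx.core as mx

d_out, d_in, r = 16, 16, 4
W = mx.random.normal((d_out, d_in))       # frozen base weight
A = mx.random.normal((r, d_in)) * 0.01    # LoRA factors
B = mx.zeros((d_out, r))
m = mx.linalg.norm(W + B @ A, axis=1)     # magnitude, recomputed from W + BA

def dora_weight():
    V = W + B @ A
    # normalize each output row's direction, then rescale by the magnitude
    return m[:, None] * V / mx.linalg.norm(V, axis=1, keepdims=True)

y = mx.random.normal((2, d_in)) @ dora_weight().T
print(y.shape)
```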
madroid
aa1c8abdc6
LoRA: Support HuggingFace dataset via data parameter ( #996 )
...
* LoRA: support huggingface dataset via `data` argument
* LoRA: Extract the load_custom_hf_dataset function
* LoRA: split small functions
* fix spelling errors
* handle load hf dataset error
* fix pre-commit lint
* update data argument help
* nits and doc
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-30 07:36:21 -07:00
Gökdeniz Gülmez
50e5ca81a8
Adding full finetuning ( #903 )
...
* Adding full model weights finetuning
* Updating the LORA.md and ACKNOWLEDGMENTS.md files.
* removing --use-dora and --full-training and adding --fine-tune-type
* some clean up
* reformating and fixing dora training
* updated CONFIG_DEFAULTS
* update config example
* update in the config example file
* Update LORA.md
* merge and commit
* adding argument for dora linear layer
* clean up
* clean up in the example yaml file
* fix
* final fix before sending
* small addition to the readme file
* fix for loading the fully trained model by saving all the files and configs correctly
* clean up
* removing the unnecessary files
* changing lora layers back to 16
* removed max file size
* nits
* resolve merge
* some consistency changes
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-29 17:12:47 -07:00
madroid
7ec2021bb9
LoRA: support tools(function calling) format datasets ( #995 )
...
* LoRA: support fine-tuning tools datasets
* LoRA: Split small function
* LoRA: add tools format to lora docs
* LoRA: pre-commit fix
* Revert "LoRA: pre-commit fix"
This reverts commit b94b7e0fe7.
* Revert "LoRA: Split small function"
This reverts commit 3f6a5f19fd.
* LoRA: remove ToolsDataset
In a JSONL file, not all data is required to include the tools value.
* nit in readme
* nit in readme
* nit in readme
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-28 10:41:36 -07:00
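For reference, the rough shape of one tools-format training record; values are illustrative, each JSONL line holds one such dict, and per the note above the "tools" key is optional per line:

```python
import json

record = {
    "messages": [
        {"role": "user", "content": "What is the weather in Paris?"},
        {"role": "assistant", "tool_calls": [{
            "type": "function",
            "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
        }]},
    ],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}
print(json.dumps(record))  # one line of the JSONL file
```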