Angelos Katharopoulos
b9eff0d744
Improve printing for FLUX distributed training
2025-01-13 22:47:54 -08:00
Awni Hannun
c117af83b8
fix gpt bigcode ( #1204 )
2025-01-13 10:22:32 -08:00
Chime Ogbuji
0228c46434
Custom local dataset features ( #1085 )
...
* Generalize prompt_feature and completion_feature for use in local datasets to facilitate compatibility with many other training dataset formats.
* Persist configured prompt/completion key
* rebase + nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2025-01-13 10:01:18 -08:00
Prince Canuma
bf2da36fc6
Fix Cohere2: mask shape error (long context) ( #1202 )
...
* fix mask shape error (long context)
* Update llms/mlx_lm/models/cohere2.py
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* revert layer_idx
* black formatting
* Update cohere2.py
* format
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
Co-authored-by: Awni Hannun <awni@apple.com>
2025-01-12 12:58:08 -08:00
Xingjun.Wang
514502da22
Support snapshot_download for ModelScope ( #1194 )
...
* add MLX_USE_MODELSCOPE env
* update
* update snapshot_download
* update
* remove modelscope dependency and add import check
* update
* nits
* fix
---------
Co-authored-by: wangxingjun778 <jason@U-C7X6TX5G-2239.local>
Co-authored-by: Awni Hannun <awni@apple.com>
2025-01-10 15:29:34 -08:00
Awni Hannun
93c5cfd781
Add a speculative decoding generator ( #1155 )
...
* add a speculative decoding generator
* fix
* fixes
* optional kwarg pop
2025-01-10 15:27:08 -08:00
Awni Hannun
5cae0a60e6
deepseek v3 model with pipeline parallelism ( #1191 )
...
* deepseekv3
* use upload_large_file instead of deprecated multi comit
* add pipeline generation and example
* comment
* get fp16 working
* use mlx==0.22
2025-01-09 15:55:53 -08:00
Jarrett
40b88eff48
fix(lora): config yaml & arg default merge bug ( #1196 )
2025-01-09 11:33:54 -08:00
Pedro Cuenca
b8f0cacfa8
Use upload_large_folder ( #1193 )
2025-01-07 09:18:31 -08:00
Awni Hannun
9183fe8b6d
fix ( #1192 )
2025-01-06 10:12:07 -08:00
Chime Ogbuji
f2619f507c
Add support for fewshot and apply chat template lm_eval functionality ( #1180 )
...
* Add support for multiturn fewshot examples and chat templates
Added two new arguments to the evaluation script: `--fewshot-as-multiturn` and `--apply-chat-template` which correspond to lm_eval options of similar names and are very often used to ensure apples-to-apples comparisons of lm_evaluation results
* Add HF overrides for methods needed by added options
* don't add duplicate bos
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2025-01-06 07:58:43 -08:00
Angelos Katharopoulos
25ec2d8c44
Change the eos-token argument for mlx_lm.generate ( #1176 )
2025-01-05 22:26:05 -08:00
Awni Hannun
c4833a2f55
fix encoding with special tokens + chat template ( #1189 )
2025-01-03 10:50:59 -08:00
Ivan Fioravanti
3a58c36109
Improvements to mlx_lm.manage ( #1178 )
...
* improvements to manage. Default value is N and size added to deletion confirmation.
* Fixing case for no case
* nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2025-01-01 07:25:57 -08:00
Alex Barron
d4ef909d4a
Length masking for batch inputs ( #1173 )
...
* length masking
* add mask to mlx_lm model interface
* remove lengths
* fix test:
* comment + fix
2024-12-18 19:43:52 -08:00
Awni Hannun
db109184b7
Fix no template prompt + top_k sampling ( #1166 )
...
* fix no template prompt
* add top_k sampling
* fix chinese
2024-12-18 18:46:50 -08:00
Billel Mokeddem
845efddc8c
Fix decoding manually added tokens ( #1164 )
...
* Fix decoding manually added tokens
* fix + test
* nit
* nit
* no lag bpe
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-12-17 09:54:29 -08:00
Prince Canuma
dfa4dd6c93
Add support for cohere2 ( #1157 )
...
* add support for cohere2
* revert to act_fn to silu
* fix tests and sliding window attention
* add tests
* add to tuner
* fix sliding window
* add coauthor :)
Co-authored-by: n8programs <43304488+N8python@users.noreply.github.com>
* Add rotating kvcache to save space
* some nits
* style
* nits
---------
Co-authored-by: n8programs <43304488+N8python@users.noreply.github.com>
Co-authored-by: N8 <n8@n8programs.com>
Co-authored-by: Awni Hannun <awni@apple.com>
2024-12-16 08:01:03 -08:00
Ikko Eltociear Ashimine
fc0674d2d8
chore: update evaluate.py ( #1159 )
...
occurence -> occurrence
2024-12-15 06:06:29 -08:00
Awni Hannun
9f2ea5892e
Bpe stream without space ( #1154 )
...
* bpe streaming detokenization without space
* version bump
2024-12-12 13:13:50 -08:00
Awni Hannun
2ba0e36683
[mlx-lm] Use top p in server ( #1144 )
...
* use top p in server
* couple other fixes
2024-12-12 11:12:21 -08:00
Angelos Katharopoulos
19abf3dcaa
Replace unicode errors instead of raising exception ( #1146 )
2024-12-12 11:10:41 -08:00
madroid
06af3c9b0e
Add finish_reason in GenerationResponse ( #1153 )
2024-12-12 10:37:40 -08:00
Awni Hannun
77b42b7c8b
fix llava ( #1149 )
2024-12-12 10:37:26 -08:00
Alex Barron
135c5818c1
Fix max_tokens ( #1148 )
2024-12-10 11:26:04 -08:00
madroid
12083c4b7e
Support for multiple EOS tokens ( #1141 )
...
* Support for multiple EOS tokens
* Change _eos_token_ids type from list to set
* Remove model_config & add eos_token_id
* nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-12-09 08:53:58 -08:00
n8programs
5687d5b99b
Adds EXAONE architecture. ( #1145 )
...
* Adds EXAONE architecture.
* nits + format
* format
* clean up and fix rope
* clean up and fix rope
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-12-09 07:58:25 -08:00
hehua2008
893b3f085e
Change Flux default max_shift to 1.15 to match the official one ( #1137 )
2024-12-08 23:29:48 -08:00
Peter Sibley
ed91bbc4dc
Fix final message at end of flux training ( #1143 )
2024-12-08 23:01:53 -08:00
hehua2008
1fd6aae871
Fix flux training with batch size ( #1135 )
...
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2024-12-08 22:09:04 -08:00
Alex Barron
2211b27388
Mixed Quantizations ( #1132 )
...
* saving/loading mixed quantizations
* comment
* add bits per weight
* more concise bpw
* count bias too
2024-12-08 14:21:50 -08:00
Alex Barron
cd8cf28c39
mlx_lm.evaluate
(#1140 )
...
* Add evaluation script
* only write top level results
* add lm eval version
* typo
* create output dir
* relative import
* comment
---------
Co-authored-by: David Grangier <dgrangier@users.noreply.github.com>
2024-12-08 12:20:10 -08:00
vb
1727959a27
Add mentions of MLX-my-repo. ( #1129 )
...
* Add mentions of MLX-my-repo.
* simplify
* move
* move
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-12-03 19:21:39 -08:00
Awni Hannun
1963df8565
Allow prompt callback to generate_step
( #1133 )
...
* allow prompt callback and use in cache_prompt
* nit
* comments
* bump version
2024-12-03 16:17:14 -08:00
sakares saengkaew
0ca162cfb2
Fix data_iter in prepare_dataset from speechcommands example ( #1113 )
2024-12-02 23:56:07 -08:00
Angelos Katharopoulos
eb9277f574
Allow loading from diffusers ckpt ( #1117 )
2024-12-02 13:15:50 -08:00
hehua2008
2a9294a5f0
Fix bug in FluxSampler.timesteps method ( #1131 )
2024-12-02 13:15:19 -08:00
Awni Hannun
8801beb66f
Add olmo2 ( #1128 )
...
* add olmo2
* add olmo2
2024-12-02 11:42:58 -08:00
Neil Mehta
cefe793ae0
Accept mx.array type for prompt argument for stream_generate ( #1125 )
...
* Accept mx.array type for prompt argument for stream_generate
* Fix formatting
2024-11-26 16:51:55 -08:00
Awni Hannun
cfc29c29f4
Put prompt processing in same stream ( #1122 )
...
* put prompt processing in same stream
* patch
2024-11-25 09:47:00 -08:00
madroid
a5e173802e
docs: update stream_generate return type annotation ( #1121 )
...
Improve documentation clarity by:
1. Fix return type annotation to correctly reflect GenerationResponse
2. Simplify docstring by referencing GenerationResponse class
3. Remove redundant field descriptions
2024-11-25 08:10:14 -08:00
Remixer Dec
adaab81029
Allow converting models from local directories ( #1118 )
2024-11-24 16:41:06 -08:00
Kevin Conner
0ffdb6dd20
Fix object property value in mlx_lm.server chat completions response to match OpenAI spec ( #1119 )
...
These were "chat.completions" and "chat.completions.chunk"
but should be "chat.completion" and "chat.completion.chunk"
for compatibility with clients expecting an OpenAI API.
In particular, this solves a problem in which aider 0.64.1 reports
hitting a token limit on any completion request, no matter how small,
despite apparently correct counts in the usage property.
Refer to:
https://platform.openai.com/docs/api-reference/chat/object
> object string
> The object type, which is always chat.completion.
https://platform.openai.com/docs/api-reference/chat/streaming
> object string
> The object type, which is always chat.completion.chunk.
2024-11-24 16:37:37 -08:00
Awni Hannun
0f135396ae
Generation refactor: part 2 ( #1099 )
...
* unify with stream_generate
* fixes
* nit
* some cleanup, warnings, tests
* fix test + faster min p + test
* version
2024-11-23 11:47:06 -08:00
Awni Hannun
004eb4cc9d
Tencent HunYuan MOE model ( #1100 )
...
* hunyuan
* fix
* format str
* default trust remote code for tokenizer, allow system prompt to be configurable
2024-11-23 11:06:26 -08:00
Angelos Katharopoulos
042280ce50
Fix format ( #1115 )
2024-11-20 16:15:53 -08:00
Valentin Roussellet
60c7b80350
Pass seed to sd img2img ( #1114 )
2024-11-20 15:21:52 -08:00
Alban Lecocq
bd6d910ca3
[MLX LM] Fix f-string formatting in memory warning message ( #1105 )
...
* Fix missing f-prefix for string interpolation in model size warning
* Ensures proper display of memory values in MB for model and max size
2024-11-13 06:14:03 -08:00
madroid
1e07660184
FLUX: save train config ( #1049 )
2024-11-08 17:15:19 -08:00
Awni Hannun
657b4cc0aa
[MLX LM] Sampler refactor + a few improvements ( #1094 )
...
* starting
* refactor sampler/processor and a few improvements
* fix stream
* fix stream generate
* fix eos handling in stream generate
2024-11-07 16:15:24 -08:00