mlx-examples

mirror of https://github.com/ml-explore/mlx-examples.git synced 2025-06-24 17:31:18 +08:00

Author	SHA1	Message	Date
Awni Hannun	657b4cc0aa	[MLX LM] Sampler refactor + a few improvements (#1094) * starting * refactor sampler/processor and a few improvements * fix stream * fix stream generate * fix eos handling in stream generate	2024-11-07 16:15:24 -08:00
Awni Hannun	e510987870	Clear cache every now and then (#1081) * clear cache every now and then * don't need user arg anymore	2024-11-01 14:15:32 -07:00
Alex Barron	85ffd2c96a	Quantized KV Cache (#1075) * add QuantizedKVCache * simplify * add tests * single sdpa function * fix sed * in place * fix tests * support different k and v head dims	2024-10-31 16:59:52 -07:00
Awni Hannun	fca087be49	More cache improvements (#1015) * fix rotating kv cache for chat use case * reorg + fixes to caching, unify prompt caching across types and use cases for e.g. caching during a chat * nit in chat * fix tests * fix tests * fix tests * docs * chat command * comments + docs * Define meta_state on all Cache implementations * fixes + trim_prompt_cache api * fix default model --------- Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2024-10-07 20:45:51 -07:00
Chime Ogbuji	83a209e200	Add prompt piping (#962) * Initial commit of --prompt-only and prompt from STDIN feature * Switch to using --verbose instead of --prompt-only * Fix capitalization typo * Fix reference to changed option name * Update exception text	2024-09-03 13:29:10 -07:00
Awni Hannun	3c6e8b11af	fix (#965)	2024-08-30 05:56:27 -07:00
Awni Hannun	b1186e2a81	Docs on prompt scaling (#963) * docs on prompt scaling * remove unused var * nits	2024-08-29 15:05:17 -07:00
Angelos Katharopoulos	1003a8b2dd	Add the ability to load the KV cache from a file (#956)	2024-08-28 22:11:45 -07:00
Awni Hannun	7be292c0c9	Handle longer prompt/generation (#931) * rebase * nits * nit * fix rotating cache with step prefill * update version	2024-08-16 15:28:39 -07:00
Michał Kurc	43d6deb3c1	mlx_lm: Add Streaming Capability to Generate Function (#807) * Add streaming feature to text generation function * separate stream and regular functions --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-06-03 09:04:39 -07:00
alexC-nonsense4k	42458914c8	support dora finetune in mlx-examples/llms/mlx_lm (#779) * support dora finetune * solve problems in lora.py and tuner.utils.py * add use_dora (bool) in functions of load adapters * delete all unsupported quantization code and fix all the calculate problems in mlx_lm/tuner/dora.py * Using stop_gradient to prevent gradients from flowing through ‘norm’ during backpropagation * set DEFAULT_USE_DORA in mlx_lm/generate.py * add annotation for all the use_dora * mlx_lm/fuse.py support fuse dora layers and fix a bug of to_linear() in mlx_lm/tuner/dora.py * simplify code of juding type of a fused layer in mlx_lm/fuse.py * add use_dora in mlx_lm/fuse.py when apply_lora_layers() * style + nits * style + nits * more updates --------- Co-authored-by: chenyifei08 <chenyifei08@baidu.com> Co-authored-by: Awni Hannun <awni@apple.com>	2024-05-16 08:21:26 -07:00
Konstantin Kerekovski	d1c35fa684	Add MLX Cache Limit setting for mlx_lm.generate and mlx_lm.server CLI (#744) * Add support for setting MLX cache limit in GB * Add support for setting MLX cache limit in GB in mlx_lm.server * format --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-05-03 12:42:48 -07:00
Phúc H. Lê Khắc	35206806ac	Create executables for generate, lora, server, merge, convert (#682) * feat: create executables mlx_lm.<cmd> * nits in docs --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-04-16 16:08:49 -07:00
Awni Hannun	2bd64b78cf	Save lora config (#636) * lora config * comments * version bump	2024-04-02 13:52:53 -07:00
Anchen	8f906c859a	chore(mlx-lm): enable to apply default chat template (#577) * chore(mlx-lm): enable to apply default chat template * Add option to use default chat template * chore: rename the flag to use default chat template	2024-03-20 21:39:39 -07:00
Anchen	13794a05da	chore(mlx-lm): add adapter support in generate.py (#494) * chore(mlx-lm): add adapter support in generate.py * chore: remove generate from lora.py and raise error to let user use mlx_lm.generate instead	2024-02-28 07:49:25 -08:00
Y4hL	676e574eff	Add missing import (#497) * Add missing import * format --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-02-27 13:27:08 -08:00
Awni Hannun	95f82e67a2	Fix import warning (#479) * fix import warning * fix version import * remove api, move convert to utils * also update circle to run external PRs	2024-02-27 08:47:56 -08:00
peterjc123	ccb278bcbd	Add top-p sampling for text generation (#486)	2024-02-26 06:18:11 -08:00
iLoveBug	40b61c1719	fix the chinese character generation as same as PR #321 (#342) * fix the chinese character generation as same as PR #321 * reuse the generate logic to utils.py * format * verbose defualt * fix conflicst with colorize and character check --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-01-23 12:44:23 -08:00
Ivan Fioravanti	c45c2311bd	Add colorized output option to generate script (#347) * Add colorized output option to generate script Two new functions were added to the script that allow output to be colorized based on the T[0] probability. Changes were made to the `generate_step` function in utils.py to permit colorization. Additionally, an argument for colorization was introduced to the command-line parser. * Rename 'colorize' parameter with 'return_probability' in generate_step	2024-01-23 05:25:44 -08:00
Baptiste Canton	42672f5446	add an option to apply the tokenizer chat template (#338) * add an option to apply the tokenizer chat template * fix the option to apply the tokenizer chat template * better error messages for chat template issues * apply the chat template by default when possible * nit in comment' * rebase --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-01-22 19:52:42 -08:00
Anchen	30be4c4734	refactor(qwen): moving qwen into mlx-lm (#312) * refactor(qwen): moving qwen into mlx-lm * chore: update doc * chore: fix type hint * add qwen model support in convert * chore: fix doc * chore: only load model in quantize_model * chore: make the convert script only copy tokenizer files instead of load it and save * chore: update docstring * chore: remove unnecessary try catch * chore: clean up for tokenizer and update transformers 4.37 * nits in README --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-01-22 15:00:07 -08:00
Awni Hannun	c6440416a2	Mlx llm package (#301) * fix converter * add recursive files * remove gitignore * remove gitignore * add packages properly * read me update * remove dup readme * relative * fix convert * fix community name * fix url * version	2024-01-12 10:25:56 -08:00

24 Commits