mlx-examples

mirror of https://github.com/ml-explore/mlx-examples.git synced 2025-06-24 17:31:18 +08:00

Author	SHA1	Message	Date
madroid	12083c4b7e	Support for multiple EOS tokens (#1141 ) * Support for multiple EOS tokens * Change _eos_token_ids type from list to set * Remove model_config & add eos_token_id * nits --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-12-09 08:53:58 -08:00
Alex Barron	2211b27388	Mixed Quantizations (#1132 ) * saving/loading mixed quantizations * comment * add bits per weight * more concise bpw * count bias too	2024-12-08 14:21:50 -08:00
Awni Hannun	1963df8565	Allow prompt callback to `generate_step` (#1133 ) * allow prompt callback and use in cache_prompt * nit * comments * bump version	2024-12-03 16:17:14 -08:00
Neil Mehta	cefe793ae0	Accept mx.array type for prompt argument for stream_generate (#1125 ) * Accept mx.array type for prompt argument for stream_generate * Fix formatting	2024-11-26 16:51:55 -08:00
Awni Hannun	cfc29c29f4	Put prompt processing in same stream (#1122 ) * put prompt processing in same stream * patch	2024-11-25 09:47:00 -08:00
madroid	a5e173802e	docs: update stream_generate return type annotation (#1121 ) Improve documentation clarity by: 1. Fix return type annotation to correctly reflect GenerationResponse 2. Simplify docstring by referencing GenerationResponse class 3. Remove redundant field descriptions	2024-11-25 08:10:14 -08:00
Awni Hannun	0f135396ae	Generation refactor: part 2 (#1099 ) * unify with stream_generate * fixes * nit * some cleanup, warnings, tests * fix test + faster min p + test * version	2024-11-23 11:47:06 -08:00
Alban Lecocq	bd6d910ca3	[MLX LM] Fix f-string formatting in memory warning message (#1105 ) * Fix missing f-prefix for string interpolation in model size warning * Ensures proper display of memory values in MB for model and max size	2024-11-13 06:14:03 -08:00
Awni Hannun	657b4cc0aa	[MLX LM] Sampler refactor + a few improvements (#1094 ) * starting * refactor sampler/processor and a few improvements * fix stream * fix stream generate * fix eos handling in stream generate	2024-11-07 16:15:24 -08:00
ilyasch2	3b526f0aa1	Add support for falcon-mamba (#1074 ) * Add support for falcon-mamba * nits * nit --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-11-04 12:23:30 -08:00
Awni Hannun	e510987870	Clear cache every now and then (#1081 ) * clear cache every now and then * don't need user arg anymore	2024-11-01 14:15:32 -07:00
Alex Barron	85ffd2c96a	Quantized KV Cache (#1075 ) * add QuantizedKVCache * simplify * add tests * single sdpa function * fix sed * in place * fix tests * support different k and v head dims	2024-10-31 16:59:52 -07:00
Awni Hannun	9f34fdbda4	Wire models in MLX LM (#1069 ) * wired in MLX LM * fix synch * comment + nit * version * mlx lm version * bump to 0.19.2	2024-10-31 08:17:14 -07:00
Awni Hannun	66e7bcb886	override dtype with quant (#1062 )	2024-10-22 09:56:45 -07:00
Awni Hannun	c799133998	Make llm async eval less brittle (#1040 ) * Make llm async eval less brittle * nit	2024-10-14 10:25:24 -07:00
Awni Hannun	4360e7ccec	clear cache during prompt processing (#1027 )	2024-10-09 16:48:32 -07:00
Awni Hannun	b7373cb44f	fix long prompt generations (#1023 )	2024-10-09 11:09:36 -07:00
Awni Hannun	fca087be49	More cache improvements (#1015 ) * fix rotating kv cache for chat use case * reorg + fixes to caching, unify prompt caching across types and use cases for e.g. caching during a chat * nit in chat * fix tests * fix tests * fix tests * docs * chat command * comments + docs * Define meta_state on all Cache implementations * fixes + trim_prompt_cache api * fix default model --------- Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2024-10-07 20:45:51 -07:00
nathan	0866e23a67	repetition_penalty and logits_bias just using logits_processors (#1004 ) * refactor of repetition_penalty and logits_bias to use logits_processor * nits --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-09-30 08:49:03 -07:00
Gökdeniz Gülmez	50e5ca81a8	Adding full finetuning (#903 ) * Adding full model weights finetuning * Updating the LORA.md and ACKNOWLEDGMENTS.md files. * removing --use-dora and --fulll-training and adding --fine-tune-type * some clean up * reformatting and fixing dora training * updated CONFIG_DEFAULTS * update config example * update in the config example file * Update LORA.md * merge and commit * adding argument for dora linear layer * clean up * clean up in the example yaml file * fix * final fix before sending * small addition to re md file * fix for loading the fully trained model by saving all the files and configs correctly * clean up * removing the unnecessary files * changing lora layers back to 16 * removed max file size * nits * resolve merge * some consistency changes --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-09-29 17:12:47 -07:00
nathan	ace2bb5890	Add logits_processor option to generate_step function (#983 ) * Add logits_processor option for the generation as in huggingface transformers library * concatenation correction * Rename the tokens variable for clarity * remove the logit_bias argument from generate_step method * fix the variable name * nits + test * test * add back logit bias + test --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-09-28 10:08:49 -07:00
Awni Hannun	f530f56df2	don't use internal exception (#990 )	2024-09-17 16:22:48 -07:00
Awni Hannun	6c2369e4b9	Fix bug in upload + docs nit (#981 ) * fix bug in upload + docs nit * nit	2024-09-07 14:46:57 -07:00
Awni Hannun	c3e3411756	Update LLM generation docs to use chat template (#973 ) * fix docs * add template to model cards as well * revert * version	2024-09-07 06:06:15 -07:00
madroid	bd29aec299	Support HuggingFace model tree (#957 ) * Hub: Update quantization configuration fields * Hub: add base_model metadata * Hub: add quantization_config for model tree Quantized type * Hub: update quantization_config value * Hub: remove config print	2024-09-04 06:19:32 -07:00
Angelos Katharopoulos	1003a8b2dd	Add the ability to load the KV cache from a file (#956 )	2024-08-28 22:11:45 -07:00
Awni Hannun	7be292c0c9	Handle longer prompt/generation (#931 ) * rebase * nits * nit * fix rotating cache with step prefill * update version	2024-08-16 15:28:39 -07:00
Chime Ogbuji	c50971e860	Min P implementation (#926 ) * Min P implementation * Change default to 0 (no min_p) * nits * nits --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-08-15 15:45:02 -07:00
Awni Hannun	9b83004631	Faster sampling with `mx.compile` (#937 ) * faster sampling with compile * fix test	2024-08-15 11:29:09 -07:00
Awni Hannun	95840f32e2	Fix whisper conversion for safetensors models (#935 ) * fix whisper conversion for safetensor only. error in mlx lm for existing paths * fix tests	2024-08-14 10:22:04 -07:00
Anchen	7a3ab1620a	support load model by custom get_model_classes (#899 ) * feature(mlx_lm): support load model by custom get classes * rename the param	2024-07-25 11:01:17 -07:00
Awni Hannun	20e221f7f7	Add recurrent gemma (#856 ) * add recurrent gemma * fix window cache	2024-07-07 12:10:04 -07:00
Chime Ogbuji	1d701a1831	Logprobs info to completion API (#806 ) * Initial implementation * Fix handling of return_step_logits in return * Fixed OpenAI parameter expectations and logprob structure and datatypes * pre-commit black formatting * Remove unused parameter * fix log probs * fix colorize * nits in server * nits in server * Fix top_logprobs structure (a dict) and include tokens in logprobs response * nits * fix types --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-06-23 10:35:13 -07:00
Michał Kurc	43d6deb3c1	mlx_lm: Add Streaming Capability to Generate Function (#807 ) * Add streaming feature to text generation function * separate stream and regular functions --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-06-03 09:04:39 -07:00
Awni Hannun	ca7ce60c91	Rename block sparse to gather (#793 ) * rename block sparse to gather * pin mlx version	2024-05-23 19:47:35 -07:00
Angelos Katharopoulos	9f671228cd	Block sparse MM MoEs (#782 ) - Adds SwitchLinear - Adds QuantizedSwitchLinear	2024-05-21 15:58:08 -07:00
AtakanTekparmak	199df9e110	fix: Added dedicated error handling to load and get_model_path (#775 ) * fix: Added dedicated error handling to load and get_model_path Added proper error handling to load and get_model_path by adding a dedicated exception class, because when the local path is not right, it still throws the huggingface RepositoryNotFoundError * fix: Changed error message and resolved lack of import * fix: Removed redundant try-catch block * nits in message * nits in message --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-05-20 06:39:05 -07:00
JosefAlbers	10853b57d9	Add `model_config` parameter to `load()` and `load_model()` (#770 ) * Add `model_config` parameter to `load()` and `load_model()` For easy editing of the loaded model configuration (e.g., for changing RoPE theta or scaling of Phi-3 model) Example: ```python from mlx_lm import load, generate model, tokenizer = load("mlx-community/Phi-3-mini-4k-instruct-4bit-no-q-embed", model_config={"rope_theta":50000.0}) response = generate(model, tokenizer, prompt, max_tokens=MAX_TOKENS) ``` * Possible bug (default_loss) * Revert "Possible bug (default_loss)" This reverts commit `70a55ace18`. * Fix default_loss for lora * 1. move load_model's new optional `model_config` arg to the end (fetch_from_hub()'s `model = load_model(model_path, lazy)`) 2. fix indentations (`black` hook)	2024-05-10 10:13:34 -07:00
Awni Hannun	ee60e2a9d5	Kv cache (#643 ) * in place kv_cache * fix * fix kv cache size * partially fix kv cache dtype * step kv cache * multiple of step size * more tests + kv cache * more kv cache * update all models to use kv cache	2024-05-08 08:18:13 -07:00
Awni Hannun	2bf11c4633	Use stable url for MNIST (#749 ) * use stable url * remove deprecated flag	2024-05-03 17:13:05 -07:00
madroid	5079af62db	Update model card describe (#654 ) * Update model card describe - Add full link jump - Add the address of the model uploader's Hugging Face homepage * Add user_info to reduce whoami calls * Remove the -U argument * remove HF user info * run pre-commit	2024-05-02 21:22:04 -07:00
Karim Elmaaroufi	4bf2eb17f2	Validate server params & fix logit bias bug (#731 ) * Bug fix in logit bias * Add parameter validations * Fix typo * Update docstrings to match MLX styling * Black style + fix a validation bug	2024-04-30 07:27:40 -07:00
Javier de la Rosa	510d2bde49	Force multi_commits when uploading to HF (#729 )	2024-04-28 19:07:17 -07:00
Awni Hannun	685012c2ad	Couple fixes for LoRA (#711 ) * don't overwrite in test only mode * only load model specific safetensors	2024-04-25 14:16:13 -07:00
Karim Elmaaroufi	1484598de1	Add support for logit bias (#697 )	2024-04-21 06:53:56 -07:00
Awni Hannun	2146bcd7ee	Quantize embedding / Update quantize API (#680 ) * more async eval * quantize embedding / update quantize api * more updates for quantize * update for quantize embeddings * update sd quant API * update sdxl quants * error for datasets < batch_size * async * fix config loading * fix quant * fix tests * fix req * remove lm head if tie weights is true * fix test	2024-04-18 18:16:10 -07:00
Anchen	f5f189e48a	fix(mlx-lm): broken server.py (#690 ) * fix server.py * fix var referenced before assignment * add test * clean up	2024-04-18 14:26:18 -07:00
Awni Hannun	9c5554d8ee	Use async eval (#670 ) * Use async eval * bump * bump * remove workaround for bfloat cumsum	2024-04-11 13:18:23 -07:00
da-z	5a4cad34ef	Always resume downloads (#674 ) * Always resume downloads * format --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-04-11 06:52:32 -07:00
Angelos Katharopoulos	1278994b56	Add streaming detokenizers (#651 )	2024-04-08 22:36:01 -07:00

95 Commits