Commit Graph

68 Commits

Author SHA1 Message Date
Chime Ogbuji
c50971e860
Min P implementation (#926)
* Min P implementation

* Change default to 0 (no min_p)

* nits

* nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-08-15 15:45:02 -07:00
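A minimal sketch of the min-p filter described in #926 above (not the exact mlx_lm code): tokens whose probability falls below `min_p` times the probability of the most likely token are masked out before sampling, and `min_p=0` disables the filter, matching the default chosen in this commit.

```python
import mlx.core as mx

def min_p_sampling(logits: mx.array, min_p: float = 0.0, temperature: float = 1.0) -> mx.array:
    # Illustrative sketch: scale the max probability by min_p to get a cutoff,
    # zero out everything below it, renormalize, then sample.
    probs = mx.softmax(logits / temperature, axis=-1)
    threshold = min_p * mx.max(probs, axis=-1, keepdims=True)
    filtered = mx.where(probs < threshold, 0.0, probs)
    filtered = filtered / mx.sum(filtered, axis=-1, keepdims=True)
    return mx.random.categorical(mx.log(filtered))
```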
Awni Hannun
9b83004631
Faster sampling with mx.compile (#937)
* faster sampling with compile

* fix test
2024-08-15 11:29:09 -07:00
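Roughly how a sampling step can be wrapped in `mx.compile` as in #937 (a sketch, not the exact code): the random state is threaded through `inputs`/`outputs` so repeated calls to the compiled function still draw fresh samples.

```python
import mlx.core as mx
from functools import partial

@partial(mx.compile, inputs=mx.random.state, outputs=mx.random.state)
def sample(logits: mx.array, temp: float = 1.0) -> mx.array:
    # Compiling fuses the temperature scaling and categorical draw into one graph;
    # the scalar temp is captured as a constant (changing it triggers a recompile).
    return mx.random.categorical(logits * (1 / temp))
```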
Awni Hannun
95840f32e2
Fix whisper conversion for safetensors models (#935)
* fix whisper conversion for safetensors only; error in mlx-lm for existing paths

* fix tests
2024-08-14 10:22:04 -07:00
Anchen
7a3ab1620a
Support loading models via custom get_model_classes (#899)
* feature(mlx_lm): support loading models via custom get_model_classes

* rename the param
2024-07-25 11:01:17 -07:00
Awni Hannun
20e221f7f7
Add recurrent gemma (#856)
* add recurrent gemma

* fix window cache
2024-07-07 12:10:04 -07:00
Chime Ogbuji
1d701a1831
Logprobs info to completion API (#806)
* Initial implementation

* Fix handling of return_step_logits in return

* Fixed OpenAI parameter expectations and logprob structure and datatypes

* pre-commit black formatting

* Remove unused parameter

* fix log probs

* fix colorize

* nits in server

* nits in server

* Fix top_logprobs structure (a dict) and include tokens in logprobs response

* nits

* fix types

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-06-23 10:35:13 -07:00
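For orientation, the logprobs payload in #806 follows the OpenAI-style completion shape: a choice carries parallel lists of sampled tokens and their log probabilities, plus a per-position dict of top alternatives. An illustrative fragment (field values are made up):

```python
choice = {
    "index": 0,
    "text": " Paris",
    "logprobs": {
        "tokens": [3681],                              # sampled token ids
        "token_logprobs": [-0.12],                     # log prob of each sampled token
        "top_logprobs": [{3681: -0.12, 6342: -2.31}],  # per-step dict of top alternatives
    },
    "finish_reason": "stop",
}
```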
Michał Kurc
43d6deb3c1
mlx_lm: Add Streaming Capability to Generate Function (#807)
* Add streaming feature to text generation function

* separate stream and regular functions

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-06-03 09:04:39 -07:00
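A hedged usage sketch of the streaming function added in #807 (the model path is a placeholder): iterate over text chunks as they are produced instead of waiting for the full completion.

```python
from mlx_lm import load, stream_generate

# Placeholder repo; any converted mlx_lm model works here.
model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.2-4bit")
for chunk in stream_generate(model, tokenizer, prompt="Hello", max_tokens=64):
    print(chunk, end="", flush=True)
```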
Awni Hannun
ca7ce60c91
Rename block sparse to gather (#793)
* rename block sparse to gather

* pin mlx version
2024-05-23 19:47:35 -07:00
Angelos Katharopoulos
9f671228cd
Block sparse MM MoEs (#782)
- Adds SwitchLinear
- Adds QuantizedSwitchLinear
2024-05-21 15:58:08 -07:00
AtakanTekparmak
199df9e110
fix: Added dedicated error handling to load and get_model_path (#775)
* fix: Added dedicated error handling to load and get_model_path

Added proper error handling to load and get_model_path via a dedicated exception class, because when the local path is not right they still throw the Hugging Face RepositoryNotFoundError

* fix: Changed error message and resolved lack of import

* fix: Removed redundant try-catch block

* nits in message

* nits in message

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-05-20 06:39:05 -07:00
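The idea in #775, sketched with assumed names: translate a path that is neither a local directory nor a Hub repo into a dedicated exception instead of surfacing `RepositoryNotFoundError` directly.

```python
from pathlib import Path
from huggingface_hub import snapshot_download
from huggingface_hub.utils import RepositoryNotFoundError


class ModelNotFoundError(Exception):
    """Raised when neither a local path nor a Hugging Face repo matches."""


def get_model_path(path_or_hf_repo: str) -> Path:
    # Hedged sketch, not the exact mlx_lm implementation.
    model_path = Path(path_or_hf_repo)
    if model_path.exists():
        return model_path
    try:
        return Path(snapshot_download(repo_id=path_or_hf_repo))
    except RepositoryNotFoundError:
        raise ModelNotFoundError(
            f"Model not found locally or on the Hugging Face Hub: {path_or_hf_repo}"
        ) from None
```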
JosefAlbers
10853b57d9
Add model_config parameter to load() and load_model() (#770)
* Add `model_config` parameter to `load()` and `load_model()`

For easy editing of the loaded model configuration (e.g., for changing the RoPE theta or scaling of the Phi-3 model)

Example:

```python
from mlx_lm import load, generate
model, tokenizer = load("mlx-community/Phi-3-mini-4k-instruct-4bit-no-q-embed", model_config={"rope_theta":50000.0})
response = generate(model, tokenizer, prompt, max_tokens=MAX_TOKENS)
```

* Possible bug (default_loss)

* Revert "Possible bug (default_loss)"

This reverts commit 70a55ace18.

* Fix default_loss for lora

* 1. Move load_model's new optional `model_config` arg to the end (fetch_from_hub()'s `model = load_model(model_path, lazy)`); 2. fix indentation (`black` hook)
2024-05-10 10:13:34 -07:00
Awni Hannun
ee60e2a9d5
Kv cache (#643)
* in place kv_cache

* fix

* fix kv cache size

* partially fix kv cache dtype

* step kv cache

* multiple of step size

* more tests + kv cache

* more kv cache

* update all models to use kv cache
2024-05-08 08:18:13 -07:00
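A condensed sketch of the in-place cache pattern from #643 (details differ from the shipped class): key/value buffers grow in multiples of a fixed step and are updated by slice assignment rather than re-concatenated every token.

```python
import mlx.core as mx


class KVCache:
    """Hedged sketch of a step-growing, in-place key/value cache."""

    def __init__(self, head_dim: int, n_kv_heads: int, step: int = 256):
        self.keys = None
        self.values = None
        self.offset = 0
        self.step = step
        self.head_dim = head_dim
        self.n_kv_heads = n_kv_heads

    def update_and_fetch(self, keys: mx.array, values: mx.array):
        prev = self.offset
        if self.keys is None or (prev + keys.shape[2]) > self.keys.shape[2]:
            # Grow the buffers by a multiple of `step`, keeping existing entries.
            n_steps = (self.step + keys.shape[2] - 1) // self.step
            shape = (keys.shape[0], self.n_kv_heads, n_steps * self.step, self.head_dim)
            new_k = mx.zeros(shape, keys.dtype)
            new_v = mx.zeros(shape, values.dtype)
            if self.keys is not None:
                self.keys = mx.concatenate([self.keys[..., :prev, :], new_k], axis=2)
                self.values = mx.concatenate([self.values[..., :prev, :], new_v], axis=2)
            else:
                self.keys, self.values = new_k, new_v
        self.offset += keys.shape[2]
        self.keys[..., prev : self.offset, :] = keys
        self.values[..., prev : self.offset, :] = values
        return self.keys[..., : self.offset, :], self.values[..., : self.offset, :]
```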
Awni Hannun
2bf11c4633
Use stable url for MNIST (#749)
* use stable url

* remove deprecated flag
2024-05-03 17:13:05 -07:00
madroid
5079af62db
Update model card description (#654)
* Update model card description

- Add full link jump
- Add the address of the model uploader's Hugging Face homepage

* Add user_info to reduce whoami calls

* Remove the -U argument

* remove HF user info

* run pre-commit
2024-05-02 21:22:04 -07:00
Karim Elmaaroufi
4bf2eb17f2
Validate server params & fix logit bias bug (#731)
* Bug fix in logit bias

* Add parameter validations

* Fix typo

* Update docstrings to match MLX styling

* Black style + fix a validation bug
2024-04-30 07:27:40 -07:00
Javier de la Rosa
510d2bde49
Force multi_commits when uploading to HF (#729) 2024-04-28 19:07:17 -07:00
Awni Hannun
685012c2ad
Couple fixes for LoRA (#711)
* don't overwrite in test only mode

* only load model specific safetensors
2024-04-25 14:16:13 -07:00
Karim Elmaaroufi
1484598de1
Add support for logit bias (#697) 2024-04-21 06:53:56 -07:00
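The logit-bias feature from #697, sketched loosely: an OpenAI-style `logit_bias` map of token id to additive offset is applied to the logits before sampling.

```python
import mlx.core as mx

def apply_logit_bias(logits: mx.array, logit_bias: dict) -> mx.array:
    # Hedged sketch: add per-token offsets to logits of shape (1, vocab_size).
    if logit_bias:
        indices = mx.array(list(logit_bias.keys()))
        values = mx.array(list(logit_bias.values()))
        logits[:, indices] += values
    return logits
```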
Awni Hannun
2146bcd7ee
Quantize embedding / Update quantize API (#680)
* more async eval

* quantize embedding / update quantize api

* more updates for quantize

* update for quantize embeddings

* update sd quant API

* update sdxl quants

* error for datasets < batch_size

* async

* fix config loading

* fix quant

* fix tests

* fix req

* remove lm head if tie weights is true

* fix test
2024-04-18 18:16:10 -07:00
Anchen
f5f189e48a
fix(mlx-lm): broken server.py (#690)
* fix server.py

* fix var referenced before assignment

* add test

* clean up
2024-04-18 14:26:18 -07:00
Awni Hannun
9c5554d8ee
Use async eval (#670)
* Use async eval

* bump

* bump

* remove workaround for bfloat cumsum
2024-04-11 13:18:23 -07:00
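The pipelining idea behind #670, sketched with a simplified loop (KV-cache handling omitted, so this is illustrative only): the graph for the next token is dispatched with `mx.async_eval` before the caller consumes the current one.

```python
import mlx.core as mx


def generate_step(prompt: mx.array, model, sample):
    # Hedged sketch of async-eval pipelining, not the exact mlx_lm generate_step.
    def step(y):
        logits = model(y[None])[:, -1, :]
        return sample(logits)

    y = step(prompt)
    mx.async_eval(y)
    while True:
        next_y = step(y)
        mx.async_eval(next_y)   # kick off the next token without blocking
        yield y.item()          # consuming y waits only on already-dispatched work
        y = next_y
```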
da-z
5a4cad34ef
Always resume downloads (#674)
* Always resume downloads

* format

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-04-11 06:52:32 -07:00
Angelos Katharopoulos
1278994b56
Add streaming detokenizers (#651) 2024-04-08 22:36:01 -07:00
Awni Hannun
1e2f7f50b6
fix for empty initial string (#665) 2024-04-08 10:40:05 -07:00
Awni Hannun
2bd64b78cf
Save lora config (#636)
* lora config

* comments

* version bump
2024-04-02 13:52:53 -07:00
Awni Hannun
78c431dc25
cleanup whisper a little (#639) 2024-03-30 13:13:58 -07:00
Awni Hannun
5a52899405
Partially stream de-tokenization (#609)
* partially stream de-tokenization

* don't break full response
2024-03-23 15:32:33 -07:00
Anchen
fbed720d6f
chore(mlx-lm): fix the top_p implementation. (#602)
* chore(mlx-lm): clean up the top p imp

* chore: clean up

* chore: add test

* chore: address comments

* chore: clean up docs string

* chore: clean up test
2024-03-21 12:18:23 -07:00
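A sketch of the nucleus (top-p) sampling cleaned up in #602: sort probabilities ascending, keep the tail whose cumulative mass exceeds `1 - top_p`, renormalize implicitly through the categorical draw, and map the sampled index back to the original vocabulary.

```python
import mlx.core as mx


def top_p_sampling(logits: mx.array, top_p: float, temperature: float = 1.0) -> mx.array:
    # Hedged sketch for logits of shape (1, vocab_size).
    probs = mx.softmax(logits / temperature, axis=-1)
    sorted_indices = mx.argsort(probs, axis=-1)            # ascending order
    sorted_probs = probs[..., sorted_indices.squeeze(0)]
    cumulative = mx.cumsum(sorted_probs, axis=-1)
    # Keep only tokens inside the nucleus (cumulative mass above 1 - top_p).
    top_probs = mx.where(cumulative > 1 - top_p, sorted_probs, 0.0)
    sampled = mx.random.categorical(mx.log(top_probs))
    return sorted_indices.squeeze(0)[sampled]
```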
Alwin Arrasyid
6c3d4c8ba2
add dequantize option to mlx_lm/convert.py (#547) 2024-03-19 19:50:08 -07:00
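A hedged usage sketch of the dequantize option from #547 (parameter name assumed to match the flag; the repo path is a placeholder):

```python
from mlx_lm import convert

# Convert a quantized checkpoint back to floating point during conversion.
convert(
    hf_path="mlx-community/some-quantized-4bit-model",  # placeholder repo
    mlx_path="./dequantized-model",
    dequantize=True,
)
```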
Chime Ogbuji
6f2fd5daea
Add mlx-lm version information to HF model card (#596)
* Add mlx-lm version information to HF model card

* Update llms/mlx_lm/utils.py

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>

* Reverted indentation

* Pre-commit formatting

---------

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2024-03-19 19:42:03 -07:00
Awni Hannun
e4b19bb9e1
Make attention faster for some models (#574)
* make attention faster for a couple models

* remove unused generation flags

* add comment on lora

* include text files as well
2024-03-14 21:35:54 -07:00
Sugato Ray
2cd793dd69
feat: add update_config functionality (#531)
* feat: add `update_config` functionality

- sorts the config for better readability
- updates "_name_or_path" key in config with upload_repo
- sets indentation of 4 spaces
- allows adding other key-value pairs via kwargs
- reduces code duplication
- standardizes config-update across mlx-lm

* feat: standardize updating config

Impacts:
- fuse.py
- merge.py

* update formatting

* remove commented out code

* update func: update_config to save_config

- drop kwargs
- rename func as save_config
- incorporate review suggestions

* update func: save_config

- ensure only config-saving functionality
- function does not return config as a dict anymore
- added review suggestions

* fixed formatting

* update formatting instruction in contribution guide

* nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-14 06:36:05 -07:00
Anchen
3535408c99
chore(mlx-lm): fix tie_word_embeddings for qwen2 (#566)
* chore: fix tie_word_embeddings for qwen2

* chore: default tie_word_embeddings to True
2024-03-12 21:34:32 -07:00
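The tied-embedding pattern fixed in #566, sketched generically (class and attribute names are illustrative): when `tie_word_embeddings` is true, the output projection reuses the input embedding matrix via `as_linear()` instead of a separate `lm_head`.

```python
import mlx.core as mx
import mlx.nn as nn


class LMHead(nn.Module):
    def __init__(self, vocab_size: int, dims: int, tie_word_embeddings: bool = True):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, dims)
        self.tie = tie_word_embeddings
        if not self.tie:
            self.lm_head = nn.Linear(dims, vocab_size, bias=False)

    def __call__(self, hidden: mx.array) -> mx.array:
        # Share weights between input embedding and output projection when tied.
        if self.tie:
            return self.embed_tokens.as_linear(hidden)
        return self.lm_head(hidden)
```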
Y4hL
b8e5eda4fd
Refactoring of mlx_lm example (#501)
* Use named tuple from typing for typehints

* Add type hints

* Simplify expression

* Type hint fix

* Improved do_POST logic

Use a map of endpoints to methods to reduce redundancy in code

* Fix format

* Improve redundancy

Call method dynamically instead of writing out all arguments twice

* Send response instead of returning

* Fix typo

* Revert change

* Make adapter_file as Optional

* Mark formatter as optional

* format

* Create message generator

Store response data that stays static for the duration of the response inside of the object:

system_fingerprint
request_id
object_type
requested_model

Created a message generator that dynamically creates messages from the metadata stored inside the object and the data from the model pipeline

* Remove leftover

* Update parameters to reflect new object structure

No longer pass all arguments between functions, but use the values stored inside the object

* Parse body before calling request specific methods

* Call super init

* Update server.py

* Fixed outdated documentation parameter name

* Add documentation

* Fix sending headers twice

During testing I found that when using the streaming option, headers have always been sent twice. This should fix that

* Simplify streaming code by using guard clauses

Don't wrap wfile writes in try blocks, the server class has its own try block to prevent crashing

* Bug fix

* Use Content-Length header

Let the completion type specific methods finish sending the headers. This allows us to send the Content-Length header as the model returns a completion.

* Update utils.py

* Add top_p documentation

* Type hint model and tokenizer as required

* Use static system fingerprint

System fingerprint now stays the same across requests

* Make type hint more specific

* Bug Fix

Supplying fewer than 2 models to merge would raise a ValueError and call len on the unbound "models"; it should be "model_paths" instead.

Mark upload_repo as optional

* Move more of the shared code into do_POST

Processing stop_id_sequences is done no matter the request endpoint or type, so it is moved into the shared section. handle_ methods now just return the prompt in mx.array form.

* Store stop_id_sequences as lists instead of np

During testing I found that letting the tokenizer return values as python lists and converting them to mlx arrays was around 20% faster than having the tokenizer convert them to np, and from np to mlx. This also means numpy no longer needs to be imported.

* Update stop_id_sequences docs

* Turn if check to non-inclusive

Only continue if buffer is smaller

* Documentation fix

* Clearer method names

Instead of handle_stream and generate_completion, we should name it handle_completion.

Instead of handle_completions and handle_chat_completions, we should name it handle_text_completions; since both are completions, calling it text completions makes it more descriptive

* Make comment clearer

* fix format

* format
2024-03-06 06:24:31 -08:00
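The "map of endpoints to methods" idea from #501, sketched in miniature (handler and method names are illustrative): `do_POST` parses the request once and dispatches on the path instead of duplicating logic per endpoint.

```python
from http.server import BaseHTTPRequestHandler


class APIHandler(BaseHTTPRequestHandler):
    # Hedged sketch of path-based dispatch inside a single do_POST.
    def do_POST(self):
        endpoints = {
            "/v1/completions": self.handle_text_completions,
            "/v1/chat/completions": self.handle_chat_completions,
        }
        if self.path not in endpoints:
            self.send_error(404, "Not Found")
            return
        # ... parse the JSON body, validate parameters, build the prompt ...
        endpoints[self.path]()

    def handle_text_completions(self):
        ...

    def handle_chat_completions(self):
        ...
```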
Madroid Ma
710c552731
add huggingface repo url print (#534) 2024-03-05 21:51:31 -08:00
Miller Liang
5b1043a458
llms: convert() add 'revision' argument (#506)
* llms: convert() add 'revision' argument

* Update README.md

* Update utils.py

* Update README.md

* Update llms/mlx_lm/utils.py

---------

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2024-03-02 06:28:26 -08:00
Sugato Ray
3acc1ec84e
fix: string indentation with textwrap.dedent (#510)
* fix: string indentation with textwrap.dedent

* update formatting
2024-02-29 22:23:01 -08:00
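For reference, `textwrap.dedent` strips the common leading whitespace that a triple-quoted string picks up from surrounding indentation, which is the fix applied in #510 (content below is illustrative):

```python
import textwrap

card = textwrap.dedent(
    """
    # Model card
    This model was converted with mlx-lm.
    """
)
print(card)  # prints without the source-code indentation
```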
Awni Hannun
ae48563378
Remove gc (#509)
* remove gc

* remove debug
2024-02-29 09:40:04 -08:00
Alex Ishida
ab0f1dd1b6
Add metadata when saving safetensors (#496)
* Add metadata when saving safetensors

Add metadata format="pt" for safetensors so that models are accessible to `transformers` users as well.

* save with metadata format mlx

Save the model weights with metadata format of "mlx".

* Updated llms/mlx_lm/generate.py
2024-02-28 07:29:00 -08:00
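A small sketch of the metadata-tagging idea from #496 (weights dict is a placeholder): attach a format tag when writing safetensors so downstream tools can tell how the file was produced; the commit settled on "mlx" rather than "pt".

```python
import mlx.core as mx

weights = {"layer.weight": mx.zeros((4, 4))}  # placeholder weights
mx.save_safetensors("model.safetensors", weights, metadata={"format": "mlx"})
```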
Y4hL
ea92f623d6
Prevent llms/mlx_lm from serving the local directory as a webserver (#498)
* Don't serve local directory

BaseHTTPRequestHandler serves the current directory by default. Definitely not intended behaviour. Remove the "do_HEAD" and "do_GET" methods.

* Fix typo in method name

I assume hanlde_stream was intended to be called handle_stream

* Fix outdated typehint

load_model returns nn.Module, however fetch_from_hub was not updated to reflect the change

* Add some more type hints

* Add warnings for using in prod

Add a warning to README and runtime, discouraging use in production. The warning is the same as on the python docs for HTTPServer https://docs.python.org/3/library/http.server.html

* format

* nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-02-27 19:40:42 -08:00
Awni Hannun
95f82e67a2
Fix import warning (#479)
* fix import warning
* fix version import
* remove api, move convert to utils
* also update circle to run external PRs
2024-02-27 08:47:56 -08:00
peterjc123
ccb278bcbd
Add top-p sampling for text generation (#486) 2024-02-26 06:18:11 -08:00
Angelos Katharopoulos
dc4f2e0a6b
Lazy loading models for faster convert and merge (#462) 2024-02-20 13:36:55 -08:00
vishal-14069
21e19b5b5a
Add repetition penalty to LLM inference - mlx-lm (#399)
* feat: add repetition penalty

* fix: generate function argument fix

* typo fixes

* update repetition penalty

* update generate_step and generate

* resolve conflicts in generate

* merge latest pull from origin master

* update generate

* update generate and generate_step

* update repetition list - rename variable

* refactor token count

* update generate step and generate

* move repetition_context in generate_step

* update generate step

* update generate_step
2024-02-16 21:58:17 -08:00
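The penalty applied in #399, sketched loosely: tokens that already appear in the recent context are down-weighted before sampling, with positive logits divided by the penalty and negative ones multiplied by it.

```python
import mlx.core as mx


def apply_repetition_penalty(logits: mx.array, repetition_context: list, penalty: float) -> mx.array:
    # Hedged sketch for logits of shape (1, vocab_size);
    # repetition_context is the list of recently generated token ids.
    if repetition_context:
        indices = mx.array(repetition_context)
        selected = logits[:, indices]
        selected = mx.where(selected < 0, selected * penalty, selected / penalty)
        logits[:, indices] = selected
    return logits
```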
Awni Hannun
d4666615bb
Lazy import + refactor Lora layer addition (#426)
* lazy model import in mlx_lm

* change lora loading

* fix olmo lora

* remove a bunch of unused stuff from plamo

* move phixtral to mlx-lm and out of llms/
2024-02-12 10:51:02 -08:00
Anchen
8b77677c05
chore(mlx-lm): add model weight index in save_weights (#413)
* chore(mlx-lm): add model weight index in save_weights

* Update llms/mlx_lm/utils.py

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>

* Update llms/mlx_lm/utils.py

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>

* chore: save total size as param size instead of file size

* chore: clean up format

---------

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2024-02-06 05:32:15 -08:00
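The index file added in #413, sketched with assumed helper names: a `model.safetensors.index.json` maps each tensor to the shard that contains it, and its metadata records the total parameter size (per the follow-up commit, parameter size rather than file size).

```python
import json
from pathlib import Path

import mlx.core as mx


def save_weight_index(save_dir: str, shards: dict, total_params_bytes: int):
    # shards: {"model-00001-of-00002.safetensors": {tensor_name: array, ...}, ...}
    weight_map = {
        name: shard_file for shard_file, tensors in shards.items() for name in tensors
    }
    index = {"metadata": {"total_size": total_params_bytes}, "weight_map": weight_map}
    with open(Path(save_dir) / "model.safetensors.index.json", "w") as f:
        json.dump(index, f, indent=4)
```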
Awni Hannun
aa7447efa2
Olmo in MLX LM (#415)
* run olmo

* format
2024-02-05 21:13:49 -08:00
Junyang Lin
9d0dd34403
add qwen2 (#411) 2024-02-04 08:31:38 -08:00
Madroid Ma
ba3a9355d1
LoRA: Remove unnecessary model type judgments (#388)
* LoRA: Remove unnecessary model type judgments

1. Supported models are already checked in the load_model function in utils, no need to repeat the check in lora
2. The checks in lora are not synchronized with those in utils

* LoRA: add LoRA supported models in mlx_lm utils
2024-01-31 11:55:27 -08:00
David Koski
f8fadf7a17
Fix token count computation to fix tps measurements (#392) 2024-01-30 11:24:16 -08:00