* Update model card description
- Add full, clickable links
- Add the address of the model uploader's Hugging Face homepage
* Add user_info to reduce whoami calls
* Remove the -U argument
* remove HF user info
* run pre-commit
* chore(mlx-lm): clean up the top-p implementation
* chore: clean up
* chore: add test
* chore: address comments
* chore: clean up docs string
* chore: clean up test
* Use NamedTuple from typing for type hints
* Add type hints
* Simplify expression
* Type hint fix
* Improved do_POST logic
Use a map of endpoints to handler methods to reduce code redundancy
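A minimal sketch of the endpoint-map idea, using illustrative handler names rather than the actual server.py implementation:

```python
from http.server import BaseHTTPRequestHandler


class APIHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Map each endpoint to the method that handles it instead of an
        # if/elif chain.
        endpoints = {
            "/v1/completions": self.handle_text_completions,
            "/v1/chat/completions": self.handle_chat_completions,
        }
        if self.path not in endpoints:
            self.send_error(404, "Not Found")
            return
        endpoints[self.path]()

    def handle_text_completions(self):
        ...

    def handle_chat_completions(self):
        ...
```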
* Fix format
* Reduce redundancy
Call the method dynamically instead of writing out all the arguments twice
* Send response instead of returning
* Fix typo
* Revert change
* Mark adapter_file as Optional
* Mark formatter as optional
* format
* Create message generator
Store response data that stays static for the duration of the response inside the object:
- system_fingerprint
- request_id
- object_type
- requested_model
Added a message generator that dynamically creates messages from the metadata stored inside the object and the data from the model pipeline
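A hypothetical sketch of this structure (field and method names are assumptions, not the actual server.py code):

```python
import time
import uuid


class ResponseState:
    def __init__(self, requested_model, object_type):
        # Data that stays static for the duration of the response.
        self.system_fingerprint = f"fp_{uuid.uuid4()}"
        self.request_id = f"cmpl-{uuid.uuid4()}"
        self.object_type = object_type
        self.requested_model = requested_model
        self.created = int(time.time())

    def generate_response(self, text, finish_reason=None):
        # Build one message from the stored metadata plus the new text
        # coming out of the model pipeline.
        return {
            "id": self.request_id,
            "object": self.object_type,
            "created": self.created,
            "model": self.requested_model,
            "system_fingerprint": self.system_fingerprint,
            "choices": [
                {"index": 0, "text": text, "finish_reason": finish_reason}
            ],
        }
```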
* Remove leftover
* Update parameters to reflect new object structure
No longer pass all arguments between functions; use the values stored inside the object instead
* Parse body before calling request specific methods
* Call super init
* Update server.py
* Fixed outdated documentation parameter name
* Add documentation
* Fix sending headers twice
During testing I found that when using the streaming option, headers were always sent twice. This should fix that.
* Simplify streaming code by using guard clauses
Don't wrap wfile writes in try blocks; the server class has its own try block to prevent crashing
* Bug fix
* Use Content-Length header
Let the completion-type-specific methods finish sending the headers. This allows us to send the Content-Length header once the model returns a completion.
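A sketch of the idea with an assumed method name; the point is that the completion-specific method can compute an exact Content-Length before calling end_headers():

```python
import json
from http.server import BaseHTTPRequestHandler


class CompletionHandler(BaseHTTPRequestHandler):
    def handle_completion(self, response: dict):
        body = json.dumps(response).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        # The full completion is known here, so Content-Length can be exact.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```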
* Update utils.py
* Add top_p documentation
* Type hint model and tokenizer as required
* Use static system fingerprint
System fingerprint now stays the same across requests
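A minimal sketch of a static fingerprint, assuming it is generated once at module load so every request reports the same value:

```python
import uuid

# Generated once per server process; reused for every response.
SYSTEM_FINGERPRINT = f"fp_{uuid.uuid4()}"
```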
* Make type hint more specific
* Bug Fix
Supplying fewer than 2 models to merge would raise a ValueError and call len on the unbound name "models"; it should be "model_paths" instead (see the sketch below).
Mark upload_repo as optional
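A hedged sketch of the corrected check (the function shape and error message are assumptions): validate the bound name model_paths rather than the unbound models.

```python
def merge_models(model_paths: list):
    # Guard against merging fewer than two models, using the name that is
    # actually bound at this point.
    if len(model_paths) < 2:
        raise ValueError(
            f"Expected at least 2 models to merge, got {len(model_paths)}."
        )
    # ... merge logic ...
```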
* Move more of the shared code into do_POST
Processing stop_id_sequences happens regardless of the request endpoint or type, so move it into the shared section. The handle_ methods now just return the prompt as an mx.array.
* Store stop_id_sequences as lists instead of np
During testing I found that letting the tokenizer return values as Python lists and converting them to mlx arrays was around 20% faster than having the tokenizer convert them to numpy and then converting from numpy to mlx. This also means numpy no longer needs to be imported.
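An illustrative sketch of the change (helper names are hypothetical): keep the stop id sequences as plain Python lists from the tokenizer and convert to mx.array only where an array is actually needed.

```python
import mlx.core as mx


def make_stop_id_sequences(tokenizer, stop_words):
    # The tokenizer returns plain Python lists of token ids; no numpy
    # round-trip is involved.
    return [tokenizer.encode(w) for w in stop_words]


def as_mx_array(sequence):
    # Convert a single sequence when an mlx array is required.
    return mx.array(sequence)
```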
* Update stop_id_sequences docs
* Make the if check non-inclusive
Only continue if the buffer is smaller
* Documentation fix
* Clearer method names
Instead of handle_stream and generate_completion, name it handle_completion.
Since both handle_completions and handle_chat_completions are completions, rename handle_completions to handle_text_completions to make it more descriptive.
* Make comment clearer
* fix format
* format
* Add metadata when saving safetensors
Add metadata format="pt" for safetensors so that models are accessible to `transformers` users as well.
* save with metadata format mlx
Save the model weights with metadata format of "mlx".
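A minimal sketch, assuming an mlx version whose save_safetensors accepts a metadata argument:

```python
import mlx.core as mx

weights = {"layers.0.weight": mx.zeros((8, 8))}
# Tag the file so other tooling can identify how the weights were saved.
mx.save_safetensors("model.safetensors", weights, metadata={"format": "mlx"})
```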
* Updated llms/mlx_lm/generate.py
* Don't serve local directory
BaseHTTPRequestHandler serves the current directory by default, which is definitely not the intended behaviour. Remove the "do_HEAD" and "do_GET" methods.
* Fix typo in method name
I assume hanlde_stream was intended to be called handle_stream
* Fix outdated type hint
load_model returns nn.Module, but fetch_from_hub was not updated to reflect the change
* Add some more type hints
* Add warnings for using in prod
Add a warning to the README and at runtime, discouraging use in production. The warning is the same as in the Python docs for HTTPServer: https://docs.python.org/3/library/http.server.html
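A sketch of what the runtime warning could look like (the exact wording in server.py may differ):

```python
import warnings

warnings.warn(
    "mlx_lm.server is not recommended for production as it only implements "
    "basic security checks."
)
```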
* format
* nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
* lazy model import in mlx_lm
* change lora loading
* fix olmo lora
* remove a bunch of unused stuff from plamo
* move phixtral to mlx-lm and out of llms/
* chore(mlx-lm): add model weight index in save_weights
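A hypothetical sketch of such an index, following the Hugging Face model.safetensors.index.json convention (the actual save_weights code may differ):

```python
import json


def save_weight_index(shards, path="model.safetensors.index.json"):
    # shards: {"model-00001-of-00002.safetensors": {name: array, ...}, ...}
    weight_map = {}
    total_size = 0
    for shard_file, weights in shards.items():
        for name, array in weights.items():
            weight_map[name] = shard_file
            total_size += array.nbytes
    index = {"metadata": {"total_size": total_size}, "weight_map": weight_map}
    with open(path, "w") as f:
        json.dump(index, f, indent=4)
```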
* Update llms/mlx_lm/utils.py
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Update llms/mlx_lm/utils.py
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* chore: save total size as param size instead of file size
* chore: clean up format
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* LoRA: Remove unnecessary model type checks
1. Supported models are already checked in the load_model function in utils; there is no need to repeat the check in lora
2. The checks in lora are not synchronized with those in utils
* LoRA: add LoRA supported models in mlx_lm utils
* feat(mlx-lm): add de-quant for fuse
* chore: disable quant when converting to linear if de-quant is enabled
* chore: add better error handling for adapter file not found
* fix Chinese character generation the same way as PR #321
* reuse the generate logic in utils.py
* format
* verbose default
* fix conflicts with colorize and character check
---------
Co-authored-by: Awni Hannun <awni@apple.com>
* Convert HF weights of PLaMo and load them into a PLaMo model in mlx
* Fix model inference part
* Add bos at the beginning of the prompt
* Fix convert.py to copy tokenizer.model into the converted dir
* Use the required instruction format in generate.py when the "--instruct" option is specified
* Change filenames and update existing scripts
* Add README
* Add requirements.txt
* Fix plamo.py to stop generation when EOS appears
* Add quantization to convert.py
* Use mlx>=0.0.9 for mx.core.outer() in PLaMo model
* Update acknowledgements.md
* Fix card text in upload_to_hub()
* Don't use the prompt template when --instruct is not specified
* Ask whether to trust_remote_code when loading the PLaMo tokenizer
* Check that the user trusts the remote code when converting
* Remove plamo directory
* Update README
* Add PLaMo model file
* Fix the handling of cache in PLaMo and update README
* Ask if trust_remote_code only when the model is PLaMo
* Remove resolve_trust_remote_code from convert.py and use the latest transformers
* Remove the code so that EOS is not added
* Update README so the example does not use the noncommercial version of the model
* Remove unused imports
* Remove unnecessary description about the instruct model of PLaMo from README
* format, nits in README
* typo
---------
Co-authored-by: Shunta Saito <shunta@mitmul-mbp.local>
Co-authored-by: Awni Hannun <awni@apple.com>
* Add colorized output option to generate script
Two new functions were added to the script that allow output to be colorized based on the T[0] probability. Changes were made to the `generate_step` function in utils.py to permit colorization. Additionally, an argument for colorization was introduced to the command-line parser.
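An illustrative sketch of the colorization idea (not the exact functions added): pick an ANSI color from the top-token probability.

```python
def color_by_probability(text: str, prob: float) -> str:
    # Green for confident tokens, yellow for middling ones, red otherwise.
    if prob > 0.8:
        code = "\033[92m"
    elif prob > 0.5:
        code = "\033[93m"
    else:
        code = "\033[91m"
    return f"{code}{text}\033[0m"


print(color_by_probability("hello", 0.9))
```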
* Rename 'colorize' parameter to 'return_probability' in generate_step
* refactor(qwen): move qwen into mlx-lm
* chore: update doc
* chore: fix type hint
* add qwen model support in convert
* chore: fix doc
* chore: only load model in quantize_model
* chore: make the convert script only copy tokenizer files instead of loading and saving them
* chore: update docstring
* chore: remove unnecessary try catch
* chore: clean up tokenizer handling and update to transformers 4.37
* nits in README
---------
Co-authored-by: Awni Hannun <awni@apple.com>