mlx-examples

mirror of https://github.com/ml-explore/mlx-examples.git synced 2025-08-13 04:36:53 +08:00

Y4hL b8e5eda4fd Refactoring of mlx_lm example (#501 )	2024-03-06 06:24:31 -08:00

* Use named tuple from typing for typehints
* Add type hints
* Simplify expression
* Type hint fix
* Improved do_POST logic
  Use a map of endpoints to methods to reduce redundancy in code
* Fix format
* Improve redundancy
  Call method dynamically instead of writing out all arguments twice
* Send response instead of returning
* Fix typo
* Revert change
* Make adapter_file as Optional
* Mark formatter as optional
* format
* Create message generator
  Store response data that stays static for the duration of the response inside of the object:
  system_fingerprint
  request_id
  object_type
  requested_model
  Created a message generator, that dynamically creates messages from the metadata stored inside of the object, and the data from the model pipeline
* Remove leftover
* Update parameters to reflect new object structure
  No longer pass all arguments between functions, but use the stores values inside of the object
* Parse body before calling request specific methods
* Call super init
* Update server.py
* Fixed outdated documentation parameter name
* Add documentation
* Fix sending headers twice
  During testing I found that when using the streaming option, headers have always been sent twice. This should fix that
* Simplify streaming code by using guard clauses
  Don't wrap wfile writes in try blocks, the server class has its own try block to prevent crashing
* Bug fix
* Use Content-Length header
  Let the completion type specific methods finish sending the headers. This allows us to send the Content-Length header as the model returns a completion.
* Update utils.py
* Add top_p documentation
* Type hint model and tokenizer as required
* Use static system fingerprint
  System fingerprint now stays the same across requests
* Make type hint more specific
* Bug Fix
  Supplying less than 2 models to merge would raise ValueError and calls len on unbound "models". Should be "model_paths" instead.
  Mark upload_repo as optional
* Move more of the shared code into do_POST
  Processing stop_id_sequences is done no matter the request endpoint or type, move it into the shared section. handle_ methods now just return the prompt in mx.array form.
* Store stop_id_sequences as lists instead of np
  During testing I found that letting the tokenizer return values as python lists and converting them to mlx arrays was around 20% faster than having the tokenizer convert them to np, and from np to mlx. This allows makes it so numpy no longer needs to be imported.
* Update stop_id_sequences docs
* Turn if check to non-inclusive
  Only continue if buffer is smaller
* Documentation fix
* Cleared method names
  Instead of handle_stream and generate_competion, we should name it handle_completion.
  Instead of handle_completions and handle_chat_completions, we should name it handle_text_completions, since both are completions, calling it text completions should make it more descriptive
* Make comment clearer
* fix format
* format

examples	Support for slerp merging models (#455 )	2024-02-19 20:37:15 -08:00
models	Add tips on porting LLMs from HuggingFace (#523 )	2024-03-05 17:43:15 -08:00
tuner	Add Starcoder 2 (#502 )	2024-03-02 19:39:23 -08:00
__init__.py	Fix import warning (#479 )	2024-02-27 08:47:56 -08:00
convert.py	Fix import warning (#479 )	2024-02-27 08:47:56 -08:00
fuse.py	feat(mlx-lm): add de-quant for fuse.py (#365 )	2024-01-25 18:59:32 -08:00
generate.py	chore(mlx-lm): add adapter support in generate.py (#494 )	2024-02-28 07:49:25 -08:00
LORA.md	chore(mlx-lm): add adapter support in generate.py (#494 )	2024-02-28 07:49:25 -08:00
lora.py	chore(mlx-lm): add adapter support in generate.py (#494 )	2024-02-28 07:49:25 -08:00
MERGE.md	Support for slerp merging models (#455 )	2024-02-19 20:37:15 -08:00
merge.py	Refactoring of mlx_lm example (#501 )	2024-03-06 06:24:31 -08:00
py.typed	Add `py.typed` to support PEP-561 (type-hinting) (#389 )	2024-01-30 21:17:38 -08:00
README.md	feat: move lora into mlx-lm (#337 )	2024-01-23 08:44:37 -08:00
requirements.txt	[mlx-lm] Add precompiled normalizations (#451 )	2024-02-22 12:40:55 -08:00
SERVER.md	Prevent llms/mlx_lm from serving the local directory as a webserver (#498 )	2024-02-27 19:40:42 -08:00
server.py	Refactoring of mlx_lm example (#501 )	2024-03-06 06:24:31 -08:00
UPLOAD.md	Mlx llm package (#301 )	2024-01-12 10:25:56 -08:00
utils.py	Refactoring of mlx_lm example (#501 )	2024-03-06 06:24:31 -08:00
version.py	Fix import warning (#479 )	2024-02-27 08:47:56 -08:00

README.md

Generate Text with MLX and 🤗 Hugging Face

This is an example of large language model text generation that can pull models from the Hugging Face Hub.

For more information on this example, see the README in the parent directory.
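
As a rough illustration of the text generation described above, the sketch below wraps the package's `load` and `generate` functions in a small helper. The helper name `generate_text`, the default repository id, and the token limit are illustrative assumptions, not part of this README; any Hub repository the package can load could be substituted, and running it requires `mlx_lm` to be installed on a supported machine.

```python
def generate_text(
    prompt: str,
    repo: str = "mistralai/Mistral-7B-Instruct-v0.2",  # assumed example repo id
    max_tokens: int = 100,  # assumed limit, chosen for illustration
) -> str:
    """Load a model from the Hugging Face Hub and generate a completion.

    Hypothetical helper: only `load` and `generate` come from mlx_lm.
    """
    # Imported lazily so this module can be imported without mlx_lm installed.
    from mlx_lm import load, generate

    # Downloads the weights on first use (or reuses the local cache).
    model, tokenizer = load(repo)

    # Returns the generated text as a string.
    return generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens)
```

Calling `generate_text("Write a haiku about the ocean.")` would then download the model on first use and return the completion; the exact output depends on the model chosen.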

This package also supports fine-tuning with LoRA or QLoRA. For more information, see the LoRA documentation.