mlx-examples

mirror of https://github.com/ml-explore/mlx-examples.git synced 2025-08-29 18:26:37 +08:00

Author	SHA1	Message	Date
Kevin Conner	ec494a97ec	Fix object property value in mlx_lm.server chat completions response to match OpenAI spec These were "chat.completions" and "chat.completions.chunk" but should be "chat.completion" and "chat.completion.chunk" for compatibility with clients expecting an OpenAI API. In particular, this solves a problem in which aider 0.64.1 reports hitting a token limit on any completion request, no matter how small, despite apparently correct counts in the usage property. Refer to: https://platform.openai.com/docs/api-reference/chat/object > object string > The object type, which is always chat.completion. https://platform.openai.com/docs/api-reference/chat/streaming > object string > The object type, which is always chat.completion.chunk.	2024-11-24 15:03:57 -08:00
Awni Hannun	605c4854f1	Prompt caching in `mlx_lm.server` (#1026 ) * caching in server * nits * fix tests * don't throw if no metal * comments	2024-10-14 10:57:22 -07:00
jamesm131	d812516d3d	Add /v1/models endpoint to mlx_lm.server (#984 ) * Add 'models' endpoint to server * Add test for new 'models' server endpoint * Check hf_cache for mlx models * update tests to check hf_cache for models * simplify test * doc --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-09-28 07:21:11 -07:00
Khush Gupta	8fa12b0058	Adapters loading (#902 ) * Added functionality to load in adapters through post-requests so you do not need to restart the server * ran pre-commit * nits * fix test --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-08-01 16:18:18 -07:00
Chime Ogbuji	1d701a1831	Logprobs info to completion API (#806 ) * Initial implementation * Fix handling of return_step_logits in return * Fixed OpenAI parameter expectations and logprob structure and datatypes * pre-commit black formatting * Remove unused parameter * fix log probs * fix colorize * nits in server * nits in server * Fix top_logprobs structure (a dict) and include tokens in logprobs response * nits * fix types --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-06-23 10:35:13 -07:00
Karim Elmaaroufi	4bf2eb17f2	Validate server params & fix logit bias bug (#731 ) * Bug fix in logit bias * Add parameter validations * Fix typo * Update docstrings to match MLX styling * Black style + fix a validation bug	2024-04-30 07:27:40 -07:00
Phúc H. Lê Khắc	35206806ac	Create executables for generate, lora, server, merge, convert (#682 ) * feat: create executables mlx_lm.<cmd> * nits in docs --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-04-16 16:08:49 -07:00
Y4hL	ea92f623d6	Prevent llms/mlx_lm from serving the local directory as a webserver (#498 ) * Don't serve local directory BaseHTTPRequestHandler serves the current directory by default. Definitely not intended behaviour. Remove the "do_HEAD" and "do_GET" methods. * Fix typo in method name I assume hanlde_stream was intended to be called handle_stream * Fix outdated typehint load_model returns nn.Module, however fetch_from_hub was not updated to reflect the change * Add some more type hints * Add warnings for using in prod Add a warning to README and runtime, discouraging use in production. The warning is the same as on the python docs for HTTPServer https://docs.python.org/3/library/http.server.html * format * nits --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-02-27 19:40:42 -08:00
Anchen	82f3f31d93	chore(mlx-lm): refactor server.py to utilize generate_step from utils for consistency (#491 ) * chore(mlx-lm): refactor server.py to utilize generate_step from utils for consistency * chore(mlx-lm): update server doc * chore: remove unused generate func	2024-02-27 06:25:24 -08:00
Awni Hannun	8fd953ee2b	Support for slerp merging models (#455 ) * support for slerp merging models * docs * update docs * format'	2024-02-19 20:37:15 -08:00

10 Commits