Prevent llms/mlx_lm from serving the local directory as a webserver (#498)

* Don't serve local directory

BaseHTTPRequestHandler serves the current directory by default, which is definitely not the intended behaviour. Remove the `do_HEAD` and `do_GET` methods.
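
For illustration, a minimal sketch of a handler in the spirit of this fix (names are hypothetical, not the actual `mlx_lm` code): with `do_GET`/`do_HEAD` removed, a `BaseHTTPRequestHandler` subclass answers unimplemented methods with a 501 error instead of touching the filesystem.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical sketch -- not the actual mlx_lm handler.
class APIHandler(BaseHTTPRequestHandler):
    """Only POST is implemented; BaseHTTPRequestHandler answers every
    other method (GET, HEAD, ...) with "501 Unsupported method"."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        _body = self.rfile.read(length)  # request payload (unused here)
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(b'{"status": "ok"}')

if __name__ == "__main__":
    # A GET to / now yields 501, not a directory listing.
    HTTPServer(("127.0.0.1", 8080), APIHandler).serve_forever()
```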

* Fix typo in method name

I assume `hanlde_stream` was intended to be named `handle_stream`.

* Fix outdated typehint

`load_model` returns `nn.Module`; however, `fetch_from_hub` was not updated to reflect the change.
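
A rough sketch of what the corrected annotations might look like. The parameter lists and the exact return tuple are simplified assumptions; only the `nn.Module` return type mirrors the change described above.

```python
from typing import Tuple

import mlx.nn as nn
from transformers import PreTrainedTokenizer

# Hypothetical, simplified signatures for illustration only.
def load_model(model_path: str) -> nn.Module:
    ...

def fetch_from_hub(model_path: str) -> Tuple[nn.Module, dict, PreTrainedTokenizer]:
    # Previously carried an outdated return annotation; now it matches
    # what load_model actually returns.
    ...
```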

* Add some more type hints

* Add warnings for using in prod

Add a warning to the README and at runtime discouraging use in production. The warning is the same as the one in the Python docs for `http.server`: https://docs.python.org/3/library/http.server.html
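
A minimal sketch of such a runtime warning, assuming a `main` entry point (the real function name and wording may differ):

```python
import warnings

def main() -> None:  # assumed entry point; the real one may differ
    # Same caveat as in the http.server docs: this server only implements
    # basic security checks, so discourage production use at startup.
    warnings.warn(
        "mlx_lm.server is not recommended for production as "
        "it only implements basic security checks."
    )
    ...
```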

* format

* nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
Author: Y4hL
Date: 2024-02-28 05:40:42 +02:00
Committed by: GitHub
Parent: 676e574eff
Commit: ea92f623d6
5 changed files with 32 additions and 9 deletions


@@ -4,6 +4,10 @@ You use `mlx-lm` to make an HTTP API for generating text with any supported
 model. The HTTP API is intended to be similar to the [OpenAI chat
 API](https://platform.openai.com/docs/api-reference).
 
+> [!NOTE]
+> The MLX LM server is not recommended for production as it only implements
+> basic security checks.
+
 Start the server with:
 
 ```shell
@@ -61,5 +65,9 @@ curl localhost:8080/v1/chat/completions \
 - `top_p`: (Optional) A float specifying the nucleus sampling parameter.
   Defaults to `1.0`.
-- `repetition_penalty`: (Optional) Applies a penalty to repeated tokens. Defaults to `1.0`.
-- `repetition_context_size`: (Optional) The size of the context window for applying repetition penalty. Defaults to `20`.
+- `repetition_penalty`: (Optional) Applies a penalty to repeated tokens.
+  Defaults to `1.0`.
+- `repetition_context_size`: (Optional) The size of the context window for
+  applying repetition penalty. Defaults to `20`.
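
For reference, a request exercising the parameters documented above might look like this. This is a sketch using only the standard library against a locally running server; the payload fields follow the README excerpt, while the prompt and parameter values are arbitrary examples.

```python
import json
from urllib import request

# Example chat completion request using the repetition parameters.
payload = {
    "messages": [{"role": "user", "content": "Say hello."}],
    "top_p": 1.0,
    "repetition_penalty": 1.2,
    "repetition_context_size": 20,
}
req = request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with request.urlopen(req) as resp:
    print(json.load(resp))
```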