Mirror of https://github.com/ml-explore/mlx-examples.git, synced 2025-09-01 04:14:38 +08:00
Prevent llms/mlx_lm from serving the local directory as a webserver (#498)
* Don't serve local directory

  BaseHTTPRequestHandler serves the current directory by default. Definitely
  not intended behaviour. Remove the "do_HEAD" and "do_GET" methods.

* Fix typo in method name

  I assume hanlde_stream was intended to be called handle_stream.

* Fix outdated type hint

  load_model returns nn.Module, however fetch_from_hub was not updated to
  reflect the change.

* Add some more type hints

* Add warnings for using in prod

  Add a warning to README and runtime, discouraging use in production. The
  warning is the same as in the Python docs for HTTPServer:
  https://docs.python.org/3/library/http.server.html

* format

* nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
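For illustration only (this is not the actual mlx_lm server code): a minimal sketch of the pattern the first fix describes. A `BaseHTTPRequestHandler` subclass that defines only `do_POST` leaves GET and HEAD unimplemented, so the base class answers them with `501 Unsupported method` instead of exposing any file-serving path. The `APIHandler` name and echo payload below are hypothetical.

```python
# Minimal sketch (not the actual mlx_lm code): a handler that implements
# only do_POST. With no do_GET/do_HEAD defined, BaseHTTPRequestHandler
# responds to GET/HEAD with "501 Unsupported method".
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class APIHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body.
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")

        # Hypothetical echo response standing in for text generation.
        response = json.dumps({"received": body}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(response)))
        self.end_headers()
        self.wfile.write(response)


if __name__ == "__main__":
    HTTPServer(("localhost", 8080), APIHandler).serve_forever()
```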
@@ -4,6 +4,10 @@ You use `mlx-lm` to make an HTTP API for generating text with any supported
 model. The HTTP API is intended to be similar to the [OpenAI chat
 API](https://platform.openai.com/docs/api-reference).
 
+> [!NOTE]
+> The MLX LM server is not recommended for production as it only implements
+> basic security checks.
+
 Start the server with:
 
 ```shell
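The commit message says the same warning is also emitted at runtime. A hedged sketch of how such a startup warning could be issued with the standard library; the `run` function and its signature are illustrative, not the actual mlx_lm.server API:

```python
import warnings


def run(host: str = "localhost", port: int = 8080) -> None:
    # Warn at startup, mirroring the README note above.
    warnings.warn(
        "mlx_lm.server is not recommended for production as "
        "it only implements basic security checks."
    )
```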
@@ -61,5 +65,9 @@ curl localhost:8080/v1/chat/completions \
 
 - `top_p`: (Optional) A float specifying the nucleus sampling parameter.
   Defaults to `1.0`.
-- `repetition_penalty`: (Optional) Applies a penalty to repeated tokens. Defaults to `1.0`.
-- `repetition_context_size`: (Optional) The size of the context window for applying repetition penalty. Defaults to `20`.
+
+- `repetition_penalty`: (Optional) Applies a penalty to repeated tokens.
+  Defaults to `1.0`.
+
+- `repetition_context_size`: (Optional) The size of the context window for
+  applying repetition penalty. Defaults to `20`.
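A hedged example of calling the endpoint from Python with the parameters documented in this hunk. The `messages` shape follows the OpenAI chat API the README references; exact supported fields depend on the server version.

```python
import json
from urllib.request import Request, urlopen

payload = {
    "messages": [{"role": "user", "content": "Hello!"}],
    "top_p": 1.0,                   # nucleus sampling parameter
    "repetition_penalty": 1.0,      # penalty applied to repeated tokens
    "repetition_context_size": 20,  # window for the repetition penalty
}
req = Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urlopen(req) as resp:
    print(json.load(resp))
```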