chore(mlx-lm): refactor server.py to utilize generate_step from utils for consistency (#491)

* chore(mlx-lm): refactor server.py to utilize generate_step from utils for consistency

* chore(mlx-lm): update server doc

* chore: remove unused generate func
This commit is contained in:
Anchen
2024-02-28 01:25:24 +11:00
committed by GitHub
parent 19a21bfce4
commit 82f3f31d93
2 changed files with 35 additions and 55 deletions

View File

@@ -61,3 +61,5 @@ curl localhost:8080/v1/chat/completions \
- `top_p`: (Optional) A float specifying the nucleus sampling parameter.
Defaults to `1.0`.
- `repetition_penalty`: (Optional) Applies a penalty to repeated tokens. Defaults to `1.0`.
- `repetition_context_size`: (Optional) The size of the context window for applying repetition penalty. Defaults to `20`.