Logprobs info to completion API (#806)

* Initial implementation

* Fix handling of return_step_logits in return

* Fixed OpenAI parameter expectations and logprob structure and datatypes

* pre-commit black formatting

* Remove unused parameter

* fix log probs

* fix colorize

* nits in server

* nits in server

* Fix top_logprobs structure (a dict) and include tokens in logprobs response

* nits

* fix types

---------

Co-authored-by: Awni Hannun <awni@apple.com>
Author: Chime Ogbuji
Date: 2024-06-23 13:35:13 -04:00
Committed by: GitHub
Parent: a7598e9456
Commit: 1d701a1831
3 changed files with 94 additions and 43 deletions


@@ -17,7 +17,7 @@ mlx_lm.server --model <path_to_model_or_hf_repo>
 For example:
 ```shell
-mlx_lm.server --model mistralai/Mistral-7B-Instruct-v0.1
+mlx_lm.server --model mlx-community/Mistral-7B-Instruct-v0.3-4bit
 ```
 This will start a text generation server on port `8080` of the `localhost`
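
As a quick sanity check once the server is running, you can send it an OpenAI-style chat request against the endpoint shown in the next hunk (a minimal sketch; the prompt and `max_tokens` value here are illustrative):

```shell
# Minimal smoke-test request against the local server started above.
curl localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 32
  }'
```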
@@ -73,4 +73,8 @@ curl localhost:8080/v1/chat/completions \
   applying repetition penalty. Defaults to `20`.
 - `logit_bias`: (Optional) A dictionary mapping token IDs to their bias
-  values. Defaults to `None`.
+  values. Defaults to `None`.
+- `logprobs`: (Optional) An integer specifying the number of top tokens and
+  corresponding log probabilities to return for each output in the generated
+  sequence. If set, this can be any value between 1 and 10, inclusive.
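
As a sketch of how the new parameter is used, the request below asks for the top 3 log probabilities per generated token (the prompt is illustrative; `logprobs` must be between 1 and 10):

```shell
# Request the top 3 log probabilities for each generated token.
curl localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Hello!"}],
    "logprobs": 3
  }'
```

Per the commit notes above, each returned choice should then include a `logprobs` entry carrying the generated tokens alongside a `top_logprobs` dict mapping candidate tokens to their log probabilities; the exact field names follow the changes in this commit.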