Logprobs info to completion API (#806)

* Initial implementation

* Fix handling of return_step_logits in return

* Fixed OpenAI parameter expectations and logprob structure and datatypes

* pre-commit black formatting

* Remove unused parameter

* fix log probs

* fix colorize

* nits in server

* nits in server

* Fix top_logprobs structure (a dict) and include tokens in logprobs response

* nits

* fix types

---------

Co-authored-by: Awni Hannun <awni@apple.com>
Author: Chime Ogbuji
Date: 2024-06-23 13:35:13 -04:00
Committed by: GitHub
Parent: a7598e9456
Commit: 1d701a1831
3 changed files with 94 additions and 43 deletions


@@ -17,7 +17,7 @@ mlx_lm.server --model <path_to_model_or_hf_repo>
 For example:
 ```shell
-mlx_lm.server --model mistralai/Mistral-7B-Instruct-v0.1
+mlx_lm.server --model mlx-community/Mistral-7B-Instruct-v0.3-4bit
 ```
 This will start a text generation server on port `8080` of the `localhost`
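
As a quick sanity check once the server is running, you can send it an OpenAI-style chat request against the endpoint shown in the next hunk (a minimal sketch; the prompt and `max_tokens` value here are illustrative):

```shell
# Minimal smoke-test request against the local server started above.
curl localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 32
  }'
```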
@@ -73,4 +73,8 @@ curl localhost:8080/v1/chat/completions \
   applying repetition penalty. Defaults to `20`.
 - `logit_bias`: (Optional) A dictionary mapping token IDs to their bias
-  values. Defaults to `None`.
+  values. Defaults to `None`.
+- `logprobs`: (Optional) An integer specifying the number of top tokens and
+  corresponding log probabilities to return for each output in the generated
+  sequence. If set, this can be any value between 1 and 10, inclusive.
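
As a sketch of how the new parameter is used, the request below asks for the top 3 log probabilities per generated token (the prompt is illustrative; `logprobs` must be between 1 and 10):

```shell
# Request the top 3 log probabilities for each generated token.
curl localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Hello!"}],
    "logprobs": 3
  }'
```

Per the commit notes above, each returned choice should then include a `logprobs` entry carrying the generated tokens alongside a `top_logprobs` dict mapping candidate tokens to their log probabilities; the exact field names follow the changes in this commit.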