Fix object property value in mlx_lm.server chat completions response to match OpenAI spec

These were "chat.completions" and "chat.completions.chunk" but should be "chat.completion" and "chat.completion.chunk" for compatibility with clients expecting an OpenAI API.

In particular, this solves a problem in which aider 0.64.1 reports hitting a token limit on any completion request, no matter how small, despite apparently correct counts in the usage property.

Refer to:

https://platform.openai.com/docs/api-reference/chat/object

> object string
> The object type, which is always chat.completion.

https://platform.openai.com/docs/api-reference/chat/streaming

> object string
> The object type, which is always chat.completion.chunk.
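
As a rough illustration of why the exact string matters, here is a minimal sketch of a client that only accepts the two `object` values named in the OpenAI reference. The helper names `object_type` and `is_recognized` are hypothetical, and the strict check is an assumption made for illustration; it is not aider's actual logic, which this commit does not describe.

```python
# Sketch only: helper names are hypothetical. The string values are the
# ones quoted in this commit message and diff.
VALID_OBJECT_TYPES = {"chat.completion", "chat.completion.chunk"}


def object_type(stream: bool) -> str:
    """Mirror the corrected selection expression from server.py."""
    return "chat.completion.chunk" if stream else "chat.completion"


def is_recognized(response: dict) -> bool:
    """Assumed strict-client check: accept only spec-conformant object types."""
    return response.get("object") in VALID_OBJECT_TYPES


# Response shapes reduced to the one field under discussion.
old_response = {"object": "chat.completions"}        # value before this commit
new_response = {"object": object_type(stream=False)}  # value after this commit

print(is_recognized(old_response), is_recognized(new_response))
```

Under that assumption, a response carrying the old pluralized value is rejected even though the rest of the payload (including `usage`) is well formed, which is consistent with the symptom described above.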
2025-08-29 18:26:37 +08:00 · 2024-11-24 14:19:12 -08:00 · 2024-11-24 14:19:12 -08:00 · ec494a97ec
commit ec494a97ec
parent 0f135396ae
2 changed files with 2 additions and 4 deletions
--- a/llms/mlx_lm/SERVER.md
+++ b/llms/mlx_lm/SERVER.md
@@ -92,7 +92,7 @@ curl localhost:8080/v1/chat/completions \

 - `system_fingerprint`: A unique identifier for the system.

-- `object`: Any of "chat.completions", "chat.completions.chunk" (for
+- `object`: Any of "chat.completion", "chat.completion.chunk" (for
  streaming), or "text.completion".

 - `model`: The model repo or path (e.g. `"mlx-community/Llama-3.2-3B-Instruct-4bit"`).
--- a/llms/mlx_lm/server.py
+++ b/llms/mlx_lm/server.py
@@ -589,9 +589,7 @@ class APIHandler(BaseHTTPRequestHandler):

        # Determine response type
        self.request_id = f"chatcmpl-{uuid.uuid4()}"
-        self.object_type = (
-            "chat.completions.chunk" if self.stream else "chat.completions"
-        )
+        self.object_type = "chat.completion.chunk" if self.stream else "chat.completion"
        if (
            hasattr(self.tokenizer, "apply_chat_template")
            and self.tokenizer.chat_template