Fix bug in upload + docs nit (#981)

* fix bug in upload + docs nit
* nit
2025-08-09 02:16:37 +08:00 · 2024-09-07 14:46:57 -07:00 · 6c2369e4b9
commit 6c2369e4b9
parent c3e3411756
2 changed files with 8 additions and 24 deletions
--- a/llms/mlx_lm/LORA.md
+++ b/llms/mlx_lm/LORA.md
@@ -166,44 +166,28 @@ Currently, `*.jsonl` files support three data formats: `chat`,
 `chat`:

 ```jsonl
-{
-  "messages": [
-    {
-      "role": "system",
-      "content": "You are a helpful assistant."
-    },
-    {
-      "role": "user",
-      "content": "Hello."
-    },
-    {
-      "role": "assistant",
-      "content": "How can I assist you today?"
-    }
-  ]
-}
+{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Hello."}, {"role": "assistant", "content": "How can I assist you today?"}]}
 ```

 `completions`:

 ```jsonl
-{
-  "prompt": "What is the capital of France?",
-  "completion": "Paris."
-}
+{"prompt": "What is the capital of France?", "completion": "Paris."}
 ```

 `text`:

 ```jsonl
-{
-  "text": "This is an example for the model."
-}
+{"text": "This is an example for the model."}
 ```

 Note, the format is automatically determined by the dataset. Note also, keys in
 each line not expected by the loader will be ignored.

+> [!NOTE]
+> Each example in the datasets must be on a single line. Do not put more than
+> one example per line and do not split an example across multiple lines.
+
 ### Hugging Face Datasets

 To use Hugging Face datasets, first install the `datasets` package:
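
Review note on the LORA.md hunk: the single-line rule added in the `[!NOTE]` block can be checked mechanically before training. The following is a minimal sketch, not part of this commit or of `mlx_lm`; the helper name `check_jsonl_lines` is hypothetical, and skipping blank lines is an assumption of the sketch rather than documented loader behavior.

```python
import json


def check_jsonl_lines(lines):
    """Return (line_number, message) pairs for lines that are not one JSON object."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # assumption: blank lines are tolerated by this sketch
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            # Raised for a fragment of a multi-line example and for
            # two examples placed on the same line ("Extra data").
            problems.append((number, err.msg))
            continue
        if not isinstance(record, dict):
            problems.append((number, "not a JSON object"))
    return problems


# One example per line: accepted.
good = ['{"prompt": "What is the capital of France?", "completion": "Paris."}']
# One example split across three lines: every fragment is rejected.
split = ["{", '  "text": "This is an example for the model."', "}"]
# Two examples on one line: rejected.
doubled = ['{"text": "first"} {"text": "second"}']

good_problems = check_jsonl_lines(good)
split_problems = check_jsonl_lines(split)
doubled_problems = check_jsonl_lines(doubled)
```

The old pretty-printed examples in the removed lines would fail this check in the same way as `split`, which is presumably why the docs were collapsed to one line per example.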
--- a/llms/mlx_lm/utils.py
+++ b/llms/mlx_lm/utils.py
@ -581,7 +581,7 @@ def upload_to_hub(path: str, upload_repo: str, hf_path: str):
        prompt="hello"

        if hasattr(tokenizer, "apply_chat_template") and tokenizer.chat_template is not None:
-            messages = [{"role": "user", "content": prompt}]
+            messages = [{{"role": "user", "content": prompt}}]
            prompt = tokenizer.apply_chat_template(
                messages, tokenize=False, add_generation_prompt=True
            )
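
Review note on the utils.py hunk: the doubled braces only make sense if this snippet is emitted from inside an f-string template, which the hunk does not show, so treat that as an assumption. Under that assumption, a single `{` would be parsed as the start of a replacement field instead of literal text, and `{{`/`}}` is the standard escape. A minimal sketch with a hypothetical `render_snippet` helper and a hypothetical `load(...)` line standing in for the real template:

```python
def render_snippet(model_name: str) -> str:
    """Render a usage snippet; {model_name} is interpolated, {{ }} stay literal."""
    # Hypothetical stand-in for the template text, not the actual one in utils.py.
    return f'''model = load("{model_name}")
messages = [{{"role": "user", "content": prompt}}]'''


rendered = render_snippet("my-model")
print(rendered)
```

The rendered text contains single braces, so the generated snippet is valid Python for whoever copies it.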