fix bug in upload + docs nit

Awni Hannun
2024-09-07 08:23:49 -07:00
parent c3e3411756
commit 11a2daebf3
2 changed files with 8 additions and 24 deletions


@@ -166,44 +166,28 @@ Currently, `*.jsonl` files support three data formats: `chat`,
 `chat`:

 ```jsonl
-{
-  "messages": [
-    {
-      "role": "system",
-      "content": "You are a helpful assistant."
-    },
-    {
-      "role": "user",
-      "content": "Hello."
-    },
-    {
-      "role": "assistant",
-      "content": "How can I assistant you today."
-    }
-  ]
-}
+{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Hello."}, {"role": "assistant", "content": "How can I assistant you today."}]}
 ```

 `completions`:

 ```jsonl
-{
-  "prompt": "What is the capital of France?",
-  "completion": "Paris."
-}
+{"prompt": "What is the capital of France?", "completion": "Paris."}
 ```

 `text`:

 ```jsonl
-{
-  "text": "This is an example for the model."
-}
+{"text": "This is an example for the model."}
 ```

 Note, the format is automatically determined by the dataset. Note also, keys in
 each line not expected by the loader will be ignored.

+> [!NOTE]
+> Each example in the datasets must be on a single line. Do not put more than
+> one example per line and do not split an example across multiple lines.
+
 ### Hugging Face Datasets

 To use Hugging Face datasets, first install the `datasets` package:
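The docs hunk above collapses each example onto a single line and adds the note that an example must never span multiple lines. As a quick way to check a dataset before training, here is a small sketch, not part of this commit, that verifies each line of a `.jsonl` file parses as one JSON object and infers which of the three documented formats it uses; the helper name and the key sets are assumptions based on the docs above:

```python
import json

# Key sets that identify the three documented formats (assumed from the docs above).
FORMAT_KEYS = {
    "chat": {"messages"},
    "completions": {"prompt", "completion"},
    "text": {"text"},
}


def check_jsonl(path):
    """Hypothetical helper: require one JSON object per line and report the formats seen."""
    formats = set()
    with open(path) as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue
            try:
                example = json.loads(line)
            except json.JSONDecodeError as err:
                raise ValueError(
                    f"Line {lineno} is not a single JSON object; "
                    "an example must not be split across lines."
                ) from err
            if not isinstance(example, dict):
                raise ValueError(f"Line {lineno} must be a JSON object.")
            for name, keys in FORMAT_KEYS.items():
                if keys <= example.keys():
                    formats.add(name)
                    break
            else:
                raise ValueError(f"Line {lineno} has none of the expected keys.")
    return formats


if __name__ == "__main__":
    print(check_jsonl("train.jsonl"))  # e.g. {"chat"}
```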


@@ -581,7 +581,7 @@ def upload_to_hub(path: str, upload_repo: str, hf_path: str):
 prompt="hello"

 if hasattr(tokenizer, "apply_chat_template") and tokenizer.chat_template is not None:
-    messages = [{"role": "user", "content": prompt}]
+    messages = [{{"role": "user", "content": prompt}}]
     prompt = tokenizer.apply_chat_template(
         messages, tokenize=False, add_generation_prompt=True
     )
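The second hunk is the actual upload fix: the changed line sits inside the model card text that `upload_to_hub` evidently renders through a Python f-string/format template, so the braces of the dict literal must be doubled (`{{` / `}}`) to come through as literal braces; with single braces the dict is parsed as a replacement field and rendering the card fails. A minimal sketch of the failure and the fix, using an f-string and an illustrative repo name rather than the repo's actual card template:

```python
# Minimal reproduction of the bug fixed above; the card text here is illustrative.
prompt = "hello"
upload_repo = "mlx-community/some-model"  # assumed placeholder repo name

# Buggy: the single braces of the dict literal are parsed as a replacement
# field, so building the card raises instead of emitting the code sample.
try:
    bad = f'Use {upload_repo} like this: messages = [{"role": "user", "content": prompt}]'
except ValueError as err:
    print("card rendering failed:", err)

# Fixed (what the commit does): doubled braces survive as literal braces.
good = f'Use {upload_repo} like this: messages = [{{"role": "user", "content": prompt}}]'
print(good)
# Use mlx-community/some-model like this: messages = [{"role": "user", "content": prompt}]
```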