mlx-examples/t5/README.md
# T5
The T5 models are encoder-decoder models pre-trained on a mixture of
unsupervised and supervised tasks.[^1] These models work well on a variety of
tasks by prepending task-specific prefixes to the input, e.g.:
`translate English to German: …`, `summarize: …`, etc.
This example also supports the FLAN-T5 model variants.[^2]
## Setup
Download and convert the model:
```sh
python convert.py --model <model>
```
This will generate the file `<model>.npz`, which MLX can read.

The `<model>` can be any of the following:
| Model Name | Parameters |
| ---------- | ---------- |
| t5-small | 60 million |
| t5-base | 220 million |
| t5-large | 770 million |
| t5-3b | 3 billion |
| t5-11b | 11 billion |
The FLAN variants can be specified with `google/flan-t5-small`,
`google/flan-t5-base`, etc. See the [Hugging Face
page](https://huggingface.co/docs/transformers/model_doc/flan-t5) for a
complete list of models.
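For example, a FLAN variant can be converted with the same script (this assumes network access so the weights can be fetched from Hugging Face):

```sh
python convert.py --model google/flan-t5-small
```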
## Generate
Generate text with:
```sh
python t5.py --model t5-small --prompt "translate English to German: A tasty apple"
```
This should give the output: `Ein leckerer Apfel`.

To see a list of options, run:
```sh
python t5.py --help
```
[^1]: For more information on T5 see the [original paper](https://arxiv.org/abs/1910.10683)
or the [Hugging Face page](https://huggingface.co/docs/transformers/model_doc/t5).
[^2]: For more information on FLAN-T5 see the [original paper](https://arxiv.org/abs/2210.11416).