mlx-examples/t5/README.md
Juarez Bochi 10a7b99e83
Add T5 and Flan-T5 example (#113)
Co-authored-by: Awni Hannun <awni@apple.com>
2023-12-18 20:25:34 -08:00

# T5
The T5 models are encoder-decoder models pre-trained on a mixture of
unsupervised and supervised tasks.[^1] They handle a variety of tasks when a
task-specific prefix is prepended to the input, e.g.
`translate English to German: …`, `summarize: …`, etc.
This example also supports the FLAN-T5 model variants.[^2]
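
As a minimal sketch of the prefix convention described above, the snippet below joins a task prefix and an input text into a single prompt string. The helper `make_prompt` is hypothetical and is not part of this example's scripts; it only illustrates the `<prefix>: <text>` shape of the input.

```python
# Hypothetical helper: not part of t5.py, it only illustrates the prompt format.
def make_prompt(task_prefix: str, text: str) -> str:
    """Join a task prefix and the input text as '<prefix>: <text>'."""
    # Drop surrounding whitespace and a trailing colon so the separator
    # is added exactly once.
    prefix = task_prefix.strip().rstrip(":")
    return f"{prefix}: {text.strip()}"


translate = make_prompt("translate English to German", "A tasty apple")
summarize = make_prompt("summarize", "A long passage of text to shorten.")
print(translate)
print(summarize)
```

The resulting string is what would be passed as the prompt; the model itself sees no separate task argument.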
## Setup
Download and convert the model:
```sh
python convert.py --model <model>
```
This creates the `<model>.npz` file, which MLX can read.
`<model>` can be any of the following:
| Model Name | Model Size |
| ---------- | ---------- |
| t5-small | 60 million |
| t5-base | 220 million |
| t5-large | 770 million |
| t5-3b | 3 billion |
| t5-11b | 11 billion |
The FLAN variants can be specified with `google/flan-t5-small`,
`google/flan-t5-base`, etc. See the [Hugging Face
page](https://huggingface.co/docs/transformers/model_doc/flan-t5) for a
complete list of models.
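
The parameter counts in the table above give a rough idea of how much space the converted weights need. The sketch below multiplies each count by an assumed number of bytes per parameter (4 for 32-bit floats, 2 for 16-bit floats). The counts are the rounded figures from the table, file-format overhead is ignored, and which precision `convert.py` writes is not stated here, so treat the results as estimates only.

```python
# Rounded parameter counts, copied from the table above.
PARAM_COUNTS = {
    "t5-small": 60_000_000,
    "t5-base": 220_000_000,
    "t5-large": 770_000_000,
    "t5-3b": 3_000_000_000,
    "t5-11b": 11_000_000_000,
}

# Assumed storage cost per parameter for each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def estimate_weight_gb(model: str, dtype: str = "float32") -> float:
    """Approximate weight size in GB (10**9 bytes), ignoring file overhead."""
    return PARAM_COUNTS[model] * BYTES_PER_PARAM[dtype] / 1e9


for name in PARAM_COUNTS:
    fp32 = estimate_weight_gb(name, "float32")
    fp16 = estimate_weight_gb(name, "float16")
    print(f"{name}: ~{fp32:.2f} GB (float32), ~{fp16:.2f} GB (float16)")
```

An estimate like this can help decide whether a given model is likely to fit in memory before downloading and converting it.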
## Generate
Generate text with:
```sh
python t5.py --model t5-small --prompt "translate English to German: A tasty apple"
```
This should give the output: `Ein leckerer Apfel`
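
To run several prompts in a row, the command above can be wrapped in a small script. The following is a sketch only: `build_command` and `run_prompts` are hypothetical helpers, and the script assumes it is started from the directory that contains `t5.py`, after the model has been converted. It uses only the `--model` and `--prompt` options shown above.

```python
import shlex
import subprocess


def build_command(model: str, prompt: str) -> list[str]:
    """Build the argument list for one invocation of t5.py."""
    return ["python", "t5.py", "--model", model, "--prompt", prompt]


def run_prompts(model: str, prompts: list[str]) -> None:
    """Run t5.py once per prompt, echoing each command before it runs."""
    for prompt in prompts:
        cmd = build_command(model, prompt)
        print("$", shlex.join(cmd))
        # check=True raises CalledProcessError if t5.py exits with an error.
        subprocess.run(cmd, check=True)
```

For example, `run_prompts("t5-small", ["translate English to German: A tasty apple"])` issues the same command as the shell example. Each call starts a new process and reloads the model, so this is convenient for spot checks but slow for large batches.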
To see a list of options, run:
```sh
python t5.py --help
```
[^1]: For more information on T5 see the [original paper](https://arxiv.org/abs/1910.10683)
or the [Hugging Face page](https://huggingface.co/docs/transformers/model_doc/t5).
[^2]: For more information on FLAN-T5 see the [original paper](https://arxiv.org/abs/2210.11416).