mirror of https://github.com/ml-explore/mlx-examples.git synced 2025-06-24 17:31:18 +08:00

* Add skeleton

* Load all encoder weights

* Pass config to all modules, fix ln

* Load position bias embeddings

* Load decoder weights

* Move position biases to attention module

* translate pytorch to mx

* Fix default prompt

* Fix relative_attention_max_distance config

* No scaling, no encoder mask

* LM head

* Decode (broken after 1st token)

* Use position bias in all layers

* Utils to compare encoder output

* Fix layer norm

* Fix decoder mask

* Use position bias in decoder

* Concatenate tokens

* Remove prints

* Stop on eos

* Measure tokens/s

* with cache

* bug fix with bidirectional only for encoder, add offset to position bias

* format

* Fix T5.__call__

* Stream output

* Add argument to generate float16 npz

* Load config from HF to support any model

* Uncomment bidirectional param

* Add gitignore

* Add readme.md for t5

* Fix relative position scale

* Fix --encode-only

* Run hf_t5 with any model

* Add hf generation for comparison

* Fix type for attention mask

* Increase hf max_length

* Rescale output before projecting on vocab

* readme updates

* nits

* Pass ln2 to cross attention

* Fix example

* Fix attention for 3b model

* fp16, abstract tokenizer a bit, format

* clamp for low precision

* higher clipping, remove non-helpful casts

* default to fp32 for now

* Adds support for flan-t5

* Update t5 docs on variant support

* readme flan

* nit

---------

Co-authored-by: Awni Hannun <awni@apple.com>

2023-12-18 20:25:34 -08:00


T5

The T5 models are encoder-decoder models pre-trained on a mixture of unsupervised and supervised tasks.¹ They handle a variety of tasks when a task-specific prefix is prepended to the input, e.g. "translate English to German: …" or "summarize: …".
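As an illustration of the prefix convention, here is a minimal sketch of building a prompt string. The helper name `add_task_prefix` is hypothetical and is not part of this example's code; only the two prefixes shown come from the text above.

```python
def add_task_prefix(task: str, text: str) -> str:
    """Prepend a T5-style task prefix to the input text.

    `task` may be given with or without the trailing colon.
    (Hypothetical helper for illustration; not part of t5.py.)
    """
    prefix = task.strip().rstrip(":")
    return f"{prefix}: {text.strip()}"


if __name__ == "__main__":
    # The same input text can be routed to different tasks by changing the prefix.
    print(add_task_prefix("translate English to German", "A tasty apple"))
    print(add_task_prefix("summarize:", "A long article about apples."))
```

The resulting string is what would be passed as the prompt to the model.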

This example also supports the FLAN-T5 model variants.²

Setup

Download and convert the model:

python convert.py --model <model>

This creates the <model>.npz file, which MLX can read.
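To check what a converted file contains, one option is to list the arrays stored in it. The sketch below relies only on the fact that an .npz file is a zip archive of .npy members, so it needs nothing beyond the standard library. The function name `list_npz_arrays` and the weight names in the demo are made up for illustration and are not the names convert.py writes.

```python
import io
import zipfile


def list_npz_arrays(path_or_file):
    """Return the array names stored in an .npz archive.

    An .npz file is a zip archive with one `<name>.npy` member per array,
    so the names can be read without loading any weights.
    """
    with zipfile.ZipFile(path_or_file) as zf:
        return [name[:-4] for name in zf.namelist() if name.endswith(".npy")]


if __name__ == "__main__":
    # Stand-in archive with hypothetical member names and empty contents,
    # used only so the sketch runs without a converted model.
    # In practice, pass the path to the converted file instead.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("encoder.example_weight.npy", b"")
        zf.writestr("decoder.example_weight.npy", b"")
    buf.seek(0)
    print(list_npz_arrays(buf))
```

To read the actual values, `numpy.load` opens the same file and returns the arrays by name.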

The <model> can be any of the following:

Model Name	Model Size (parameters)
t5-small	60 million
t5-base	220 million
t5-large	770 million
t5-3b	3 billion
t5-11b	11 billion

The FLAN variants can be specified with google/flan-t5-small, google/flan-t5-base, etc. See the Hugging Face page for a complete list of models.

Generate

Generate text with:

python t5.py --model t5-small --prompt "translate English to German: A tasty apple"

This should give the output: Ein leckerer Apfel
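When calling the script from other Python code, the command can be assembled as an argument list so that the prompt does not need shell quoting. This is a sketch that assumes only the `--model` and `--prompt` flags shown above; the helper name `t5_command` is hypothetical, and the `subprocess.run` call is left commented out because it requires a converted model in the working directory.

```python
import shlex
import sys
from typing import List


def t5_command(model: str, prompt: str, script: str = "t5.py") -> List[str]:
    """Build the argument list for running the generation script.

    (Hypothetical helper; uses only the --model and --prompt flags.)
    """
    return [sys.executable, script, "--model", model, "--prompt", prompt]


if __name__ == "__main__":
    cmd = t5_command("t5-small", "translate English to German: A tasty apple")
    # Show the equivalent shell command with the prompt safely quoted.
    print("python " + shlex.join(cmd[1:]))
    # To actually run it from the directory containing t5.py:
    # import subprocess
    # subprocess.run(cmd, check=True)
```

Passing a list rather than a single string keeps the prompt as one argument even though it contains spaces and a colon.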

To see a list of options run:

python t5.py --help

¹ For more information on T5, see the original paper or the Hugging Face page.
² For more information on FLAN-T5, see the original paper.
