From a65d452ffdbc0095ed92058006c687ce9b2f57fd Mon Sep 17 00:00:00 2001
From: Awni Hannun
Date: Thu, 28 Dec 2023 15:01:00 -0800
Subject: [PATCH] update readme

---
 llms/speculative_decoding/README.md | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/llms/speculative_decoding/README.md b/llms/speculative_decoding/README.md
index 53b773f7..a0355406 100644
--- a/llms/speculative_decoding/README.md
+++ b/llms/speculative_decoding/README.md
@@ -1,7 +1,7 @@
 # Speculative Decoding
 
 This example implements speculative decoding with the T5 model for text
-generation.[^1] Speculative decoding uses a smaller draft model to propose
+generation.[^1][^2] Speculative decoding uses a smaller draft model to propose
 several tokens, and a larger model to decide which tokens to accept. The
 distribution of the generated text is identical to what the larger model would
 produce on its own, but with far fewer forward passes of the large model since
@@ -23,7 +23,7 @@ T5 11B model with:
 ```
 python convert.py --model t5-11b
 ```
 
-And for the draft model, convert the T5 small model with:
+We'll use T5 small for the draft model. Convert it with:
 ```
 python convert.py --model t5-small
@@ -59,5 +59,7 @@ speculative decoding is.
 You can use `--draft` to tune the number of draft tokens per model evaluation
 in order to reduce the number of discarded draft tokens.
 
-[^1] See the paper [Fast Inference from Transformers via Speculative
+[^1]: See the paper [Fast Inference from Transformers via Speculative
 Decoding](https://arxiv.org/abs/2211.17192)
+[^2]: For more information on T5, see the [original paper](https://arxiv.org/abs/1910.10683)
+  or the [Hugging Face page](https://huggingface.co/docs/transformers/model_doc/t5).
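
As background for the accept/reject scheme the README paragraph above describes, here is a minimal, self-contained sketch of one speculative decoding step in Python/NumPy. It is not this example's implementation: `draft_probs`, `target_probs`, and the toy vocabulary are hypothetical stand-ins for T5 small and T5 11B. The acceptance rule (keep a proposed token with probability min(1, p/q), otherwise resample from the normalized residual max(0, p - q)) follows the cited paper.

```
import numpy as np

VOCAB = 8  # toy vocabulary size


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


# Hypothetical stand-ins for the two models: each maps a context to a
# next-token distribution. In this example they would be T5 small (draft)
# and T5 11B (target); here they are fixed toy distributions.
def draft_probs(ctx):
    return softmax(np.linspace(0.0, 1.0, VOCAB))


def target_probs(ctx):
    return softmax(np.linspace(1.0, 0.0, VOCAB))


def speculative_step(ctx, n_draft, rng):
    """One round: propose n_draft tokens cheaply, then verify them so the
    output is distributed exactly as target-model sampling."""
    # 1. The draft model proposes n_draft tokens, recording its distributions.
    proposed, qs, c = [], [], list(ctx)
    for _ in range(n_draft):
        q = draft_probs(c)
        x = int(rng.choice(VOCAB, p=q))
        proposed.append(x)
        qs.append(q)
        c.append(x)

    # 2. The target model scores each position (one forward pass in practice).
    accepted, c = [], list(ctx)
    for x, q in zip(proposed, qs):
        p = target_probs(c)
        if rng.random() < min(1.0, p[x] / q[x]):
            accepted.append(x)  # proposed token kept
            c.append(x)
        else:
            # Rejected: resample from the normalized residual max(0, p - q),
            # which preserves the target distribution, and stop this round.
            r = np.maximum(p - q, 0.0)
            accepted.append(int(rng.choice(VOCAB, p=r / r.sum())))
            return accepted

    # 3. All drafts accepted: take one bonus token from the target model.
    accepted.append(int(rng.choice(VOCAB, p=target_probs(c))))
    return accepted


rng = np.random.default_rng(0)
print(speculative_step([1, 2, 3], n_draft=4, rng=rng))
```

Raising the number of proposals per round (the README's `--draft` knob) lets more tokens land per expensive target evaluation, at the cost of more discarded drafts when the two models disagree.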