Commit Graph

168 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Awni Hannun | 688795c665 | default to fp32 for now | 2023-12-18 17:15:49 -08:00 |
| Awni Hannun | 05a8464d78 | higher clipping, remove non-helpful casts | 2023-12-18 14:36:07 -08:00 |
| Awni Hannun | d2732a6478 | clamp for low precision | 2023-12-18 14:25:58 -08:00 |
| Awni Hannun | fd351850e4 | fp16, abstract tokenizer a bit, format | 2023-12-18 13:15:02 -08:00 |
| Juarez Bochi | 72581e5c1a | Fix attention for 3b model | 2023-12-18 15:50:29 -05:00 |
| Juarez Bochi | dbb4d6aea6 | Fix example | 2023-12-18 15:07:50 -05:00 |
| Juarez Bochi | 64e53e8415 | Pass ln2 to cross attention | 2023-12-18 15:05:18 -05:00 |
| Awni Hannun | e899271275 | nits | 2023-12-18 11:01:16 -08:00 |
| Awni Hannun | 29e642a482 | readme updates | 2023-12-18 10:58:43 -08:00 |
| Juarez Bochi | 36fd88509e | Rescale output before projecting on vocab | 2023-12-18 13:43:03 -05:00 |
| Juarez Bochi | 511f572b6c | Increase hf max_length | 2023-12-18 13:35:44 -05:00 |
| Juarez Bochi | 66e1c0f050 | Fix type for attention mask | 2023-12-18 11:39:17 -05:00 |
| Juarez Bochi | 5ae339f6d2 | Add hf generation for comparison | 2023-12-18 11:35:16 -05:00 |
| Juarez Bochi | 305a52dde8 | Run hf_t5 with any model | 2023-12-18 11:25:14 -05:00 |
| Juarez Bochi | 0779417903 | Fix --encode-only | 2023-12-18 11:19:44 -05:00 |
| Juarez Bochi | 83b68a5bdb | Fix relative position scale | 2023-12-18 11:13:44 -05:00 |
| Juarez Bochi | 9d3ee016c9 | Add readme.md for t5 | 2023-12-18 08:50:36 -05:00 |
| Juarez Bochi | 4bc8f49043 | Add gitignore | 2023-12-18 08:42:45 -05:00 |
| Juarez Bochi | 54b82198d0 | Uncomment bidirectional param | 2023-12-18 08:42:27 -05:00 |
| Juarez Bochi | 55f204dd3a | Load config from HF to support any model | 2023-12-18 08:42:06 -05:00 |
| Juarez Bochi | b2a3782a96 | Add argument to generate float16 npz | 2023-12-18 08:21:20 -05:00 |
| Juarez Bochi | 09e851499a | Stream output | 2023-12-18 08:09:56 -05:00 |
| Juarez Bochi | 689eda9937 | Fix T5.__call__ | 2023-12-18 08:00:01 -05:00 |
| Awni Hannun | 34843ddeb2 | format | 2023-12-17 21:30:28 -08:00 |
| Awni Hannun | c468edc4e3 | bug fix with bidirectional only for encoder, add offset to position bias | 2023-12-17 21:22:00 -08:00 |
| Awni Hannun | 688a6e1e78 | with cache | 2023-12-17 17:35:53 -08:00 |
| Juarez Bochi | 29bfb93455 | Measure tokens/s | 2023-12-17 10:53:49 -05:00 |
| Juarez Bochi | 90d3a15ba2 | Stop on eos | 2023-12-17 08:59:03 -05:00 |
| Juarez Bochi | 61fda57eba | Remove prints | 2023-12-17 08:52:54 -05:00 |
| Juarez Bochi | 152e85fade | Concatenate tokens | 2023-12-17 08:51:16 -05:00 |
| Juarez Bochi | daea1dcddf | Use position bias in decoder | 2023-12-17 08:40:10 -05:00 |
| Juarez Bochi | 7dcf2b688d | Fix decoder mask | 2023-12-17 08:34:21 -05:00 |
| Juarez Bochi | f26e81ccc9 | Fix layer norm | 2023-12-17 07:47:52 -05:00 |
| Juarez Bochi | 4ec2b6eec3 | Utils to compare encoder output | 2023-12-17 07:20:24 -05:00 |
| Juarez Bochi | 7e42349f4c | Use position bias in all layers | 2023-12-17 07:19:32 -05:00 |
| Juarez Bochi | 203f550ef9 | Decode (broken after 1st token) | 2023-12-16 14:53:50 -05:00 |
| Juarez Bochi | 31da1b0dab | LM head | 2023-12-16 14:44:15 -05:00 |
| Juarez Bochi | d12db65eeb | No scaling, no encoder mask | 2023-12-16 14:24:13 -05:00 |
| Juarez Bochi | 64e7eaccb8 | Fix relative_attention_max_distance config | 2023-12-16 11:18:17 -05:00 |
| Juarez Bochi | 2a8ee32b02 | Fix default prompt | 2023-12-16 08:17:08 -05:00 |
| Juarez Bochi | 392b7a2f98 | translate pytorch to mx | 2023-12-15 16:51:01 -05:00 |
| Juarez Bochi | 330f024d1c | Move position biases to attention module | 2023-12-15 11:30:17 -05:00 |
| Juarez Bochi | d0497ddc0b | Load decoder weights | 2023-12-15 10:50:04 -05:00 |
| Juarez Bochi | 009ed0179c | Load position bias embeddings | 2023-12-15 10:16:11 -05:00 |
| Juarez Bochi | 62924d8135 | Pass config to all modules, fix ln | 2023-12-14 15:51:03 -05:00 |
| Juarez Bochi | c0001a94f2 | Load all encoder weights | 2023-12-14 15:38:41 -05:00 |
| Juarez Bochi | bca5ca4f98 | Add skeleton | 2023-12-14 15:21:36 -05:00 |
| Awni Hannun | 0e88a6afa1 | Merge pull request #103 from arpitingle/patch-1: added phi in readme | 2023-12-14 10:19:40 -08:00 |
| arpit | 5b08da2395 | Update README.md | 2023-12-14 23:40:50 +05:30 |
| Awni Hannun | 92efa32060 | Merge pull request #97 from jbarrow/main: Phi-2 | 2023-12-14 09:21:26 -08:00 |