mlx-examples

mirror of https://github.com/ml-explore/mlx-examples.git synced 2025-06-24 01:17:28 +08:00

Author	SHA1	Message	Date
Alex Barron	d72fdeb4ee	MusicGen (#1020) * Add MusicGen model * add benchmarks * change to from_pretrained * symlinks * add readme and requirements * fix readme * readme	2024-10-11 10:16:20 -07:00
Awni Hannun	b8a348c1b8	Switch to fast RMS/LN Norm (#603) * use nn.RMSNorm, use sdpa, cleanup * bump mlx versions * minor update * use fast layer norm * version bump * update requirement for whisper * update requirement for gguf	2024-03-23 07:13:51 -07:00
Abdul Fatir	e05e502c34	Fix scaling when embeddings are tied (#591)	2024-03-18 13:41:07 -07:00
Benjamin Anderson	09566c7257	add speculative decoding example for llama (#149) * speculative decoding * add sample 0 * spec decode gives same results as regular decode * rebase * use accept reject criteria * switch to t5 * update readme * readme nit * nits * nits * nits --------- Co-authored-by: Benjamin Anderson <benjamin@Benjamins-MBP.lan> Co-authored-by: Awni Hannun <awni@apple.com>	2023-12-28 15:20:43 -08:00
Todsaporn Banjerdkit	7ae445f6c7	feat: add mistral tps (#173) * feat: add mistral tps * eval params before timing + format --------- Co-authored-by: Awni Hannun <awni@apple.com>	2023-12-22 07:55:57 -08:00
Juarez Bochi	4c9db80ed2	Add support for byt5 models (#161) * Add support for byt5 models * Remove unused import	2023-12-21 08:46:36 -08:00
Awni Hannun	27c0a8c002	Add llms subdir + update README (#145) * add llms subdir + update README * nits * use same pre-commit as mlx * update readmes a bit * format	2023-12-20 10:22:25 -08:00
Juarez Bochi	ebbb7083cc	T5: Change default dtype to bfloat16 (#147) * T5: Change default to bfloat16 * Add myself to contributors * t5: Change convert.py default to float32	2023-12-19 13:44:36 -08:00
Juarez Bochi	10a7b99e83	Add T5 and Flan-T5 example (#113) * Add skeleton * Load all encoder weights * Pass config to all modules, fix ln * Load position bias embeddings * Load decoder weights * Move position biases to attention module * translate pytorch to mx * Fix default prompt * Fix relative_attention_max_distance config * No scaling, no encoder mask * LM head * Decode (broken after 1st token) * Use position bias in all layers * Utils to compare encoder output * Fix layer norm * Fix decoder mask * Use position bias in decoder * Concatenate tokens * Remove prints * Stop on eos * Measure tokens/s * with cache * bug fix with bidirectional only for encoder, add offset to position bias * format * Fix T5.__call__ * Stream output * Add argument to generate float16 npz * Load config from HF to support any model * Uncomment bidirectional param * Add gitignore * Add readme.md for t5 * Fix relative position scale * Fix --encode-only * Run hf_t5 with any model * Add hf generation for comparison * Fix type for attention mask * Increase hf max_length * Rescale output before projecting on vocab * readme updates * nits * Pass ln2 to cross attention * Fix example * Fix attention for 3b model * fp16, abstract tokenizer a bit, format * clamp for low precision * higher clipping, remove non-helpful casts * default to fp32 for now * Adds support for flan-t5 * Update t5 docs on variant support * readme flan * nit --------- Co-authored-by: Awni Hannun <awni@apple.com>	2023-12-18 20:25:34 -08:00

9 Commits