* deepseekv3
* use upload_large_file instead of deprecated multi comit
* add pipeline generation and example
* comment
* get fp16 working
* use mlx==0.22
* use fast rope
* fix llama
* use fast rope for llama3.1
* requires unreleased mlx
* fix su
* fix deepseek v2
* only one of base or freqs
* nit
* fix
* hard code freqs
* Generate response with optional arguments
* Reference response generation example
* Include transformers and sentencepiece
* Update example to run Mistral-7B-Instruct-v0.3
* Link to generation example
* Style changes from pre-commit
* use nn.RMSNorm, use sdpa, cleanup
* bump mlx versions
* minor update
* use fast layer norm
* version bump
* update requirement for whisper
* update requirement for gguf
* update a few examples to use compile
* update mnist
* add compile to vae and rename some stuff for simplicity
* update reqs
* use state in eval
* GCN example with RNG + dropout
* add a bit of prefetching
* refactor(qwen): moving qwen into mlx-lm
* chore: update doc
* chore: fix type hint
* add qwen model support in convert
* chore: fix doc
* chore: only load model in quantize_model
* chore: make the convert script only copy tokenizer files instead of load it and save
* chore: update docstring
* chore: remove unnecessary try catch
* chore: clean up for tokenizer and update transformers 4.37
* nits in README
---------
Co-authored-by: Awni Hannun <awni@apple.com>