* support for Phi-3 4-bit quantized GGUF weights
* Added link to 4-bit quantized model
* removed some prints
* Added correct comment
* Added correct comment
* removed print, since the last condition already prints a warning when quantization is None
* use nn.RMSNorm, use sdpa, cleanup
* bump mlx versions
* minor update
* use fast layer norm
* version bump
* update requirement for whisper
* update requirement for gguf
* Draft of TinyLlama from GGUF
* Transpose all
* No transposition with new layout
* Read config from gguf
* Create tokenizer from gguf
* move gguf and update to be similar to hf_llm
* change model to HF style + updates to README
* nits in README
* nit in README
* only use mlx for metadata
* fix eos/bos tokenizer
* fix tokenization
* quantization runs
* 8-bit works
* tokenizer fix
* bump mlx version
---------
Co-authored-by: Juarez Bochi <juarez.bochi@grammarly.com>
Co-authored-by: Awni Hannun <awni@apple.com>