* use nn.RMSNorm, use sdpa, cleanup
* bump mlx versions
* minor update
* use fast layer norm
* version bump
* update requirement for whisper
* update requirement for gguf
* lazy model import in mlx_lm
* change lora loading
* fix olmo lora
* remove a bunch of unused stuff from plamo
* move phixtral to mlx-lm and out of llms/
* Convert HF weights of PLaMo and load it to a plamo model in mlx
* Fix model inference part
* Add bos at the beginning of the prompt
* Fix convert.py to copy tokenizer.model into the converted dir
* Use the required insturction format in generate.py when "--instruct" option is specified
* Change filenames and update existing scripts
* Add README
* Add requirements.txt
* Fix plamo.py to stop generation when EOS appears
* Add quantization to convert.py
* Use mlx>=0.0.9 for mx.core.outer() in PLaMo model
* Update acknowledgements.md
* Fix card text in upload_to_hub()
* Not use prompt template when --instruct is not specified
* Ask if you trust_remote_code for loading tokenizer of PLaMo
* Check the user trusts the remote code when converting
* Remove plamo directory
* Update README
* Add PLaMo model file
* Fix the handling of cache in PLaMo and update README
* Ask if trust_remote_code only when the model is PLaMo
* Remove resolve_trust_remote_code from convert.py and use the latest transformers
* Remove code not to add EOS
* Update README to fix an example not to use noncommercial version of the model
* Remove unused imports
* Remove unnecessary description about the instruct model of PLaMo from README
* format, nits in README
* typo
---------
Co-authored-by: Shunta Saito <shunta@mitmul-mbp.local>
Co-authored-by: Awni Hannun <awni@apple.com>