* Implement normalizing flow Real NVP example
* Add requirements and basic usage to normalizing flow example
* Minor changes to README in normalizing flow example
* Remove trailing commas in function arguments for unified formatting in flows example
* Fix minor typos, add some annotations
* format + nits in README
* readme fix
* mov, minor changes in main, copyright
* remove debug
* fix
* Simplified class structure in distributions; better code re-use in bijectors
* Remove rogue space
* change name again
* nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
* Add missing keyword to the decoding options
* Reverting last commit
* Fixing transcribe keyword in benchmark.py
* Add argument name to load_model
This is intended to avoid confusion
* refactor: moving phi2 example into hf_llm
* chore: clean up
* chore: update phi2 model args so it can load args from config
* fix phi2 + nits + readme
* allow any HF repo, update README
* fix bug in llama
---------
Co-authored-by: Awni Hannun <awni@apple.com>
* Add --local flag for reading models from filesystem and related code for doing so
* Disable uploading to huggingface if --local flag is set
* Remove code related to .bin files and merge fetch_from_local and fetch_from_hub into one function.
* Update llms/hf_llm/convert.py
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* format / nits
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
Co-authored-by: Awni Hannun <awni@apple.com>
* refactor: make the phi2 example load the model directly from hf without needing conversion
* chore: add super().__init__() to all modules; otherwise it causes an error in lora
* Add word timestamps and confidence scores
* Create a separate forward_with_cross_qk function
* Move multiple ops from np to mlx, clean comments
* Save alignment_heads
* Cast qk to fp32
* Add test for word-level timestamps and confidence scores
* format + readme
* nit
---------
Co-authored-by: Awni Hannun <awni@apple.com>
* refactor: merge deepseek coder example into hf_llm example
* remove deepseek example
* chore: fix format in readme
* chore: remove default rope_scaling dict and use get to access type and factor to avoid key error
* Update llms/hf_llm/models.py
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* chore: fix lint
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Add option to load customized mlx model
* Add quantization
* Apply reviews
* Separate model conversion and loading
* Update test
* Fix benchmark
* Add notes about conversion
* Improve doc
* feat: add example for deepseek coder
* chore: remove hardcoded rope_scaling_factor
* feat: add quantization support
* chore: update readme
* chore: clean up the rope scaling factor param in create cos sin theta
* feat: add repetition_penalty
* style/consistency changes to ease future integration
* nits in README
* one more typo
---------
Co-authored-by: Awni Hannun <awni@apple.com>