Awni Hannun
2146bcd7ee
Quantize embedding / Update quantize API (#680)
* more async eval
* quantize embedding / update quantize api
* more updates for quantize
* update for quantize embeddings
* update sd quant API
* update sdxl quants
* error for datasets < batch_size
* async
* fix config loading
* fix quant
* fix tests
* fix req
* remove lm head if tie weights is true
* fix test
2024-04-18 18:16:10 -07:00
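The commit above extends the quantize API to cover embedding tables. As a rough illustration of what quantizing an embedding means, here is a framework-free numpy sketch of group-wise affine quantization (each group of values gets its own scale and offset); the function names and the group size are illustrative, not MLX's actual API:

```python
import numpy as np

def quantize_groupwise(w, group_size=64, bits=4):
    """Group-wise affine quantization: every `group_size` values along
    the last axis share one scale and one offset (minimum)."""
    levels = 2**bits - 1
    g = w.reshape(-1, group_size)
    lo = g.min(axis=1, keepdims=True)
    hi = g.max(axis=1, keepdims=True)
    scale = (hi - lo) / levels
    scale = np.where(scale == 0, 1.0, scale)  # guard constant groups
    q = np.clip(np.round((g - lo) / scale), 0, levels).astype(np.uint8)
    return q, scale, lo, w.shape

def dequantize_groupwise(q, scale, lo, shape):
    """Reconstruct an approximation of the original weights."""
    return (q * scale + lo).reshape(shape)

# A toy "embedding table": 8 tokens x 64 dims.
emb = np.random.default_rng(0).normal(size=(8, 64)).astype(np.float32)
q, scale, lo, shape = quantize_groupwise(emb, group_size=64, bits=4)
deq = dequantize_groupwise(q, scale, lo, shape)
print(float(np.abs(emb - deq).max()))  # small reconstruction error
```

Storing `q` (4-bit codes packed as uint8 here) plus one scale/offset pair per group is what shrinks the embedding; lookup then dequantizes rows on the fly.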
dmdaksh
7d7e236061
- Removed unused Python imports (#683)
- bert/model.py:10: tree_unflatten
- bert/model.py:2: dataclass
- bert/model.py:8: numpy
- cifar/resnet.py:6: Any
- clip/model.py:15: tree_flatten
- clip/model.py:9: Union
- gcn/main.py:8: download_cora
- gcn/main.py:9: cross_entropy
- llms/gguf_llm/models.py:12: tree_flatten, tree_unflatten
- llms/gguf_llm/models.py:9: numpy
- llms/mixtral/mixtral.py:12: tree_map
- llms/mlx_lm/models/dbrx.py:2: Dict, Union
- llms/mlx_lm/tuner/trainer.py:5: partial
- llms/speculative_decoding/decoder.py:1: dataclass, field
- llms/speculative_decoding/decoder.py:2: Optional
- llms/speculative_decoding/decoder.py:5: mlx.nn
- llms/speculative_decoding/decoder.py:6: numpy
- llms/speculative_decoding/main.py:2: glob
- llms/speculative_decoding/main.py:3: json
- llms/speculative_decoding/main.py:5: Path
- llms/speculative_decoding/main.py:8: mlx.nn
- llms/speculative_decoding/model.py:6: tree_unflatten
- llms/speculative_decoding/model.py:7: AutoTokenizer
- llms/tests/test_lora.py:13: yaml_loader
- lora/lora.py:14: tree_unflatten
- lora/models.py:11: numpy
- lora/models.py:3: glob
- speechcommands/kwt.py:1: Any
- speechcommands/main.py:7: mlx.data
- stable_diffusion/stable_diffusion/model_io.py:4: partial
- whisper/benchmark.py:5: sys
- whisper/test.py:5: subprocess
- whisper/whisper/audio.py:6: Optional
- whisper/whisper/decoding.py:8: mlx.nn
2024-04-16 07:50:32 -07:00
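The per-file list above is the kind of output a linter's unused-import check (e.g. flake8's F401) produces. A minimal sketch of that check using Python's `ast` module; `unused_imports` is a hypothetical helper, and it ignores `__all__`, star imports, and string annotations:

```python
import ast

def unused_imports(source):
    """Return imported names that are never referenced as a Name node.
    Dotted imports are tracked by their top-level binding."""
    tree = ast.parse(source)
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for a in node.names:
                imported.add((a.asname or a.name).split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for a in node.names:
                imported.add(a.asname or a.name)
    used = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
    return sorted(imported - used)

src = """
import os
import sys
from pathlib import Path
print(os.getcwd())
"""
print(unused_imports(src))  # -> ['Path', 'sys']
```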
Awni Hannun
b8a348c1b8
Switch to fast RMS/LN Norm (#603)
* use nn.RMSNorm, use sdpa, cleanup
* bump mlx versions
* minor update
* use fast layer norm
* version bump
* update requirement for whisper
* update requirement for gguf
2024-03-23 07:13:51 -07:00
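This commit swaps hand-written normalization for MLX's fused fast kernels. For reference, the math those kernels fuse is just RMSNorm; a numpy sketch (not MLX code) with a hypothetical `rms_norm` helper:

```python
import numpy as np

def rms_norm(x, weight, eps=1e-5):
    """Reference RMSNorm: scale each row by the reciprocal of its
    root-mean-square, then apply a learned per-feature weight."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight

x = np.random.default_rng(1).normal(size=(2, 8)).astype(np.float32)
w = np.ones(8, dtype=np.float32)
y = rms_norm(x, w)
print(np.mean(y * y, axis=-1))  # each row's mean square is ~1
```

A fused kernel computes the same thing in one pass instead of materializing the intermediate mean and division, which is where the speedup comes from.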
Angelos Katharopoulos
f71e965d57
Change gqa to use repeat instead of concatenate (#443)
2024-02-14 17:40:11 -08:00
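In grouped-query attention, the few key/value heads must be expanded to match the many query heads. The commit replaces a per-head concatenate with a single repeat; a numpy sketch of the equivalence (shapes and the `repeat_kv` helper are illustrative, and the original concatenation in mlx-examples may have differed in detail):

```python
import numpy as np

def repeat_kv(kv, n_rep):
    """Expand (B, n_kv_heads, L, D) to (B, n_kv_heads * n_rep, L, D),
    duplicating each KV head n_rep times, one op instead of a concat."""
    return np.repeat(kv, n_rep, axis=1)

kv = np.arange(1 * 2 * 3 * 4).reshape(1, 2, 3, 4)  # B=1, 2 KV heads
out = repeat_kv(kv, 3)                              # -> 6 query heads

# Concatenate-based equivalent (roughly the pre-change style):
cat = np.concatenate(
    [kv[:, i:i + 1] for i in range(kv.shape[1]) for _ in range(3)],
    axis=1,
)
print(out.shape, np.array_equal(out, cat))  # (1, 6, 3, 4) True
```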
Juarez Bochi
f5b80c95fb
Example reading directly from gguf file (#222)
* Draft of tiny llama from gguf
* Transpose all
* No transposition with new layout
* Read config from gguf
* Create tokenizer from gguf
* move gguf and update to be similar to hf_llm
* change model to HF style + updates to README
* nits in README
* nit readme
* only use mlx for metadata
* fix eos/bos tokenizer
* fix tokenization
* quantization runs
* 8-bit works
* tokenizer fix
* bump mlx version
---------
Co-authored-by: Juarez Bochi <juarez.bochi@grammarly.com>
Co-authored-by: Awni Hannun <awni@apple.com>
2024-01-23 15:41:54 -08:00
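The example above reads weights, config, and tokenizer straight out of a GGUF file. To make the format concrete, here is a stdlib-only sketch of parsing the fixed GGUF header; my reading of the spec is that for GGUF version >= 2 it is a 4-byte magic, a little-endian uint32 version, then uint64 tensor and metadata-KV counts (the mlx-examples code relies on MLX's GGUF loader rather than parsing by hand, so `read_gguf_header` is a hypothetical helper):

```python
import struct

def read_gguf_header(data):
    """Parse the fixed-size GGUF header from the start of a file:
    magic b'GGUF', uint32 version, uint64 tensor count, uint64
    metadata key-value count, all little-endian (GGUF v2+ layout)."""
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", data, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return version, n_tensors, n_kv

# A synthetic header: version 3, 291 tensors, 24 metadata keys.
blob = struct.pack("<4sIQQ", b"GGUF", 3, 291, 24)
print(read_gguf_header(blob))  # -> (3, 291, 24)
```

The metadata key-value section that follows the header is where the example pulls the model config and tokenizer from, so no separate `config.json` or tokenizer files are needed.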