3 and 6 bit quantization (#1613)

* Support 3 and 6 bit quantization
Alex Barron
2024-11-22 10:22:13 -08:00
committed by GitHub
parent 0c5eea226b
commit c79f6a4a8c
12 changed files with 633 additions and 419 deletions

@@ -12,5 +12,4 @@ Fast
 layer_norm
 rope
 scaled_dot_product_attention
-affine_quantize
 metal_kernel