Awni Hannun
4758c8baa1
Start to cleanup/unify accelerate and common back-ends (Part 1/N) (#1777)
* start to cleanup/unify accelerate and common back-ends
* more progress
* simplify
* add half type and allow infs in simd exp
* unify softmax + quantized, more dispatches to simd quantized mm
* add sin/cos, use simd in vector-scalar ops
* faster CPU vectorize quant
* faster erf/erfinv
2025-01-29 14:34:49 -08:00
Alex Barron
c79f6a4a8c
3 and 6 bit quantization (#1613)
* Support 3 and 6 bit quantization
2024-11-22 10:22:13 -08:00
Awni Hannun
dfa0b9aab4
Cpu fast quantize (#1578)
* cpu quantize
* fix
2024-11-08 20:10:39 -08:00
Alex Barron
d15fa13daf
Batched Quantized Matmul + Fast Small QMV (#1503)
* add fast qmv for small dims
* fix test
* batched cpu
* add batched template param
* refactor metal quantized.cpp
2024-10-21 16:23:17 -07:00
Awni Hannun
d568c7ee36
Rename block sparse (#1149)
* block_sparse_mm to gather_mm
* rename
* nit
* nit
2024-05-22 07:48:34 -07:00
Angelos Katharopoulos
e78a6518fa
Block sparse qmm (#1124)
2024-05-16 15:24:14 -07:00
Angelos Katharopoulos
0de5988f92
Custom VJP and checkpointing (#541)
* Implement custom_vjp and checkpointing
* Add a dependency management primitive
* Change the eval order to deep branches first
* Add graph depth tracking to the array
2024-01-30 16:04:45 -08:00
Awni Hannun
7a34e46677
Quantize with groups of 32 (#511)
* allow quantize with group sizes of 32
* missing cpu dispatch
* remove print
* Fix qvm for group_size 32
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2024-01-21 06:19:05 -08:00
Angelos Katharopoulos
e7f5059fe4
Support for quantized matmul with w and w^T (#349)
* Add the metal qvm implementation
* Add qmm_n
* Add gradient wrt to input for quantized_matmul
2024-01-03 14:22:36 -08:00
Angelos Katharopoulos
b3916cbf2b
Improve names of quantization arguments (#235)
* Change the default quantization group_size to 64
* Rename groups to group_size and width to bits
2023-12-20 16:53:53 -08:00
Angelos Katharopoulos
dfa9f4bc58
An initial quantized matmul implementation (#205)
* Add quantized matvec
* Add quantized matrix matrix with 2nd matrix transposed
* Add quantized matmul tests
* Add a slow cpu quantized matmul
* Add a slightly faster vectorized cpu version
2023-12-18 23:18:57 -08:00