Add mode parameter for quantization (#2499)

* add mode parameter for quantization

* mxfp4 quantize/dequantize + start of optional biases

* mxfp4 works

* speedup

* cpu mxfp4

* fix

* fix test tol

* fix

* refactor

* add quant mode enum
Awni Hannun
2025-08-28 06:45:26 -07:00
committed by GitHub
parent 7ef8a6f2d5
commit 70560b6bd5
28 changed files with 3635 additions and 757 deletions

@@ -2996,7 +2996,10 @@ TEST_CASE("test quantize dequantize") {
   for (int i = 2; i <= 8; i *= 2) {
     int el_per_int = 32 / i;
-    auto [x_q, scales, biases] = quantize(x, 128, i);
+    auto res = quantize(x, 128, i);
+    auto x_q = res[0];
+    auto scales = res[1];
+    auto biases = res[2];
     CHECK_EQ(x_q.shape(), Shape{128, 512 / el_per_int});
     CHECK_EQ(scales.shape(), Shape{128, 4});
     CHECK_EQ(biases.shape(), Shape{128, 4});