Start to cleanup/unify accelerate and common back-ends (Part 1/N) (#1777)

* start to cleanup/unify accelerate and common back-ends
* more progress
* simplify
* add half type and allow infs in simd exp
* unify softmax + quantized, more dispatches to simd quantized mm
* add sin/cos, use simd in vector-scalar ops
* faster CPU vectorize quant
* faster erf/erfinv
2025-12-14 09:07:12 +08:00 · 2025-01-29 14:34:49 -08:00
parent 7064fed1b1
commit 4758c8baa1
47 changed files with 1920 additions and 2640 deletions
--- a/python/tests/test_quantized.py
+++ b/python/tests/test_quantized.py
@@ -207,8 +207,8 @@ class TestQuantized(mlx_tests.MLXTestCase):
            with self.subTest(shape=(B, M, N), group_size=group_size, bits=bits):
                x_shape = (1, N) if B == 0 else (B, 1, N)
                w_shape = (N, M) if B == 0 else (B, N, M)
-                x = mx.random.normal(shape=x_shape, key=k1)
-                w = mx.random.normal(shape=w_shape, key=k2)
+                x = 1e-1 * mx.random.normal(shape=x_shape, key=k1)
+                w = 1e-1 * mx.random.normal(shape=w_shape, key=k2)
                w_q, scales, biases = mx.quantize(w, group_size, bits)
                w_hat = mx.dequantize(w_q, scales, biases, group_size, bits)
                y_q = mx.quantized_matmul(