Arkar Min Aung
f4789ab8b9
feat: Add SVD primitive GPU evaluation support
...
- Implement SVD::eval_gpu in Metal primitives backend
- Add proper float32/float64 type dispatch
- Include clear error messages for unsupported double precision
- Connect SVD primitive to Metal backend implementation
- Enable GPU path for SVD operations in MLX
2025-06-14 21:23:04 +10:00
Arkar Min Aung
54125e5ff5
feat: Implement Metal SVD backend with CPU fallback
...
- Add comprehensive SVD implementation in mlx/backend/metal/svd.cpp
- Include input validation for dimensions, data types, and edge cases
- Implement CPU fallback for immediate functionality
- Add proper error handling for unsupported float64 operations
- Support both singular values only and full SVD decomposition
- Prepare infrastructure for future Metal kernel integration
2025-06-14 21:22:49 +10:00
Arkar Min Aung
b7838461c1
feat: Add Metal SVD kernel infrastructure
...
- Add svd.h header with kernel declarations
- Add svd.metal with placeholder Metal compute shaders
- Define SVD algorithm parameters and data structures
- Prepare foundation for Metal GPU-accelerated SVD implementation
2025-06-14 21:22:34 +10:00
Arkar Min Aung
6d01528e90
feat: Add benchmarking and documentation updates for Metal SVD
...
- Add comprehensive SVD benchmark script (benchmarks/python/svd_benchmark.py):
  * Performance comparison between CPU and GPU implementations
  * Batch processing benchmarks
  * Correctness verification tests
  * Detailed timing and speedup analysis
- Update linalg documentation to mention Metal GPU acceleration
- Add implementation summary document for development reference

This addresses CONTRIBUTING.md requirements:
- Benchmarks for efficiency impact measurement (point 3)
- Documentation updates for API changes (point 4)
- Comprehensive testing coverage (point 2)
2025-06-14 17:28:19 +10:00
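The timing and speedup analysis mentioned in the commit above can be illustrated with a small harness. This is only a sketch using the Python standard library: `median_time` and `speedup` are hypothetical helper names, not functions from svd_benchmark.py, and the workloads passed in are stand-ins rather than MLX calls.

```python
import statistics
import time


def median_time(fn, repeats=5, warmup=1):
    """Return the median wall-clock time of fn() in seconds.

    Warm-up calls run first and are not timed, so one-off costs
    (for example kernel compilation) do not skew the samples.
    """
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(baseline_seconds, candidate_seconds):
    """Return how many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds
```

When timing a lazily evaluated or asynchronous backend, the callable passed to `median_time` would also need to force the result to be computed, otherwise only the dispatch cost is measured.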
Arkar Min Aung
5875252f87
feat: Add comprehensive testing and documentation for Metal SVD
...
- Add comprehensive test suite (test_metal_svd.cpp):
  * Basic functionality tests
  * Input validation tests
  * Various matrix sizes and batch processing
  * Reconstruction accuracy verification
  * Orthogonality property checks
  * Special matrices (identity, zero, diagonal)
  * Performance characteristic tests
- Add detailed implementation documentation:
  * Algorithm description and complexity analysis
  * Usage examples and API documentation
  * Performance benchmarks and characteristics
  * Implementation details and file structure
  * Error handling and limitations
  * Contributing guidelines
- Enhance error handling and robustness:
  * Improved input validation with detailed error messages
  * Memory allocation error handling
  * NaN/Inf input detection
  * Performance logging for large matrices
- Integrate tests into CMake build system

This completes the Metal SVD implementation with production-ready
testing and documentation.
2025-06-14 17:05:10 +10:00
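The reconstruction accuracy and orthogonality checks listed in the commit above can be sketched in plain Python. The helper names below are hypothetical and are not taken from test_metal_svd.cpp; matrices are lists of rows, and `s` is a list of singular values.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def max_abs_diff(a, b):
    """Largest absolute elementwise difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def reconstruction_error(a, u, s, vt):
    """max |U diag(S) V^T - A|; near zero for a correct decomposition."""
    us = [[u[i][k] * s[k] for k in range(len(s))] for i in range(len(u))]
    return max_abs_diff(matmul(us, vt), a)


def orthogonality_error(q):
    """max |Q^T Q - I|; near zero when the columns of Q are orthonormal."""
    qtq = matmul(transpose(q), q)
    n = len(qtq)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return max_abs_diff(qtq, identity)
```

A test would compare both errors against a tolerance suited to the precision in use, which for float32 is much looser than for float64.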
Arkar Min Aung
c09f1faf9a
feat: Add convergence checking and algorithm improvements
...
- Add svd_check_convergence kernel to monitor off-diagonal norm
- Implement proper convergence checking in Jacobi iterations
- Add algorithm selection heuristics based on matrix properties
- Improve singular vector computation with proper rotation application
- Add adaptive parameter selection (tolerance, max_iterations)
- Enhance error handling and workspace management

Key improvements:
* Convergence checking every 5 iterations to reduce overhead
* Matrix-size-dependent parameter tuning
* Better memory management with convergence tracking
* More accurate singular vector computation

This significantly improves the robustness and efficiency of the
Metal SVD implementation.
2025-06-14 17:05:10 +10:00
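The periodic convergence check described in the commit above (monitor the off-diagonal norm, but only every 5 iterations) can be sketched as follows. This is an illustration only: the function names, the default tolerance, and the `step` callback are assumptions, not the Metal kernels or the host code from the commit.

```python
import math


def off_diagonal_norm(m):
    """Frobenius norm of the off-diagonal entries of a square matrix."""
    n = len(m)
    return math.sqrt(sum(m[i][j] ** 2
                         for i in range(n) for j in range(n) if i != j))


def iterate_with_periodic_check(matrix, step, tol=1e-6,
                                max_iterations=100, check_every=5):
    """Call step(matrix) repeatedly, testing convergence every check_every calls.

    Returns (iterations_run, converged). Skipping the norm computation on
    most iterations trades a few extra iterations for fewer reductions.
    """
    for iteration in range(1, max_iterations + 1):
        step(matrix)
        if iteration % check_every == 0 and off_diagonal_norm(matrix) <= tol:
            return iteration, True
    # Final check in case max_iterations is not a multiple of check_every.
    return max_iterations, off_diagonal_norm(matrix) <= tol
```

The cost of this scheme is that the loop can run up to `check_every - 1` iterations past the point where the tolerance was first met.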
Arkar Min Aung
7ec92466df
feat: Implement basic one-sided Jacobi SVD algorithm in Metal
...
- Add complete Metal kernel implementations for SVD computation:
  * svd_preprocess: Computes A^T * A matrix
  * svd_jacobi_iteration: Performs Jacobi rotations to diagonalize
  * svd_extract_singular_values: Extracts singular values from diagonal
  * svd_compute_vectors: Computes singular vectors (basic implementation)
- Update host-side implementation to orchestrate kernel execution:
  * Allocate workspace for A^T * A and rotation storage
  * Execute preprocessing, iteration, and extraction phases
  * Handle both singular values only and full SVD modes
- Add proper template instantiations for float and double precision

This provides a working Metal SVD implementation using the Jacobi method.
Performance optimizations and convergence checking will follow.
2025-06-14 17:05:10 +10:00
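The kernel phases listed in the commit above (form A^T * A, apply Jacobi rotations until it is diagonal, read the singular values off the diagonal) can be sketched on the CPU in plain Python. This is a minimal illustration of those three steps, not the Metal code: the function names are hypothetical, the sweep count is fixed because the commit notes that convergence checking comes later, and singular vectors are omitted.

```python
import math


def gram_matrix(a):
    """Return A^T A for a matrix given as a list of rows."""
    rows, cols = len(a), len(a[0])
    return [[sum(a[k][i] * a[k][j] for k in range(rows))
             for j in range(cols)] for i in range(cols)]


def jacobi_sweep(g):
    """Apply one cyclic sweep of Jacobi rotations to symmetric g, in place."""
    n = len(g)
    for p in range(n - 1):
        for q in range(p + 1, n):
            if g[p][q] == 0.0:
                continue
            # Angle chosen so the rotated (p, q) entry becomes zero.
            theta = 0.5 * math.atan2(2.0 * g[p][q], g[q][q] - g[p][p])
            c, s = math.cos(theta), math.sin(theta)
            for k in range(n):  # columns p and q: G <- G J
                gkp, gkq = g[k][p], g[k][q]
                g[k][p] = c * gkp - s * gkq
                g[k][q] = s * gkp + c * gkq
            for k in range(n):  # rows p and q: G <- J^T G
                gpk, gqk = g[p][k], g[q][k]
                g[p][k] = c * gpk - s * gqk
                g[q][k] = s * gpk + c * gqk


def singular_values(a, sweeps=10):
    """Singular values of a, largest first, via Jacobi rotations on A^T A."""
    g = gram_matrix(a)
    for _ in range(sweeps):
        jacobi_sweep(g)
    # Eigenvalues of A^T A are the squared singular values of A.
    return sorted((math.sqrt(max(g[i][i], 0.0)) for i in range(len(g))),
                  reverse=True)
```

One consequence of working on A^T * A is that its condition number is the square of that of A, so small singular values lose accuracy sooner than with methods that rotate the columns of A directly.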
Arkar Min Aung
c67eea520e
Merge branch 'ml-explore:main' into feature/metal-svd-base
2025-06-14 16:53:43 +10:00
Awni Hannun
a6d780154f
fix cuda gemm for bf16 ( #2288 )
2025-06-13 22:10:46 -07:00
Awni Hannun
6871e2eeb7
fix cuda jit ( #2287 )
2025-06-13 19:21:46 -07:00
Awni Hannun
8402a2acf4
Fix complex power and print ( #2286 )
...
* fix complex power and print
* fix complex matmul shape
2025-06-13 11:13:00 -07:00
Jagrit Digani
fddb6933e1
Collection of refactors ( #2274 )
...
* Refactor gemv into a function
* Refactor splitk step 1
* Refactor split k axpby
* Rearrange steel_gemm_regular
* Redirect steel_gemm_regular
* Add axpby routing to steel_matmul_regular
* Refactor AddMM step 1
* Redirect steel_gemm
* Update addmm
* Comments and format
* Some cleanup
* Add architecture gen to device
* Update no copy condition in normalization to account for axis size 1
2025-06-13 10:44:56 -07:00
Arkar Min Aung
a71a9e0ddd
feat: Add Metal SVD infrastructure and parameter structures
...
- Add SVDParams, JacobiRotation, and SVDConvergenceInfo structures
- Create placeholder Metal kernel declarations for SVD operations
- Add SVD kernel compilation to CMake build system
- Update SVD::eval_gpu to dispatch to Metal implementation
- Add basic input validation and error handling
- Include placeholder kernel implementation for compilation
This establishes the foundation for Metal SVD implementation.
Actual algorithm implementation will follow in subsequent commits.
2025-06-13 23:28:52 +10:00
Cheng
c8b4787e4e
CUDA backend: indexing ops ( #2277 )
2025-06-12 21:44:19 -07:00
Awni Hannun
2188199ff8
[CUDA] ternary with select op ( #2283 )
...
* cuda ternary with select op
* comment + fix
* fix
2025-06-12 20:24:43 -07:00
Awni Hannun
aa07429bad
Fix cuda build ( #2284 )
2025-06-12 17:48:05 -07:00
Awni Hannun
918761a25a
[CUDA] RMSNorm and VJP ( #2280 )
...
* rms norm start
* nit
2025-06-12 17:09:49 -07:00
Cheng
a4fc671d3e
CUDA backend: compile ( #2276 )
...
* CUDA backend: compile
* Rename kernels/ to device/
2025-06-12 17:08:39 -07:00
Awni Hannun
f5f65ef48c
Make sliceUpdate general ( #2282 )
...
* Make sliceUpdate general
* fix
2025-06-12 16:48:54 -07:00
Cheng
c2dd81a8aa
Fix warnings from latest CUDA toolkit ( #2275 )
2025-06-12 06:03:01 -07:00
Cheng
d7e680ffe4
CUDA backend: layernorm ( #2271 )
2025-06-11 15:48:32 -07:00
Cheng
c371baf53a
CUDA backend: softmax ( #2272 )
2025-06-11 13:55:22 -07:00
Cheng
ccf78f566c
CUDA backend: argreduce ( #2270 )
2025-06-11 13:26:17 -07:00
Cheng
c9fa68664a
CUDA backend: reduce ( #2269 )
2025-06-11 11:22:25 -07:00
Awni Hannun
c35f4d089a
start cuda circle config ( #2256 )
...
* rebase
* fix metal kernel linking issue on cuda
* start cuda circle config
2025-06-10 21:19:47 -07:00
Angelos Katharopoulos
8590c0941e
Add load_safe to the general conv loaders ( #2258 )
2025-06-10 20:58:16 -07:00
Cheng
095163b8d1
Fix building cpp benchmarks on Linux ( #2268 )
2025-06-10 17:10:24 -07:00
Cheng
99c33d011d
rebase + nit ( #2260 )
...
Co-authored-by: Awni Hannun <awni@apple.com>
2025-06-10 10:51:51 -07:00
Awni Hannun
62fecf3e13
fix conv export ( #2265 )
2025-06-10 09:34:01 -07:00
Cheng
7c4eb5d03e
CUDA backend: random ( #2261 )
2025-06-10 08:59:56 -07:00
Cheng
bae9a6b404
CUDA backend: sort ( #2262 )
...
Co-authored-by: Awni Hannun <awni@apple.com>
2025-06-10 08:59:47 -07:00
Christopher Fleetwood
004c1d8ef2
Report number of missing parameters ( #2264 )
...
* chore: inform
* chore: format
---------
Co-authored-by: FL33TW00D <FL33TW00D@users.noreply.github.com>
2025-06-10 06:37:50 -07:00
Cheng
7ebb2e0193
CUDA backend: binary ops ( #2259 )
2025-06-10 06:37:40 -07:00
Awni Hannun
9ce77798b1
fix export to work with gather/scatter axis ( #2263 )
2025-06-09 20:37:27 -07:00
Cheng
f8bad60609
CUDA backend: unary ops ( #2158 )
2025-06-09 06:45:08 -07:00
Emmanuel Ferdman
5866b3857b
Refactor the lu test ( #2250 )
...
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
2025-06-07 06:12:08 -07:00
Awni Hannun
1ca616844b
Fix unintuitive metal kernel caching ( #2242 )
...
* Fix unintuitive metal kernel caching
* alternative solution
2025-06-06 20:08:15 -07:00
Angelos Katharopoulos
2e8cf0b450
Change layernorms to two pass algorithm ( #2246 )
2025-06-06 13:34:56 -07:00
Cheng
24f89173d1
CUDA backend: matmul ( #2241 )
2025-06-06 12:24:04 -07:00
Awni Hannun
c6a20b427a
Improve metal elementwise kernels ( #2247 )
...
* improve metal elementwise kernels
* compile and copy
* fix jit
2025-06-06 11:37:40 -07:00
Awni Hannun
a5ac9244c4
fix linux linking error ( #2248 )
2025-06-06 10:41:51 -07:00
Awni Hannun
c763fe1be0
default strict mode for module update and update_modules ( #2239 )
2025-06-05 15:27:02 -07:00
Cheng
52dc8c8cd5
Add profiler annotations in common primitives for CUDA backend ( #2244 )
2025-06-04 19:55:12 -07:00
Angelos Katharopoulos
aede70e81d
Perf regression fix ( #2243 )
2025-06-03 17:55:12 -07:00
Cheng
85a8beb5e4
Avoid atomic updates across CPU/GPU in CUDA event ( #2231 )
2025-06-03 16:49:06 -07:00
Cheng
0bb89e9e5f
Share more common code in Compiled ( #2240 )
...
* Share more common code in Compiled
* Remove build_lib_name
2025-06-03 16:48:50 -07:00
Cheng
5685ceb3c7
Avoid invoking allocator::malloc when creating CUDA event ( #2232 )
2025-06-03 16:48:40 -07:00
Suryash Malviya
0408ba0a76
Optimizing Complex Matrix Multiplication using Karatsuba’s Algorithm ( #2220 )
...
* Implementing Complex Matmul using Karatsuba Algorithm
* Implemented Karatsuba's Algorithm for complex matmul and pre-commit them
* fix
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2025-06-02 15:58:46 -07:00
Awni Hannun
cbad6c3093
version ( #2237 )
2025-06-02 15:58:33 -07:00
Cheng
1b021f6984
Fast primitives decide when to use the fallback ( #2216 )
2025-06-02 13:26:37 -07:00