Commit Graph

1214 Commits

Author SHA1 Message Date
Arkar Min Aung
f4789ab8b9 feat: Add SVD primitive GPU evaluation support
- Implement SVD::eval_gpu in Metal primitives backend
- Add proper float32/float64 type dispatch
- Include clear error messages for unsupported double precision
- Connect SVD primitive to Metal backend implementation
- Enable GPU path for SVD operations in MLX
2025-06-14 21:23:04 +10:00
Arkar Min Aung
54125e5ff5 feat: Implement Metal SVD backend with CPU fallback
- Add comprehensive SVD implementation in mlx/backend/metal/svd.cpp
- Include input validation for dimensions, data types, and edge cases
- Implement CPU fallback for immediate functionality
- Add proper error handling for unsupported float64 operations
- Support both singular values only and full SVD decomposition
- Prepare infrastructure for future Metal kernel integration
2025-06-14 21:22:49 +10:00
Arkar Min Aung
b7838461c1 feat: Add Metal SVD kernel infrastructure
- Add svd.h header with kernel declarations
- Add svd.metal with placeholder Metal compute shaders
- Define SVD algorithm parameters and data structures
- Prepare foundation for Metal GPU-accelerated SVD implementation
2025-06-14 21:22:34 +10:00
Arkar Min Aung
6d01528e90 feat: Add benchmarking and documentation updates for Metal SVD
- Add comprehensive SVD benchmark script (benchmarks/python/svd_benchmark.py):
  * Performance comparison between CPU and GPU implementations
  * Batch processing benchmarks
  * Correctness verification tests
  * Detailed timing and speedup analysis

- Update linalg documentation to mention Metal GPU acceleration

- Add implementation summary document for development reference

This addresses CONTRIBUTING.md requirements:
- Benchmarks for efficiency impact measurement (point 3)
- Documentation updates for API changes (point 4)
- Comprehensive testing coverage (point 2)
2025-06-14 17:28:19 +10:00
Arkar Min Aung
5875252f87 feat: Add comprehensive testing and documentation for Metal SVD
- Add comprehensive test suite (test_metal_svd.cpp):
  * Basic functionality tests
  * Input validation tests
  * Various matrix sizes and batch processing
  * Reconstruction accuracy verification
  * Orthogonality property checks
  * Special matrices (identity, zero, diagonal)
  * Performance characteristic tests

- Add detailed implementation documentation:
  * Algorithm description and complexity analysis
  * Usage examples and API documentation
  * Performance benchmarks and characteristics
  * Implementation details and file structure
  * Error handling and limitations
  * Contributing guidelines

- Enhance error handling and robustness:
  * Improved input validation with detailed error messages
  * Memory allocation error handling
  * NaN/Inf input detection
  * Performance logging for large matrices

- Integrate tests into CMake build system

This completes the Metal SVD implementation with production-ready
testing and documentation.
2025-06-14 17:05:10 +10:00
Arkar Min Aung
c09f1faf9a feat: Add convergence checking and algorithm improvements
- Add svd_check_convergence kernel to monitor off-diagonal norm
- Implement proper convergence checking in Jacobi iterations
- Add algorithm selection heuristics based on matrix properties
- Improve singular vector computation with proper rotation application
- Add adaptive parameter selection (tolerance, max_iterations)
- Enhance error handling and workspace management

Key improvements:
* Convergence checking every 5 iterations to reduce overhead
* Matrix-size-dependent parameter tuning
* Better memory management with convergence tracking
* More accurate singular vector computation

This significantly improves the robustness and efficiency of the
Metal SVD implementation.
2025-06-14 17:05:10 +10:00
Arkar Min Aung
7ec92466df feat: Implement basic one-sided Jacobi SVD algorithm in Metal
- Add complete Metal kernel implementations for SVD computation:
  * svd_preprocess: Computes A^T * A matrix
  * svd_jacobi_iteration: Performs Jacobi rotations to diagonalize
  * svd_extract_singular_values: Extracts singular values from diagonal
  * svd_compute_vectors: Computes singular vectors (basic implementation)

- Update host-side implementation to orchestrate kernel execution:
  * Allocate workspace for A^T * A and rotation storage
  * Execute preprocessing, iteration, and extraction phases
  * Handle both singular values only and full SVD modes

- Add proper template instantiations for float and double precision

This provides a working Metal SVD implementation using the Jacobi method.
Performance optimizations and convergence checking will follow.
2025-06-14 17:05:10 +10:00
Arkar Min Aung
c67eea520e
Merge branch 'ml-explore:main' into feature/metal-svd-base 2025-06-14 16:53:43 +10:00
Awni Hannun
a6d780154f
fix cuda gemm for bf16 (#2288) 2025-06-13 22:10:46 -07:00
Awni Hannun
6871e2eeb7
fix cuda jit (#2287) 2025-06-13 19:21:46 -07:00
Awni Hannun
8402a2acf4
Fix complex power and print (#2286)
* fix complex power and print

* fix complex matmul shape
2025-06-13 11:13:00 -07:00
Jagrit Digani
fddb6933e1
Collection of refactors (#2274)
* Refactor gemv into a function

* Refactor splitk step 1

* Refactor split k axpby

* Rearrange steel_gemm_regular

* Redirect steel_gemm_regular

* Add axpby routing to steel_matmul_regular

* Refactor AddMM step 1

* Redirect steel_gemm

* Update addmm

* Comments and format

* Some cleanup

* Add architecture gen to device

* Update no copy condition in normalization to account for axis size 1
2025-06-13 10:44:56 -07:00
Arkar Min Aung
a71a9e0ddd feat: Add Metal SVD infrastructure and parameter structures
- Add SVDParams, JacobiRotation, and SVDConvergenceInfo structures
- Create placeholder Metal kernel declarations for SVD operations
- Add SVD kernel compilation to CMake build system
- Update SVD::eval_gpu to dispatch to Metal implementation
- Add basic input validation and error handling
- Include placeholder kernel implementation for compilation

This establishes the foundation for Metal SVD implementation.
Actual algorithm implementation will follow in subsequent commits.
2025-06-13 23:28:52 +10:00
Cheng
c8b4787e4e
CUDA backend: indexing ops (#2277) 2025-06-12 21:44:19 -07:00
Awni Hannun
2188199ff8
[CUDA] ternary with select op (#2283)
* cuda ternary with select op

* comment + fix

* fix
2025-06-12 20:24:43 -07:00
Awni Hannun
aa07429bad
Fix cuda build (#2284) 2025-06-12 17:48:05 -07:00
Awni Hannun
918761a25a
[CUDA] RMSNorm and VJP (#2280)
* rms norm start

* nit
2025-06-12 17:09:49 -07:00
Cheng
a4fc671d3e
CUDA backend: compile (#2276)
* CUDA backend: compile

* Rename kernels/ to device/
2025-06-12 17:08:39 -07:00
Awni Hannun
f5f65ef48c
Make sliceUpdate general (#2282)
* Make sliceUpdate general

* fix
2025-06-12 16:48:54 -07:00
Cheng
c2dd81a8aa
Fix warnings from latest CUDA toolkit (#2275) 2025-06-12 06:03:01 -07:00
Cheng
d7e680ffe4
CUDA backend: layernorm (#2271) 2025-06-11 15:48:32 -07:00
Cheng
c371baf53a
CUDA backend: softmax (#2272) 2025-06-11 13:55:22 -07:00
Cheng
ccf78f566c
CUDA backend: argreduce (#2270) 2025-06-11 13:26:17 -07:00
Cheng
c9fa68664a
CUDA backend: reduce (#2269) 2025-06-11 11:22:25 -07:00
Awni Hannun
c35f4d089a
start cuda circle config (#2256)
* rebase

* fix metal kernel linking issue on cuda

* start cuda circle config
2025-06-10 21:19:47 -07:00
Angelos Katharopoulos
8590c0941e
Add load_safe to the general conv loaders (#2258) 2025-06-10 20:58:16 -07:00
Cheng
095163b8d1
Fix building cpp benchmarks on Linux (#2268) 2025-06-10 17:10:24 -07:00
Cheng
99c33d011d
rebase + nit (#2260)
Co-authored-by: Awni Hannun <awni@apple.com>
2025-06-10 10:51:51 -07:00
Awni Hannun
62fecf3e13
fix conv export (#2265) 2025-06-10 09:34:01 -07:00
Cheng
7c4eb5d03e
CUDA backend: random (#2261) 2025-06-10 08:59:56 -07:00
Cheng
bae9a6b404
CUDA backend: sort (#2262)
Co-authored-by: Awni Hannun <awni@apple.com>
2025-06-10 08:59:47 -07:00
Christopher Fleetwood
004c1d8ef2
Report number of missing parameters (#2264)
* chore: inform

* chore: format

---------

Co-authored-by: FL33TW00D <FL33TW00D@users.noreply.github.com>
2025-06-10 06:37:50 -07:00
Cheng
7ebb2e0193
CUDA backend: binary ops (#2259) 2025-06-10 06:37:40 -07:00
Awni Hannun
9ce77798b1
fix export to work with gather/scatter axis (#2263) 2025-06-09 20:37:27 -07:00
Cheng
f8bad60609
CUDA backend: unary ops (#2158) 2025-06-09 06:45:08 -07:00
Emmanuel Ferdman
5866b3857b
Refactor the lu test (#2250)
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
2025-06-07 06:12:08 -07:00
Awni Hannun
1ca616844b
Fix unintuitive metal kernel caching (#2242)
* Fix unintuitive metal kernel caching

* alternative solution
2025-06-06 20:08:15 -07:00
Angelos Katharopoulos
2e8cf0b450
Change layernorms to two pass algorithm (#2246) 2025-06-06 13:34:56 -07:00
Cheng
24f89173d1
CUDA backend: matmul (#2241) 2025-06-06 12:24:04 -07:00
Awni Hannun
c6a20b427a
Improve metal elementwise kernels (#2247)
* improve metal elementwise kernels

* compile and copy

* fix jit
2025-06-06 11:37:40 -07:00
Awni Hannun
a5ac9244c4
fix linux linking error (#2248) 2025-06-06 10:41:51 -07:00
Awni Hannun
c763fe1be0
default strict mode for module update and update_modules (#2239) 2025-06-05 15:27:02 -07:00
Cheng
52dc8c8cd5
Add profiler annotations in common primitives for CUDA backend (#2244) 2025-06-04 19:55:12 -07:00
Angelos Katharopoulos
aede70e81d
Perf regression fix (#2243) 2025-06-03 17:55:12 -07:00
Cheng
85a8beb5e4
Avoid atomic updates across CPU/GPU in CUDA event (#2231) 2025-06-03 16:49:06 -07:00
Cheng
0bb89e9e5f
Share more common code in Compiled (#2240)
* Share more common code in Compiled

* Remove build_lib_name
2025-06-03 16:48:50 -07:00
Cheng
5685ceb3c7
Avoid invoking allocator::malloc when creating CUDA event (#2232) 2025-06-03 16:48:40 -07:00
Suryash Malviya
0408ba0a76
Optimizing Complex Matrix Multiplication using Karatsuba’s Algorithm (#2220)
* Implementing Complex Matmul using Karatsuba Algorithm

* Implemented Karatsuba's Algorithm for complex matmul and pre-commit them

* fix

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2025-06-02 15:58:46 -07:00
Awni Hannun
cbad6c3093
version (#2237) 2025-06-02 15:58:33 -07:00
Cheng
1b021f6984
Fast primitives decide when to use the fallback (#2216) 2025-06-02 13:26:37 -07:00