- Remove CPU-only restriction from linalg::svd function
- Allow SVD operations to run on GPU devices
- Add documentation noting Metal GPU acceleration support for float32
- Maintain backward compatibility with existing CPU usage
- Enable users to explicitly request GPU execution for SVD
- Implement SVD::eval_gpu in Metal primitives backend
- Add proper float32/float64 type dispatch
- Include clear error messages for unsupported double precision
- Connect SVD primitive to Metal backend implementation
- Enable GPU path for SVD operations in MLX
- Add comprehensive SVD implementation in mlx/backend/metal/svd.cpp
- Include input validation for dimensions, data types, and edge cases
- Implement a CPU fallback so SVD works before the Metal kernels are in place
- Add proper error handling for unsupported float64 operations
- Support both singular values only and full SVD decomposition
- Prepare infrastructure for future Metal kernel integration
- Add svd.h header with kernel declarations
- Add svd.metal with placeholder Metal compute shaders
- Define SVD algorithm parameters and data structures
- Prepare foundation for Metal GPU-accelerated SVD implementation
- Add complete Metal kernel implementations for SVD computation:
* svd_preprocess: Computes the A^T * A matrix
* svd_jacobi_iteration: Performs Jacobi rotations to diagonalize A^T * A
* svd_extract_singular_values: Extracts singular values from the diagonal
* svd_compute_vectors: Computes singular vectors (basic implementation)
- Update host-side implementation to orchestrate kernel execution:
* Allocate workspace for A^T * A and rotation storage
* Execute preprocessing, iteration, and extraction phases
* Handle both singular values only and full SVD modes
- Add proper template instantiations for float and double precision
This provides a working Metal SVD implementation using the Jacobi method.
Performance optimizations and convergence checking will follow.
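
For reviewers, the preprocess / iterate / extract phases listed above can be sketched on the host in plain Python. This is only an illustration of the general Jacobi approach, not the Metal kernels themselves: the function names (`gram`, `jacobi_sweep`, `singular_values`), the cyclic sweep order, and the tolerance are assumptions chosen for the example.

```python
import math

def gram(a):
    """Preprocess phase: form G = A^T * A for an M x N matrix (list of rows)."""
    m, n = len(a), len(a[0])
    return [[sum(a[k][i] * a[k][j] for k in range(m)) for j in range(n)]
            for i in range(n)]

def jacobi_sweep(g):
    """Iteration phase: one cyclic sweep of Jacobi rotations on symmetric g.

    Each rotation zeroes one off-diagonal pair (p, q). Works in place and
    returns the largest remaining off-diagonal magnitude.
    """
    n = len(g)
    for p in range(n - 1):
        for q in range(p + 1, n):
            if g[p][q] == 0.0:
                continue
            theta = (g[q][q] - g[p][p]) / (2.0 * g[p][q])
            t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
            c = 1.0 / math.sqrt(t * t + 1.0)
            s = t * c
            for k in range(n):  # G <- G * J (update columns p and q)
                gkp, gkq = g[k][p], g[k][q]
                g[k][p] = c * gkp - s * gkq
                g[k][q] = s * gkp + c * gkq
            for k in range(n):  # G <- J^T * G (update rows p and q)
                gpk, gqk = g[p][k], g[q][k]
                g[p][k] = c * gpk - s * gqk
                g[q][k] = s * gpk + c * gqk
    return max((abs(g[i][j]) for i in range(n) for j in range(n) if i != j),
               default=0.0)

def singular_values(a, max_sweeps=20, tol=1e-12):
    """Extraction phase: square roots of the diagonal of the diagonalized G."""
    g = gram(a)
    for _ in range(max_sweeps):
        if jacobi_sweep(g) <= tol:
            break
    # Clamp tiny negative round-off before taking the square root.
    return sorted((math.sqrt(max(g[i][i], 0.0)) for i in range(len(g))),
                  reverse=True)

s = singular_values([[1.0, 2.0], [3.0, 4.0]])
```

Forming A^T * A squares the condition number of the input, so small singular values lose accuracy with this approach. The sketch stops early once the off-diagonal entries fall below `tol`, which is one possible form of the convergence check mentioned above.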
* Refactor gemv into a function
* Refactor splitk step 1
* Refactor splitk axpby
* Rearrange steel_gemm_regular
* Redirect steel_gemm_regular
* Add axpby routing to steel_matmul_regular
* Refactor AddMM step 1
* Redirect steel_gemm
* Update addmm
* Comments and format
* Some cleanup
* Add architecture gen to device
* Update the no-copy condition in normalization to account for an axis of size 1
- Add SVDParams, JacobiRotation, and SVDConvergenceInfo structures
- Create placeholder Metal kernel declarations for SVD operations
- Add SVD kernel compilation to CMake build system
- Update SVD::eval_gpu to dispatch to Metal implementation
- Add basic input validation and error handling
- Include placeholder kernel implementation for compilation
This establishes the foundation for Metal SVD implementation.
Actual algorithm implementation will follow in subsequent commits.
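
The input validation and dtype handling described in these commits might look roughly like the following, written in Python for brevity rather than in the C++ of the backend. Everything here is a hypothetical sketch: the function name `validate_svd_input`, the string dtype names, and the exact error messages are assumptions and do not mirror the real code.

```python
from typing import Tuple

def validate_svd_input(shape: Tuple[int, ...], dtype: str) -> Tuple[int, int]:
    """Check an SVD input and return the (M, N) size of its trailing matrix.

    Hypothetical helper: raises ValueError with a descriptive message for
    inputs the GPU path is assumed not to support.
    """
    if len(shape) < 2:
        raise ValueError(
            f"SVD requires an array with at least 2 dimensions, got {len(shape)}")
    m, n = shape[-2], shape[-1]
    if m == 0 or n == 0:
        raise ValueError(f"SVD input matrix must be non-empty, got {m}x{n}")
    if dtype == "float64":
        # Double precision is rejected with a clear message instead of
        # silently falling back.
        raise ValueError(
            "float64 SVD is not supported on the GPU; use float32 or run on the CPU")
    if dtype != "float32":
        raise ValueError(f"SVD on the GPU supports float32 only, got {dtype}")
    return m, n

dims = validate_svd_input((8, 4, 3), "float32")
```

Leading dimensions are treated as a batch, so only the last two axes are checked.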