- Add complete Metal kernel implementations for SVD computation:
* svd_preprocess: Computes A^T * A matrix
* svd_jacobi_iteration: Performs Jacobi rotations to diagonalize A^T * A
* svd_extract_singular_values: Extracts singular values from diagonal
* svd_compute_vectors: Computes singular vectors (basic implementation)
- Update host-side implementation to orchestrate kernel execution:
* Allocate workspace for A^T * A and rotation storage
* Execute preprocessing, iteration, and extraction phases
* Handle both singular-values-only and full SVD modes
- Add proper template instantiations for float and double precision

This provides a working Metal SVD implementation using the Jacobi method.
Performance optimizations and convergence checking will follow.
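
For reviewers, the three phases listed above (form A^T * A, apply Jacobi rotations, read singular values off the diagonal) can be sketched on the CPU as a reference. This is only an illustrative sketch, not the Metal kernels themselves: the function names, the row-major list-of-lists layout, and the fixed sweep count (standing in for the convergence check that is still to come) are all assumptions made for the example.

```python
import math


def gram_matrix(a):
    """Return A^T * A for an m x n matrix stored as a list of rows."""
    m, n = len(a), len(a[0])
    return [[sum(a[k][i] * a[k][j] for k in range(m)) for j in range(n)]
            for i in range(n)]


def jacobi_sweep(g, v):
    """Apply one cyclic sweep of Jacobi rotations to symmetric g in place.

    Each rotation zeroes one off-diagonal pair g[p][q] / g[q][p]; the same
    rotation is accumulated into the columns of v.
    """
    n = len(g)
    for p in range(n - 1):
        for q in range(p + 1, n):
            if g[p][q] == 0.0:
                continue  # already zero, nothing to rotate
            theta = (g[q][q] - g[p][p]) / (2.0 * g[p][q])
            t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
            c = 1.0 / math.sqrt(t * t + 1.0)
            s = t * c
            # g <- J^T g J: update columns p and q, then rows p and q.
            for k in range(n):
                gkp, gkq = g[k][p], g[k][q]
                g[k][p] = c * gkp - s * gkq
                g[k][q] = s * gkp + c * gkq
            for k in range(n):
                gpk, gqk = g[p][k], g[q][k]
                g[p][k] = c * gpk - s * gqk
                g[q][k] = s * gpk + c * gqk
            # v <- v J accumulates the right singular vectors.
            for k in range(n):
                vkp, vkq = v[k][p], v[k][q]
                v[k][p] = c * vkp - s * vkq
                v[k][q] = s * vkp + c * vkq


def jacobi_svd_reference(a, sweeps=10):
    """Return (singular values, right singular vectors), largest value first.

    Assumption: a fixed number of sweeps is used instead of a convergence test.
    """
    g = gram_matrix(a)
    n = len(g)
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        jacobi_sweep(g, v)
    order = sorted(range(n), key=lambda i: -g[i][i])
    # Singular values of A are the square roots of the eigenvalues of A^T * A;
    # clamp at zero to guard against tiny negative rounding errors.
    sigma = [math.sqrt(max(g[i][i], 0.0)) for i in order]
    v_sorted = [[v[k][i] for i in order] for k in range(n)]
    return sigma, v_sorted
```

Calling `jacobi_svd_reference([[3.0, 0.0], [0.0, 4.0]])` returns the singular values `[4.0, 3.0]`. Note that going through A^T * A squares the condition number, so a reference like this is mainly useful for checking well-conditioned inputs against the GPU results.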