zhangyiss/mlx - mlx - Gitea for Geophysics

mirror of https://github.com/ml-explore/mlx.git synced 2025-07-24 19:11:17 +08:00

Author	SHA1	Message	Date
Arkar Min Aung	f2c731c29b	feat: Enable GPU support in linalg SVD interface - Remove CPU-only restriction from linalg::svd function - Allow SVD operations to run on GPU devices - Add documentation noting Metal GPU acceleration support for float32 - Maintain backward compatibility with existing CPU usage - Enable users to explicitly request GPU execution for SVD	2025-06-14 21:23:18 +10:00
Arkar Min Aung	f4789ab8b9	feat: Add SVD primitive GPU evaluation support - Implement SVD::eval_gpu in Metal primitives backend - Add proper float32/float64 type dispatch - Include clear error messages for unsupported double precision - Connect SVD primitive to Metal backend implementation - Enable GPU path for SVD operations in MLX	2025-06-14 21:23:04 +10:00
Arkar Min Aung	54125e5ff5	feat: Implement Metal SVD backend with CPU fallback - Add comprehensive SVD implementation in mlx/backend/metal/svd.cpp - Include input validation for dimensions, data types, and edge cases - Implement CPU fallback for immediate functionality - Add proper error handling for unsupported float64 operations - Support both singular values only and full SVD decomposition - Prepare infrastructure for future Metal kernel integration	2025-06-14 21:22:49 +10:00
Arkar Min Aung	b7838461c1	feat: Add Metal SVD kernel infrastructure - Add svd.h header with kernel declarations - Add svd.metal with placeholder Metal compute shaders - Define SVD algorithm parameters and data structures - Prepare foundation for Metal GPU-accelerated SVD implementation	2025-06-14 21:22:34 +10:00
Arkar Min Aung	6d01528e90	feat: Add benchmarking and documentation updates for Metal SVD - Add comprehensive SVD benchmark script (benchmarks/python/svd_benchmark.py): * Performance comparison between CPU and GPU implementations * Batch processing benchmarks * Correctness verification tests * Detailed timing and speedup analysis - Update linalg documentation to mention Metal GPU acceleration - Add implementation summary document for development reference This addresses CONTRIBUTING.md requirements: - Benchmarks for efficiency impact measurement (point 3) - Documentation updates for API changes (point 4) - Comprehensive testing coverage (point 2)	2025-06-14 17:28:19 +10:00
Arkar Min Aung	5875252f87	feat: Add comprehensive testing and documentation for Metal SVD - Add comprehensive test suite (test_metal_svd.cpp): * Basic functionality tests * Input validation tests * Various matrix sizes and batch processing * Reconstruction accuracy verification * Orthogonality property checks * Special matrices (identity, zero, diagonal) * Performance characteristic tests - Add detailed implementation documentation: * Algorithm description and complexity analysis * Usage examples and API documentation * Performance benchmarks and characteristics * Implementation details and file structure * Error handling and limitations * Contributing guidelines - Enhance error handling and robustness: * Improved input validation with detailed error messages * Memory allocation error handling * NaN/Inf input detection * Performance logging for large matrices - Integrate tests into CMake build system This completes the Metal SVD implementation with production-ready testing and documentation.	2025-06-14 17:05:10 +10:00
Arkar Min Aung	c09f1faf9a	feat: Add convergence checking and algorithm improvements - Add svd_check_convergence kernel to monitor off-diagonal norm - Implement proper convergence checking in Jacobi iterations - Add algorithm selection heuristics based on matrix properties - Improve singular vector computation with proper rotation application - Add adaptive parameter selection (tolerance, max_iterations) - Enhance error handling and workspace management Key improvements: * Convergence checking every 5 iterations to reduce overhead * Matrix-size-dependent parameter tuning * Better memory management with convergence tracking * More accurate singular vector computation This significantly improves the robustness and efficiency of the Metal SVD implementation.	2025-06-14 17:05:10 +10:00
Arkar Min Aung	7ec92466df	feat: Implement basic one-sided Jacobi SVD algorithm in Metal - Add complete Metal kernel implementations for SVD computation: * svd_preprocess: Computes A^T * A matrix * svd_jacobi_iteration: Performs Jacobi rotations to diagonalize * svd_extract_singular_values: Extracts singular values from diagonal * svd_compute_vectors: Computes singular vectors (basic implementation) - Update host-side implementation to orchestrate kernel execution: * Allocate workspace for A^T * A and rotation storage * Execute preprocessing, iteration, and extraction phases * Handle both singular values only and full SVD modes - Add proper template instantiations for float and double precision This provides a working Metal SVD implementation using the Jacobi method. Performance optimizations and convergence checking will follow.	2025-06-14 17:05:10 +10:00
Arkar Min Aung	c67eea520e	Merge branch 'ml-explore:main' into feature/metal-svd-base	2025-06-14 16:53:43 +10:00
Awni Hannun	a6d780154f	fix cuda gemm for bf16 (#2288 )	2025-06-13 22:10:46 -07:00
Awni Hannun	6871e2eeb7	fix cuda jit (#2287 )	2025-06-13 19:21:46 -07:00
Awni Hannun	8402a2acf4	Fix complex power and print (#2286 ) * fix complex power and print * fix complex matmul shape	2025-06-13 11:13:00 -07:00
Jagrit Digani	fddb6933e1	Collection of refactors (#2274 ) * Refactor gemv into a function * Refactor splitk step 1 * Refactor split k axpby * Rearrange steel_gemm_regular * Redirect steel_gemm_regular * Add axpby routing to steel_matmul_regular * Refactor AddMM step 1 * Redirect steel_gemm * Update addmm * Comments and format * Some cleanup * Add architecture gen to device * Update no copy condition in normalization to account for axis size 1	2025-06-13 10:44:56 -07:00
Arkar Min Aung	a71a9e0ddd	feat: Add Metal SVD infrastructure and parameter structures - Add SVDParams, JacobiRotation, and SVDConvergenceInfo structures - Create placeholder Metal kernel declarations for SVD operations - Add SVD kernel compilation to CMake build system - Update SVD::eval_gpu to dispatch to Metal implementation - Add basic input validation and error handling - Include placeholder kernel implementation for compilation This establishes the foundation for Metal SVD implementation. Actual algorithm implementation will follow in subsequent commits.	2025-06-13 23:28:52 +10:00
Cheng	c8b4787e4e	CUDA backend: indexing ops (#2277 )	2025-06-12 21:44:19 -07:00
Awni Hannun	2188199ff8	[CUDA] ternary with select op (#2283 ) * cuda ternary with select op * comment + fix * fix	2025-06-12 20:24:43 -07:00
Awni Hannun	aa07429bad	Fix cuda build (#2284 )	2025-06-12 17:48:05 -07:00
Awni Hannun	918761a25a	[CUDA] RMSNorm and VJP (#2280 ) * rms norm start * nit	2025-06-12 17:09:49 -07:00
Cheng	a4fc671d3e	CUDA backend: compile (#2276 ) * CUDA backend: compile * Rename kernels/ to device/	2025-06-12 17:08:39 -07:00
Awni Hannun	f5f65ef48c	Make sliceUpdate general (#2282 ) * Make sliceUpdate general * fix	2025-06-12 16:48:54 -07:00
Cheng	c2dd81a8aa	Fix warnings from latest CUDA toolkit (#2275 )	2025-06-12 06:03:01 -07:00
Cheng	d7e680ffe4	CUDA backend: layernorm (#2271 )	2025-06-11 15:48:32 -07:00
Cheng	c371baf53a	CUDA backend: softmax (#2272 )	2025-06-11 13:55:22 -07:00
Cheng	ccf78f566c	CUDA backend: argreduce (#2270 )	2025-06-11 13:26:17 -07:00
Cheng	c9fa68664a	CUDA backend: reduce (#2269 )	2025-06-11 11:22:25 -07:00
Awni Hannun	c35f4d089a	start cuda circle config (#2256 ) * rebase * fix metal kernel linking issue on cuda * start cuda circle config	2025-06-10 21:19:47 -07:00
Angelos Katharopoulos	8590c0941e	Add load_safe to the general conv loaders (#2258 )	2025-06-10 20:58:16 -07:00
Cheng	095163b8d1	Fix building cpp benchmarks on Linux (#2268 )	2025-06-10 17:10:24 -07:00
Cheng	99c33d011d	rebase + nit (#2260 ) Co-authored-by: Awni Hannun <awni@apple.com>	2025-06-10 10:51:51 -07:00
Awni Hannun	62fecf3e13	fix conv export (#2265 )	2025-06-10 09:34:01 -07:00
Cheng	7c4eb5d03e	CUDA backend: random (#2261 )	2025-06-10 08:59:56 -07:00
Cheng	bae9a6b404	CUDA backend: sort (#2262 ) Co-authored-by: Awni Hannun <awni@apple.com>	2025-06-10 08:59:47 -07:00
Christopher Fleetwood	004c1d8ef2	Report number of missing parameters (#2264 ) * chore: inform * chore: format --------- Co-authored-by: FL33TW00D <FL33TW00D@users.noreply.github.com>	2025-06-10 06:37:50 -07:00
Cheng	7ebb2e0193	CUDA backend: binary ops (#2259 )	2025-06-10 06:37:40 -07:00
Awni Hannun	9ce77798b1	fix export to work with gather/scatter axis (#2263 )	2025-06-09 20:37:27 -07:00
Cheng	f8bad60609	CUDA backend: unary ops (#2158 )	2025-06-09 06:45:08 -07:00
Emmanuel Ferdman	5866b3857b	Refactor the lu test (#2250 ) Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>	2025-06-07 06:12:08 -07:00
Awni Hannun	1ca616844b	Fix unintuitive metal kernel caching (#2242 ) * Fix unintuitive metal kernel caching * alternative solution	2025-06-06 20:08:15 -07:00
Angelos Katharopoulos	2e8cf0b450	Change layernorms to two pass algorithm (#2246 )	2025-06-06 13:34:56 -07:00
Cheng	24f89173d1	CUDA backend: matmul (#2241 )	2025-06-06 12:24:04 -07:00
Awni Hannun	c6a20b427a	Improve metal elementwise kernels (#2247 ) * improve metal elementwise kernels * compile and copy * fix jit	2025-06-06 11:37:40 -07:00
Awni Hannun	a5ac9244c4	fix linux linking error (#2248 )	2025-06-06 10:41:51 -07:00
Awni Hannun	c763fe1be0	default strict mode for module update and update_modules (#2239 )	2025-06-05 15:27:02 -07:00
Cheng	52dc8c8cd5	Add profiler annotations in common primitives for CUDA backend (#2244 )	2025-06-04 19:55:12 -07:00
Angelos Katharopoulos	aede70e81d	Perf regression fix (#2243 )	2025-06-03 17:55:12 -07:00
Cheng	85a8beb5e4	Avoid atomic updates across CPU/GPU in CUDA event (#2231 )	2025-06-03 16:49:06 -07:00
Cheng	0bb89e9e5f	Share more common code in Compiled (#2240 ) * Share more common code in Compiled * Remove build_lib_name	2025-06-03 16:48:50 -07:00
Cheng	5685ceb3c7	Avoid invoking allocator::malloc when creating CUDA event (#2232 )	2025-06-03 16:48:40 -07:00
Suryash Malviya	0408ba0a76	Optimizing Complex Matrix Multiplication using Karatsuba’s Algorithm (#2220 ) * Implementing Complex Matmul using Karatsuba Algorithm * Implemented Karatsuba's Algorithm for complex matmul and pre-commit them * fix --------- Co-authored-by: Awni Hannun <awni@apple.com>	2025-06-02 15:58:46 -07:00
Awni Hannun	cbad6c3093	version (#2237 )	2025-06-02 15:58:33 -07:00

1 2 3 4 5 ...

1215 Commits