zhangyiss/mlx - mlx - Gitea for Geophysics

mirror of https://github.com/ml-explore/mlx.git synced 2025-07-28 21:21:21 +08:00

Author	SHA1	Message	Date
Arkar Min Aung	6d01528e90	feat: Add benchmarking and documentation updates for Metal SVD - Add comprehensive SVD benchmark script (benchmarks/python/svd_benchmark.py): * Performance comparison between CPU and GPU implementations * Batch processing benchmarks * Correctness verification tests * Detailed timing and speedup analysis - Update linalg documentation to mention Metal GPU acceleration - Add implementation summary document for development reference This addresses CONTRIBUTING.md requirements: - Benchmarks for efficiency impact measurement (point 3) - Documentation updates for API changes (point 4) - Comprehensive testing coverage (point 2)	2025-06-14 17:28:19 +10:00
Arkar Min Aung	5875252f87	feat: Add comprehensive testing and documentation for Metal SVD - Add comprehensive test suite (test_metal_svd.cpp): * Basic functionality tests * Input validation tests * Various matrix sizes and batch processing * Reconstruction accuracy verification * Orthogonality property checks * Special matrices (identity, zero, diagonal) * Performance characteristic tests - Add detailed implementation documentation: * Algorithm description and complexity analysis * Usage examples and API documentation * Performance benchmarks and characteristics * Implementation details and file structure * Error handling and limitations * Contributing guidelines - Enhance error handling and robustness: * Improved input validation with detailed error messages * Memory allocation error handling * NaN/Inf input detection * Performance logging for large matrices - Integrate tests into CMake build system This completes the Metal SVD implementation with production-ready testing and documentation.	2025-06-14 17:05:10 +10:00
Arkar Min Aung	c09f1faf9a	feat: Add convergence checking and algorithm improvements - Add svd_check_convergence kernel to monitor off-diagonal norm - Implement proper convergence checking in Jacobi iterations - Add algorithm selection heuristics based on matrix properties - Improve singular vector computation with proper rotation application - Add adaptive parameter selection (tolerance, max_iterations) - Enhance error handling and workspace management Key improvements: * Convergence checking every 5 iterations to reduce overhead * Matrix-size-dependent parameter tuning * Better memory management with convergence tracking * More accurate singular vector computation This significantly improves the robustness and efficiency of the Metal SVD implementation.	2025-06-14 17:05:10 +10:00
Arkar Min Aung	7ec92466df	feat: Implement basic one-sided Jacobi SVD algorithm in Metal - Add complete Metal kernel implementations for SVD computation: * svd_preprocess: Computes A^T * A matrix * svd_jacobi_iteration: Performs Jacobi rotations to diagonalize * svd_extract_singular_values: Extracts singular values from diagonal * svd_compute_vectors: Computes singular vectors (basic implementation) - Update host-side implementation to orchestrate kernel execution: * Allocate workspace for A^T * A and rotation storage * Execute preprocessing, iteration, and extraction phases * Handle both singular values only and full SVD modes - Add proper template instantiations for float and double precision This provides a working Metal SVD implementation using the Jacobi method. Performance optimizations and convergence checking will follow.	2025-06-14 17:05:10 +10:00
Arkar Min Aung	c67eea520e	Merge branch 'ml-explore:main' into feature/metal-svd-base	2025-06-14 16:53:43 +10:00
Awni Hannun	8402a2acf4	Fix complex power and print (#2286) * fix complex power and print * fix complex matmul shape	2025-06-13 11:13:00 -07:00
Jagrit Digani	fddb6933e1	Collection of refactors (#2274) * Refactor gemv into a function * Refactor splitk step 1 * Refactor split k axpby * Rearrange steel_gemm_regular * Redirect steel_gemm_regular * Add axpby routing to steel_matmul_regular * Refactor AddMM step 1 * Redirect steel_gemm * Update addmm * Comments and format * Some cleanup * Add architecture gen to device * Update no copy condition in normalization to account for axis size 1	2025-06-13 10:44:56 -07:00
Arkar Min Aung	a71a9e0ddd	feat: Add Metal SVD infrastructure and parameter structures - Add SVDParams, JacobiRotation, and SVDConvergenceInfo structures - Create placeholder Metal kernel declarations for SVD operations - Add SVD kernel compilation to CMake build system - Update SVD::eval_gpu to dispatch to Metal implementation - Add basic input validation and error handling - Include placeholder kernel implementation for compilation This establishes the foundation for Metal SVD implementation. Actual algorithm implementation will follow in subsequent commits.	2025-06-13 23:28:52 +10:00
Awni Hannun	f5f65ef48c	Make sliceUpdate general (#2282) * Make sliceUpdate general * fix	2025-06-12 16:48:54 -07:00
Awni Hannun	c35f4d089a	start cuda circle config (#2256) * rebase * fix metal kernel linking issue on cuda * start cuda circle config	2025-06-10 21:19:47 -07:00
Angelos Katharopoulos	8590c0941e	Add load_safe to the general conv loaders (#2258)	2025-06-10 20:58:16 -07:00
Cheng	f8bad60609	CUDA backend: unary ops (#2158)	2025-06-09 06:45:08 -07:00
Awni Hannun	1ca616844b	Fix unintuitive metal kernel caching (#2242) * Fix unintuitive metal kernel caching * alternative solution	2025-06-06 20:08:15 -07:00
Angelos Katharopoulos	2e8cf0b450	Change layernorms to two pass algorithm (#2246)	2025-06-06 13:34:56 -07:00
Cheng	24f89173d1	CUDA backend: matmul (#2241)	2025-06-06 12:24:04 -07:00
Awni Hannun	c6a20b427a	Improve metal elementwise kernels (#2247) * improve metal elementwise kernels * compile and copy * fix jit	2025-06-06 11:37:40 -07:00
Cheng	0bb89e9e5f	Share more common code in Compiled (#2240) * Share more common code in Compiled * Remove build_lib_name	2025-06-03 16:48:50 -07:00
Cheng	1b021f6984	Fast primitives decide when to use the fallback (#2216)	2025-06-02 13:26:37 -07:00
Cheng	db5a7c6192	Add memory cache to CUDA backend (#2221) * Move BufferCache out of allocator * Add memory cache to cuda backend allocator * Simplify BufferCache assuming buf can not be null	2025-05-30 12:12:54 -07:00
Awni Hannun	6ef2f67e7f	5bit quants (#2226) * 5bit quants * 5bit quants	2025-05-30 12:12:10 -07:00
Cheng	f76ee1ffd2	Move some dims utils to common (#2223)	2025-05-29 06:48:30 -07:00
Cheng	79071bfba4	Fix out-of-bounds default value in logsumexp/softmax (#2213)	2025-05-21 07:25:16 -07:00
Cheng	7774b87cbd	Remove redundant simd_sum in logsumexp (#2210)	2025-05-21 07:25:03 -07:00
Awni Hannun	eebe73001a	fix large arg reduce (#2206)	2025-05-19 13:10:44 -07:00
Awni Hannun	8576e6fe36	fix conv2d bug + faster conv 1d (#2195) * fix conv2d bug + faster conv 1d * revert sort + flaky test	2025-05-18 06:05:11 -07:00
Jack Wind	7ff5c41e06	Add set_threadgroup_memory_length to CommandEncoder (#2183)	2025-05-16 00:28:03 -07:00
Awni Hannun	c1eb9d05d9	non-symmetric eig and eigh (#2188)	2025-05-15 13:01:44 -07:00
Cheng	0751263dec	Fix typo in row_reduce_small (#2179)	2025-05-13 20:19:54 -07:00
Cheng	eca2f3eb97	Add remove_index utility (#2173)	2025-05-13 17:09:56 -07:00
Awni Hannun	8f3d208dce	Close a couple edge case bugs: hadamard and addmm on empty inputs (#2177) * handle hadamard and addmm on empty inputs * fix	2025-05-12 10:48:57 -07:00
Awni Hannun	6661387066	Fix fft for integer overflow (#2161)	2025-05-09 14:25:12 -07:00
ATurker	a7fae8a176	fix: conv_general differences between gpu, cpu (#2070) * fix general_conv padding * fix bugs * add test --------- Co-authored-by: Awni Hannun <awni@apple.com>	2025-05-09 10:26:52 -07:00
Awni Hannun	5a1a5d5ed1	fix input coherent kernel launch (#2153)	2025-05-05 17:30:50 -07:00
Cheng	1683975acf	Move common gpu primitives to backend/gpu (#2145)	2025-05-05 13:45:29 -07:00
Awni Hannun	af705590ac	fix batched vector sdpa (#2152)	2025-05-05 13:13:03 -07:00
Awni Hannun	825124af8f	fix bw for elementwise ops (#2151) * fix bw for elementwise ops * add compile * fix * fix * fix * fix	2025-05-05 06:15:04 -07:00
Angelos Katharopoulos	481349495b	GPU Hadamard for large N (#1879)	2025-05-01 17:19:17 -07:00
Awni Hannun	e496c5a4b4	fix integer overflow in qmm (#2143)	2025-04-30 09:28:56 -07:00
Awni Hannun	f1606486d2	Generalize gpu backend (#2138) * generalize gpu backend * fix no_gpu build * fix no_gpu build * generalize gpu backend	2025-04-30 09:08:17 -07:00
Alex Chi Z.	b36dd472bb	return library if it is successfully loaded (#2131)	2025-04-29 07:30:36 -07:00
hdeng-apple	167b759a38	Fix typos (#2136)	2025-04-29 07:26:05 -07:00
Angelos Katharopoulos	f0e70afff0	Fix swift pm load (#2117)	2025-04-24 10:58:29 -07:00
hdeng-apple	38c1e720c2	Search mlx.metallib in macOS framework "Resources" dir (#2061) --------- Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2025-04-23 09:53:13 -07:00
Yury Popov	1d2c9d6a07	Complex scan (#2094)	2025-04-22 18:56:28 -07:00
Awni Hannun	fdadc4f22c	Add more complex unary ops (#2101)	2025-04-21 13:04:54 -07:00
Angelos Katharopoulos	3cde719eb7	Route to gather qmm only for many tokens per expert (#2082)	2025-04-17 14:53:08 -07:00
Angelos Katharopoulos	5de6d94a90	Gather qmm batched kernel and refactoring of quantized (#2078)	2025-04-17 13:53:11 -07:00
Angelos Katharopoulos	99eefd2ec0	Gather mm new kernel and small refactoring (#2040)	2025-04-14 16:37:36 -07:00
Yury Popov	e9e268336b	LogCumSumExp (#2069)	2025-04-13 01:27:29 -07:00
Angelos Katharopoulos	c4189a38e4	Add float mask to sdpa vector (#2068)	2025-04-11 17:29:40 -07:00

406 Commits