zhangyiss/mlx - mlx - Gitea for Geophysics

mirror of https://github.com/ml-explore/mlx.git synced 2025-07-24 10:51:21 +08:00

Author	SHA1	Message	Date
Arkar Min Aung	cb4dc59a9e	feat(benchmarks): add comprehensive SVD performance benchmarks Add benchmarks for Metal SVD implementation as required by CONTRIBUTING.md: - Square matrix benchmarks (64x64 to 512x512) - Rectangular matrix benchmarks - Batched matrix benchmarks - CPU vs GPU performance comparison - Special matrices (identity, diagonal, zero) Benchmarks validate performance improvements from GPU acceleration and help identify performance regressions in future changes. Usage: `python benchmarks/python/svd_bench.py --gpu`; `python benchmarks/python/svd_bench.py --compare`; `python benchmarks/python/svd_bench.py --all`	2025-06-15 18:09:11 +10:00
Arkar Min Aung	e5c8773371	feat(metal): implement complete Metal SVD with Jacobi algorithm Add GPU-accelerated SVD implementation for Apple Silicon using Metal compute kernels. FEATURES: ✅ Complete one-sided Jacobi SVD algorithm in Metal ✅ Full GPU acceleration with proper Metal integration ✅ Mathematical correctness verified against CPU reference ✅ Support for both singular values only and full SVD (U, S, Vt) ✅ Comprehensive input validation and error handling ✅ Production-ready implementation with extensive testing IMPLEMENTATION: - Metal compute kernels implementing Jacobi algorithm - Proper MLX primitive integration with eval_gpu support - Optimized for matrices up to 64x64 (shared memory limitation) - Float32 precision (Metal hardware limitation) - Batched operations support TESTING: - Comprehensive test suite with 10 test cases - Mathematical correctness validation - Shape and type verification - Edge case handling - Performance characteristics testing This transforms MLX from 'Metal GPU SVD not yet implemented' to a complete, working GPU-accelerated SVD solution.	2025-06-15 17:44:38 +10:00
Cheng	79071bfba4	Fix out-of-bounds default value in logsumexp/softmax (#2213 )	2025-05-21 07:25:16 -07:00
Angelos Katharopoulos	cf6c939e86	Fix some complex vjps (#2178 )	2025-05-14 23:37:12 -07:00
Cheng	0cae0bdac8	CUDA backend: backbone (#2075 )	2025-05-06 21:26:46 -07:00
Awni Hannun	9c5e7da507	fix compile merging (#2150 )	2025-05-02 15:08:50 -07:00
Cheng	ea890d8710	Remove metal-only tests (#2139 )	2025-04-30 09:08:39 -07:00
Aashiq Dheeraj	bb6565ef14	add fftshift and ifftshift fft helpers (#2135 ) * add fftshift and ifftshift fft helpers * address comments * axes have to be iterable * fix fp error in roll + add test --------- Co-authored-by: Aashiq Dheeraj <aashiq@aashiq-mbp-m4.local>	2025-04-29 22:13:45 -07:00
Param Thakkar	600e87e03c	Added output_padding parameters in conv_transpose (#2092 )	2025-04-23 09:26:33 -07:00
Awni Hannun	dc4eada7f0	Use unordered map for kwargs in export/import (#2087 ) * use unordered map for kwargs in export/import * comment	2025-04-21 07:17:22 -07:00
Param Thakkar	5f04c0f818	Fixed shift operations issue (#2080 ) * Fixed shift operations issue * Added tests and fixes * Fixed loop syntax error * Added tests for bool * Fixed typo	2025-04-18 14:28:33 -07:00
Cheng	ba09f01ce8	Remove test of converting negative float to uint (#2048 )	2025-04-06 06:21:46 -07:00
Jesper Stemann Andersen	5f5770e3a2	Fix CPU sign for unsigned ints (#2024 ) Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2025-03-30 17:56:59 -07:00
Awni Hannun	5580b47291	iinfo and scalar overflow detection (#2009 )	2025-03-27 19:54:56 -07:00
Awni Hannun	a6b5d6e759	revise cmake minimum for doctest (#2014 )	2025-03-27 19:30:58 -07:00
Awni Hannun	4e1994e9d7	move memory APIs into top level mlx.core (#1982 )	2025-03-21 07:25:12 -07:00
Awni Hannun	c4230747a1	redesign for faster cpu/gpu synch (#1869 ) * redesign for faster cpu/gpu synch * load + more async CPU * use command encoder API and move more ops to use it * make fence back-end generic + CPU only fence * faster build * fix async eval * fixes + handle temporaries * fix / improve cpu conv * remove unused status, fix siblings * fix extensions * fix * fix no cpu build * format * comments * fix perf regression, remove unecessary abort * fix events, task limit cpu * fix waiting * fix donation / temporaries in normalization	2025-03-06 19:23:38 -08:00
Abe Leininger	3835a428c5	Adds nuclear norm support (#1894 ) * adjust norm unit test tolerance	2025-03-04 13:26:02 -08:00
Abe Leininger	a5ededf1c3	CPU LU factorization and linear solvers (#1451 ) * linalg solve backend * nits * more nits + fix * luf primitive and lu, solve, and solve_triangular backends * changes / nits --------- Co-authored-by: Awni Hannun <awni@apple.com>	2025-02-10 12:32:24 -08:00
Jesper Stemann Andersen	f6c0499b8d	Resolved ambiguity in mlx::core::take_along_axis (#1822 ) * Resolved ambiguity in mlx::core::take_along_axis Detected by GCC 10 on riscv64-linux-gnu. * Formatted * Removed superfluous parentheses in random_tests.cpp	2025-02-04 06:06:17 -08:00
Jesper Stemann Andersen	2d8e667400	MinGW support (#1806 ) * Changed /bin/bash to bash for generating compiling preamble * Fix wrt jit_compiler mingw like msvc wrt. WEXITSTATUS * Solved ambiguity wrt. bernoulli test shape * Disabled distributed/ring on Windows * Fixed jit_compiler command wrt. MinGW * Extended jit_compiler patch wrt. WEXITSTATUS to FreeBSD	2025-02-01 12:40:06 -08:00
Awni Hannun	2235dee906	catch stream errors earlier to avoid aborts (#1801 )	2025-01-27 14:05:43 -08:00
Awni Hannun	da8c885784	Simplify removes no-ops from the tape (#1759 ) * simplify removes no-ops from the tape * comment	2025-01-09 11:23:19 -08:00
Awni Hannun	516ded618b	Dynamic slicing (#1741 ) * dynamic slice and slice update * python bindings + tests + fix set item * fix compile issue * comment * fix jit	2025-01-07 14:02:16 -08:00
Awni Hannun	ae69cb15e9	shapeless compile in docs and partially shapeless reshape (#1742 )	2025-01-02 16:24:42 -08:00
Cheng	8ecdfb718b	Fix export.cpp compilation with MSVC (#1737 )	2024-12-29 06:56:30 -08:00
Awni Hannun	4ba0c24a8f	Export / import functions to / from a file (#1642 ) * export and import functions * refactor + works for few primitives * nit * allow primitives with state * nit * nit * simplify serialize / deserialize * fix for constants * python bindings * maybe fix serialize failure case * add example * more primitives, training kind of works * same result for python and c++ * some fixes * fix export * template it up * some simplificatoin * rebase * allow kwargs and multiple functions * exporter * more primitives for exporting * deal with endianness * handle invalid stream * add docstring	2024-12-24 11:19:13 -08:00
Awni Hannun	c3628eea49	Add `mx.finfo` and use it when making causal mask (#1726 ) * finfo * fixes * docs	2024-12-19 14:52:41 -08:00
Awni Hannun	e03f0372b1	More shape type (#1705 ) * more shape type * fix	2024-12-19 08:08:20 -08:00
Awni Hannun	4e1e9520e1	Flatten and unflatten (#1692 ) * flatten and unflatten * fix grad * fix shape infer * use squeeze + unsqueeze in get_item	2024-12-11 21:51:37 -08:00
Awni Hannun	f3dfa36a3a	Fix x86 tests (#1691 ) * fix x86 tests * comment	2024-12-11 07:47:18 -08:00
Awni Hannun	f76a49e555	`ExpandDims` primitive (#1687 ) * add squeeze primitive * simplify squeeze, use in gather * fix * fix * fix * fix * fix no cpu * use squeeze in matmul and friends * expand dims primitive * comment	2024-12-10 16:39:07 -08:00
Awni Hannun	40c62c1321	Use int64 stride everywhere (#1671 ) * use int64 stride everywhere * fix ext * fix ext * more shape + cleanup * one more * few more	2024-12-09 11:09:02 -08:00
Cheng	d0f471cff7	Using math defines requires switch in MSVC (#1665 ) * Using math defines requires switch in MSVC * Fix more math macros * Fix type * Remove _MSC_VER guard for math defines	2024-12-08 08:16:28 -08:00
Cheng	6f316b8bf5	Use int64_t instead of ssize_t (#1673 )	2024-12-07 20:10:44 -08:00
Cheng	7c10c93a1f	Convert filesystem path to std::string explicitly (#1672 )	2024-12-07 20:10:06 -08:00
Awni Hannun	69a2991614	allow compiling lambdas in C++ (#1650 ) * allow compiling lambdas in C++ * fix test * more tests * auto detect capture-less lambda	2024-12-06 13:13:21 -08:00
Nripesh Niketan	3bb5b4a302	Chore: Add default language in pre-commit and bump hooks (#1652 )	2024-12-06 07:54:29 -08:00
Awni Hannun	e047fd977d	compile changes if stream changes (#1644 )	2024-12-03 14:37:44 -08:00
Awni Hannun	dcca0d7477	contiguous op / prim (#1612 )	2024-11-21 19:51:49 -08:00
Cocoa	0d5e7716ad	fix typo: accross -> across (#1609 ) Signed-off-by: Cocoa <i@uwucocoa.moe>	2024-11-20 15:30:51 -08:00
Alex Barron	048fabdabd	Fix vmap constant output size (#1524 ) * use inputs to determine output size * remove noop vmap tests	2024-10-30 16:16:53 -07:00
Kashif Rasul	3ddc07e936	Eigenvalues and eigenvectors (#1334 ) * initial eigvalsh * add compute_vectors * add compute_vectors_ * return a pair * add eigh to return only eigenvectors * fixed typo * merge merge Eighvalsh and Eigh into a single primitive * use the same primate with the flag * fix primatives * use MULTI * fix eval_gpu * fix decleration * rename EighPrimitive to Eigh * tests * tests * fix rebase and format * cleanup lapack * format * add cblas.h --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-10-22 12:18:48 -07:00
Angelos Katharopoulos	9b12093739	Add the roll op (#1455 )	2024-10-07 17:21:42 -07:00
Awni Hannun	95d04805b3	Fix complex power on Metal (#1460 )	2024-10-06 19:58:30 -07:00
Awni Hannun	195b429d99	Put along axis + fixe for partition grad (#1430 ) * put along axis, fixes for partition grad * zeros for arg reduce	2024-09-23 10:03:38 -07:00
Nripesh Niketan	6af5ca35b2	feat: add cross_product (#1252 ) * feat: add cross_product * lint * python binding * refactor: Improve error message for cross_product function * refactor: more close to numpy cross product * refactor: improve error message for cross_product function * finish * fix acks * allow old numpy * doc --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-09-17 13:12:43 -07:00
Nripesh Niketan	669c27140d	Chore: add pre-commit hook for cmake (#1362 ) * reset and lint * format --------- Co-authored-by: Awni Hannun <awni@apple.com>	2024-09-16 12:53:01 -07:00
Awni Hannun	e7e59c6f05	Fix copying scalars by adding fill_gpu (#1402 ) * fix copying scalars by adding fill_gpu * Another copy scalar changed to fill --------- Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>	2024-09-09 15:54:08 -07:00
Awni Hannun	7cca1727af	Fix slice data size (#1394 ) * fix slice data size and add tests * fix contiguous flag * simplify stride and perform copy for non-contiguous arrays * fix cpu * comment	2024-09-04 19:10:43 -07:00

155 Commits