zhangyiss/mlx - mlx - Gitea for Geophysics

Mirror of https://github.com/ml-explore/mlx.git, synced 2025-09-19 10:48:09 +08:00

Author	SHA1	Message	Date
Cheng	6a3acf2301	[CUDA] Set bias as input when using bias epilogue (#2584 )	2025-09-11 15:31:09 +09:00
Awni Hannun	d6977f2a57	Add sdpa with sinks (#2558 ) * add sdpa with sinks * fix 2 pass * fix matrix sdpa * fix perf regression * add to cuda (#2580)	2025-09-10 14:53:00 -07:00
Gökdeniz Gülmez	db5443e831	Adding Relu2 (#2582 ) * in. com. * upd. ackn. * update __init__ * nits * nits + format * used mx.maximum(x, 0) instead of calling the function and moves relu6 under relu2 to make it nicer * same with _make_activation_module * Update python/mlx/nn/layers/activations.py upd Co-authored-by: Awni Hannun <awni.hannun@gmail.com> * update funct.rst * upd. layers.rst --------- Co-authored-by: Awni Hannun <awni.hannun@gmail.com>	2025-09-10 07:24:30 -07:00
Cheng	52b8384d10	Fix flaky addmm tests (#2581 )	2025-09-10 14:22:22 +09:00
Cheng	44cc5da4bc	[CUDA] Fix alpha not respected when using bias epilogue (#2578 )	2025-09-10 09:08:01 +09:00
Awni Hannun	17310d91a6	Add batch offsets for mx.fast.rope (#2564 ) * implement batch rope for Metal * cuda rope (#2576)	2025-09-08 17:35:07 -07:00
Cheng	a44b27f5f8	Fix a few ccache cache miss (#2573 ) * Fix ccache cache miss * Do not define _VERSION_ in python bindings	2025-09-09 07:41:05 +09:00
XXXXRT666	8f163a367d	typing: add type hints to mlx.core.array, linalg, distributed, and random (#2565 ) * Add type annotations to mlx methods * Missing list_or_scalar	2025-09-04 09:08:11 -07:00
Manuel Villanueva	89a3df9014	Fixed several type annotations in the MLX stubs which degraded to Unknown/Any (#2560 ) * Added scalar to stubs to fix Unkown Type Hint ### Proposed changes Issue #2478 reports that several type annotations in the MLX stubs degrade to Unknown/Any in editors like VS Code with Pylance, due to missing imports (Union, Optional, Tuple) and an undefined scalar type alias. This PR updates the stub generation patterns to: • Add missing typing imports in mlx.core.__prefix__ so that Union, Optional, Tuple, etc. are always available. • Define and export scalar: TypeAlias = Union[int, float, bool] in mlx.core.__suffix__ so that functions typed with Union[scalar, array] resolve correctly instead of falling back to Any. • Update submodule stub prefixes (distributed, fast, linalg, metal, random) to import scalar alongside array, Device, and Stream, ensuring type checkers resolve the union consistently across modules. With these changes, functions like mlx.add now display rich type signatures such as: ``` def add( a: scalar | array, b: scalar | array, stream: Stream | Device | None = None ) -> array ``` instead of degrading to Any. ### Checklist • I have read the CONTRIBUTING document • I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes • I have added tests that prove my fix is effective or that my feature works (n/a — stub generation only) • I have updated the necessary documentation (if needed) * add bool to patterns --------- Co-authored-by: Awni Hannun <awni@apple.com>	2025-09-03 12:52:08 -07:00
Awni Hannun	b61a65e313	fix copies in sdpa (#2563 )	2025-09-02 11:00:36 -07:00
wrmsr	04cbb4191c	Fix dequantize python sig (#2562 )	2025-09-01 11:50:20 -07:00
Artur Antonov	c5460762e7	Fix AdamW weight_decay default value in docstring (#2557 )	2025-08-31 21:29:30 -07:00
Awni Hannun	8ce49cd39e	fix quantized vjp for mxfp4 (#2555 )	2025-08-29 10:06:15 -07:00
Awni Hannun	111f1e71af	Faster contiguous gather for indices in the first axis (#2552 ) * faster contiguous gather for indices in the first axis * work per thread > 1 * angelos suggestion for scales / biases	2025-08-28 21:26:30 -07:00
Awni Hannun	70560b6bd5	Add mode parameter for quantization (#2499 ) * add mode parameter for quantization * mxfp4 quantize/dequantize + start of optional biases * mxfp4 works * speedup * cpu mxfp4 * fix * fix test tol * fix * refactor * add quant mode enum	2025-08-28 06:45:26 -07:00
Awni Hannun	7ef8a6f2d5	[CUDA] fix sort (#2550 ) * [CUDA] fix sort * fix test	2025-08-27 19:48:43 -07:00
Awni Hannun	5458d43247	add load with path tests (#2543 )	2025-08-26 14:24:47 -07:00
Awni Hannun	3dcb286baf	Remove stream from average grads so it uses default (#2532 ) * Remove stream from average grads so it uses default * comment	2025-08-25 15:56:29 -07:00
Cheng	4822c3dbe9	[CUDA] Implement DynamicSlice/DynamicSliceUpdate (#2533 ) * Move DynamicSlice to gpu/primitives * Implement compute_dynamic_offset in CUDA	2025-08-26 07:31:39 +09:00
Awni Hannun	db14e29a0b	allow pathlib.Path to save/load functions (#2541 )	2025-08-25 14:58:49 -07:00
Awni Hannun	068a4612e9	nccl default for backend=any (#2528 ) * nccl default for backend=any * check num gpus + ensure row contiguous for all reduce * comment	2025-08-22 12:24:27 -07:00
Awni Hannun	f93f87c802	nccl dep + default for cuda (#2526 )	2025-08-21 17:57:49 -07:00
Anastasiia Filippova	9392fc3f88	NCCL backend (#2476 )	2025-08-21 11:56:15 -07:00
Awni Hannun	e843c4d8d5	fix power (#2523 )	2025-08-21 06:46:01 -07:00
Angelos Katharopoulos	e397177f6e	Custom cuda kernel (#2517 )	2025-08-20 17:20:22 -07:00
Cheng	f4c8888cbe	[CUDA] Fix stride of singleton dims before passing to cuDNN (#2521 )	2025-08-21 08:55:26 +09:00
Angelos Katharopoulos	25c1e03205	Fix overflow in large filter small channels (#2520 )	2025-08-20 08:03:29 -07:00
Cheng	ac85ddfdb7	[CUDA] Add GEMM-based fallback convolution kernels (#2511 ) * Add gemm_conv * Add gemm_grouped_conv	2025-08-20 10:06:22 +09:00
Awni Hannun	e7c6e1db82	no segfault with uninitialized array.at (#2514 )	2025-08-18 08:33:38 -07:00
Awni Hannun	c5fcd5b61b	fix custom kernel test (#2510 )	2025-08-18 06:45:59 -07:00
Cheng	1ba18ff7d9	[CUDA] Fix conv grads with groups (#2495 ) * Put reshape utils in one file * [CUDA] Fix conv grads with groups * Put the reshape utils in gpu/copy.h	2025-08-16 10:09:18 +09:00
Luca Vivona	728d4db582	Support destination arg in tree flatten/unflatten (#2450 )	2025-08-06 15:34:59 -07:00
Awni Hannun	fa89f0b150	faster gather qmm sorted test (#2463 )	2025-08-05 06:27:40 -07:00
Cheng	828c5f1137	Use SmallVector for shapes and strides (#2454 ) * Use SmallVector for shapes and strides * Convert SmallVector to tuple	2025-08-05 09:41:03 +09:00
Awni Hannun	0b807893a7	fix wraps compile (#2461 )	2025-08-04 16:14:18 -07:00
Cheng	86c6a15571	[CUDA] Backward convolution (#2431 )	2025-08-01 09:54:05 +09:00
junpeiz	8b25ce62d5	Add tests for export including control flow models and quantized models (#2430 ) * Add tests for export, including control flow export and quantized model export. * Skip quantization related test for CUDA backend.	2025-07-31 11:06:26 -07:00
Awni Hannun	d32519c8ee	fix gemv regression (#2445 )	2025-07-30 14:23:01 -07:00
Awni Hannun	b405591249	fix circular reference (#2443 )	2025-07-30 09:37:44 -07:00
Awni Hannun	ef631d63af	faster rms norm (#2433 )	2025-07-29 13:12:00 -07:00
Awni Hannun	4ad53414dd	fix cuda pypi package (#2423 ) * fix cuda pypi package * patch bump	2025-07-25 15:20:29 -07:00
Awni Hannun	dcb8319f3d	update install docs and requirements (#2419 )	2025-07-25 12:13:19 -07:00
Awni Hannun	5597fa089c	Fix qvm splitk (#2415 )	2025-07-25 11:50:24 -07:00
Skonor	7d9d6ef456	docs: fix adam and adamw eps placement (#2416 ) Co-authored-by: Mikhail Gorbunov <m_gorbunov@apple.com>	2025-07-24 16:40:45 -07:00
Cheng	6f5874a2f2	[CUDA] Initial implementation of Convolution with cuDNN (#2385 ) * Link with cuDNN * Initial implementation * Remove backend apis * Fix recording cudnn conv * More unused backend apis * Fix C++ conv tests * include cudnn as python dep * Install libcudnn9-dev-cuda-12 in CI * cudnn only accepts contiguous inputs * Switch to backend apis * Plan needs to be kept alive * Turn off tf32 * Add cache * Test the native cuda graph api * Set cudnn stream before execution * Make LRUCache more like a normal container * Do error check for cublas handle * Zero-initilizing array * Use tf32 for conv * Skip TestConv.test_torch_conv_2D test --------- Co-authored-by: Awni Hannun <awni@apple.com>	2025-07-25 08:12:10 +09:00
Awni Hannun	d1f4d291e8	Fix uv install and add dev release (#2411 ) * fix uv install and add dev release * fix docstring * pin cuda deps * cuda release on cpu-only machine	2025-07-23 16:54:19 -07:00
Awni Hannun	e1840853ce	full row mask in sdpa consistently gives nan (#2406 )	2025-07-23 16:37:03 -07:00
Fangjun Kuang	28d068bce6	Fix an error in the comment for mx.dequantize (#2409 )	2025-07-23 06:10:50 -07:00
Awni Hannun	63f663d9c6	fix cuda manylinux version to match others (#2388 )	2025-07-18 21:02:16 -07:00
Gökdeniz Gülmez	deee214a95	Adding support for the Muon Optimizer (#1914 ) * initial commit with workong optmimizer * update ACKNOWLEDGMENTS.md * nits and adding it to test * nits * G.astype(mx.bfloat16) to G.astype(G.dtype) * G.ndim >= 2 to assert G.ndim == 2 * remove coments * replace with mx.addmm * remove comments * format * nits * match muon * fix addmm --------- Co-authored-by: Awni Hannun <awni@apple.com>	2025-07-18 12:25:28 -07:00

756 Commits