AN Long
85a8824a8c
Fix cumulative operations when axis=None (#2653)
2025-10-08 15:25:38 -07:00
Awni Hannun
f5d4397e5c
Fix fast synch when fence is waited before a command buffer is created (#2657)
2025-10-08 11:23:46 -07:00
Awni Hannun
343e33b6d5
fix all_gather vjp (#2654)
2025-10-07 06:05:23 -07:00
Angelos Katharopoulos
0073096dd1
Split name into directories for cuda jit (#2656)
2025-10-07 01:52:58 -07:00
Angelos Katharopoulos
e3d004fed9
Fix and refactor row-reduce (#2650)
2025-10-07 01:51:08 -07:00
Awni Hannun
a393435d28
Speed up compile for node with many parents (#2649)
2025-10-03 19:30:36 -07:00
Awni Hannun
a7a94b29d7
Fix compile when outputs change (#2648)
2025-10-03 08:40:57 -07:00
Daniel Yeh
22a5da76c8
Faster complex matmul (#2571)
2025-10-02 23:33:15 -07:00
Andrey Portnoy
287c63a093
Configure CMake to export compile_commands.json (#2645)
This helps enable LSP for code navigation using clangd.
2025-10-02 15:40:32 -07:00
Awni Hannun
1c9ae1eaa1
cuda fix flaky test (#2646)
2025-10-02 15:40:04 -07:00
Angelos Katharopoulos
c2c3e0b0a2
[CUDA] Add a small column specialization to reduce (#2642)
2025-10-02 14:41:05 -07:00
Awni Hannun
b0cc71ae71
Faster triu, tril, where with scalar (#2644)
2025-10-02 12:21:27 -07:00
Awni Hannun
e88f2d4a8e
fix cross entropy axis param (#2641)
* fix cross entropy axis param
* faster grad clipping
2025-10-01 16:49:55 -07:00
Angelos Katharopoulos
9cee557423
Fix status message (#2638)
2025-10-01 16:43:45 -07:00
Awni Hannun
bbf1423953
wait for tasks in cuda (#2636)
2025-09-30 16:08:46 -07:00
Angelos Katharopoulos
eb24267b56
Compile now can attach arbitrary data to an entry (#2634)
2025-09-30 13:33:27 -07:00
Awni Hannun
dc371ae7a5
fix for max block dim (#2631)
2025-09-29 08:59:25 -07:00
AN Long
e76a8dd5c5
Fix incorrect path and typos (#2630)
2025-09-28 06:03:04 -07:00
Cheng
b466dea982
[CUDA] Make CudaEvent work with multi-device (#2614)
* Set current device when creating cuda event
* Separate cuda events by device
* Avoid race condition in pool
2025-09-27 11:27:17 +09:00
Angelos Katharopoulos
7a6adda1e6
Bump the version (#2627)
v0.29.2
2025-09-26 15:15:28 -07:00
Angelos Katharopoulos
1a9f820af6
Compiled should not end in broadcast (#2622)
2025-09-26 13:36:09 -07:00
Awni Hannun
d4f4ff3c5e
Allow None input to compiled functions (#2621)
* Allow None input to compiled functions
2025-09-25 08:42:23 -07:00
Jagrit Digani
7c7e48dbd1
New tuning for small K gemv (#2620)
* New tuning for small K gemv
2025-09-23 12:28:35 -07:00
Daniel Yeh
fbbf3b9b3e
Support pickling array for bfloat16 (#2586)
* add bfloat16 pickling
* Improvements
* improve
---------
Co-authored-by: Chen-Chen Yeh <ge96noj@mytum.de>
2025-09-22 20:12:15 -07:00
Daniel Yeh
bf01ad9367
fix (#2613)
Co-authored-by: Chen-Chen Yeh <ge96noj@mytum.de>
2025-09-22 20:12:04 -07:00
Cheng
ae438d05fa
[CUDA] Recycle CUDA events (#2604)
* Make CudaEvent a CudaHandle
* Add caching for CudaEvent
* Make sure cuda events are destroyed at last
* Fix headers
* SharedEvent => AtomicEvent
* RawCudaEvent => CudaEventHandle, CudaEventWrapper => CopyableCudaEvent
* Remove unneeded asserts
2025-09-23 10:42:03 +09:00
Awni Hannun
711a645807
avoid producing NaN in attention (#2608)
2025-09-22 13:10:43 -07:00
Josh Bleecher Snyder
aa9d44b3d4
implement Convolution::output_shape (#2601)
- pull conv_out_shape out for re-use
- add Conv::output_shape
- add e2e python tests confirming shapeless=True support and correctness
Updates #2599
2025-09-22 10:09:45 -07:00
Awni Hannun
ec2ab42888
Lower sorted QMM gather threshold (#2609)
2025-09-19 18:22:55 -07:00
Cheng
787c0d90cd
Detect cache thrashing in LRUCache (#2600)
* Detect cache thrashing in LRUCache
* Do not check cache thrashing in tests
2025-09-19 09:12:14 +09:00
Oleksandr Bilous
e8b604a6a3
fix: library loading for swift dynamic frameworks (#2568)
2025-09-18 13:54:59 -07:00
Awni Hannun
50cc09887f
expose depends (#2606)
2025-09-18 10:06:15 -07:00
Umberto Mignozzetti
3f730e77aa
Update export function example for array input (#2598)
After changing the shape to conform (same shapes for all objects), the example works.
2025-09-16 14:38:05 -07:00
Awni Hannun
caecbe876a
no copy batch rope (#2595)
2025-09-15 14:23:48 -07:00
Umberto Mignozzetti
8afb6d62f2
Fix typo in average_gradients function call (#2594)
2025-09-15 11:29:21 -07:00
Awni Hannun
6ccfa603cd
fix metal scan (#2591)
2025-09-15 11:01:57 -07:00
Umberto Mignozzetti
36cad99a11
Refactor code examples to use 'gelu' (#2592)
Updated code examples to use 'gelu' directly instead of 'nn.gelu'.
2025-09-15 09:47:02 -07:00
Awni Hannun
ee18e1cbf0
patch bump (#2588)
v0.29.1
2025-09-11 17:10:09 -07:00
Awni Hannun
af120c2bc0
set nccl ABI version (#2587)
2025-09-11 16:55:53 -07:00
Cheng
6a3acf2301
[CUDA] Set bias as input when using bias epilogue (#2584)
2025-09-11 15:31:09 +09:00
Awni Hannun
d6977f2a57
Add sdpa with sinks (#2558)
* add sdpa with sinks
* fix 2 pass
* fix matrix sdpa
* fix perf regression
* add to cuda (#2580)
2025-09-10 14:53:00 -07:00
Gökdeniz Gülmez
db5443e831
Adding Relu2 (#2582)
* in. com.
* upd. ackn.
* update __init__
* nits
* nits + format
* used mx.maximum(x, 0) instead of calling the function and moves relu6 under relu2 to make it nicer
* same with _make_activation_module
* Update python/mlx/nn/layers/activations.py
upd
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* update funct.rst
* upd. layers.rst
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2025-09-10 07:24:30 -07:00
Cheng
52b8384d10
Fix flaky addmm tests (#2581)
2025-09-10 14:22:22 +09:00
Cheng
44cc5da4bc
[CUDA] Fix alpha not respected when using bias epilogue (#2578)
2025-09-10 09:08:01 +09:00
Cheng
dde3682b69
[CUDA] Use GEMM with epilogue instead of AddMM (#2569)
2025-09-09 13:18:49 +09:00
Awni Hannun
17310d91a6
Add batch offsets for mx.fast.rope (#2564)
* implement batch rope for Metal
* cuda rope (#2576)
2025-09-08 17:35:07 -07:00
Cheng
b194d65a6a
Some tweaks in cmake files (#2574)
* Do proper check of Metal lib
* Update doctest to get rid of cmake version hack
2025-09-09 08:27:18 +09:00
Cheng
a44b27f5f8
Fix a few ccache cache miss (#2573)
* Fix ccache cache miss
* Do not define _VERSION_ in python bindings
2025-09-09 07:41:05 +09:00
Awni Hannun
e5a33f2223
faster depthwise 1D conv (#2567)
2025-09-08 11:37:23 -07:00
Cheng
c1e3340b23
Set ccache size before building (#2570)
2025-09-07 09:00:31 +09:00