张壹 zhangyiss
  • Joined on 2024-09-10
zhangyiss synced and deleted reference refs/tags/distributed_vjps at zhangyiss/mlx from mirror 2025-10-08 00:28:10 +08:00
zhangyiss synced commits to fix-cuda-jit-large-name at zhangyiss/mlx from mirror 2025-10-07 16:18:09 +08:00
zhangyiss synced new reference fix-cuda-jit-large-name to zhangyiss/mlx from mirror 2025-10-07 16:18:09 +08:00
zhangyiss synced commits to fix-row-reduce at zhangyiss/mlx from mirror 2025-10-07 16:18:09 +08:00
3f86389409 Fix warp underflow
cad47a32e2 Re-tune
ed61cb2802 Use the same tuning for looped
08d528d705 Remove unused includes
d52aa2464e Fix and refactor row-reduce
Compare 6 commits »
zhangyiss synced commits to distributed_vjps at zhangyiss/mlx from mirror 2025-10-06 23:48:08 +08:00
zhangyiss synced new reference distributed_vjps to zhangyiss/mlx from mirror 2025-10-06 23:48:08 +08:00
zhangyiss synced commits to fix-row-reduce at zhangyiss/mlx from mirror 2025-10-04 13:28:08 +08:00
zhangyiss synced new reference fix-row-reduce to zhangyiss/mlx from mirror 2025-10-04 13:28:08 +08:00
zhangyiss synced commits to main at zhangyiss/mlx from mirror 2025-10-04 13:28:08 +08:00
a393435d28 Speed up compile for node with many parents (#2649)
zhangyiss synced commits to main at zhangyiss/mlx from mirror 2025-10-04 05:18:10 +08:00
a7a94b29d7 Fix compile when outputs change (#2648)
zhangyiss synced commits to main at zhangyiss/mlx from mirror 2025-10-03 21:15:16 +08:00
22a5da76c8 Faster complex matmul (#2571)
zhangyiss synced and deleted reference refs/tags/col-reduce-small at zhangyiss/mlx from mirror 2025-10-03 12:48:09 +08:00
zhangyiss synced commits to main at zhangyiss/mlx from mirror 2025-10-03 12:48:09 +08:00
287c63a093 Configure CMake to export compile_commands.json (#2645)
1c9ae1eaa1 cuda fix flaky test (#2646)
c2c3e0b0a2 [CUDA] Add a small column specialization to reduce (#2642)
zhangyiss synced commits to col-reduce-small at zhangyiss/mlx from mirror 2025-10-03 04:38:09 +08:00
ca7970a4f1 Make args references but ensure copy to kernel
214b1c1a06 Remove moves
zhangyiss synced commits to main at zhangyiss/mlx from mirror 2025-10-03 04:38:09 +08:00
b0cc71ae71 Faster triu, tril, where with scalar (#2644)
zhangyiss synced new reference col-reduce-small to zhangyiss/mlx from mirror 2025-10-02 20:30:50 +08:00
zhangyiss synced commits to col-reduce-small at zhangyiss/mlx from mirror 2025-10-02 20:30:49 +08:00
zhangyiss synced and deleted reference refs/tags/accelerate-vs-neon at zhangyiss/mlx from mirror 2025-10-02 11:58:10 +08:00
zhangyiss synced commits to main at zhangyiss/mlx from mirror 2025-10-02 11:58:10 +08:00
e88f2d4a8e fix cross entropy axis param (#2641)
9cee557423 Fix status message (#2638)
zhangyiss synced commits to main at zhangyiss/mlx from mirror 2025-10-01 11:18:12 +08:00
bbf1423953 wait for tasks in cuda (#2636)
eb24267b56 Compile now can attach arbitrary data to an entry (#2634)