Commit Graph - mlx - Gitea for Geophysics

zhangyiss/mlx

mirror of https://github.com/ml-explore/mlx.git synced 2025-12-16 01:49:05 +08:00

Commit Graph

1a9f820af6 Compiled should not end in broadcast (#2622) Angelos Katharopoulos 2025-09-26 13:36:09 -07:00
d4f4ff3c5e Allow None input to compiled functions (#2621) Awni Hannun 2025-09-25 08:42:23 -07:00
7c7e48dbd1 New tuning for small K gemv (#2620) Jagrit Digani 2025-09-23 12:28:35 -07:00
fbbf3b9b3e Support pickling array for bfloat16 (#2586) Daniel Yeh 2025-09-23 05:12:15 +02:00
bf01ad9367 fix (#2613) Daniel Yeh 2025-09-23 05:12:04 +02:00
ae438d05fa [CUDA] Recycle CUDA events (#2604) Cheng 2025-09-23 10:42:03 +09:00
711a645807 avoid producing NaN in attention (#2608) Awni Hannun 2025-09-22 13:10:43 -07:00
aa9d44b3d4 implement Convolution::output_shape (#2601) Josh Bleecher Snyder 2025-09-22 10:09:45 -07:00
ec2ab42888 Lower sorted QMM gather threshold (#2609) Awni Hannun 2025-09-19 18:22:55 -07:00
787c0d90cd Detect cache thrashing in LRUCache (#2600) Cheng 2025-09-19 09:12:14 +09:00
e8b604a6a3 fix: library loading for swift dynamic frameworks (#2568) Oleksandr Bilous 2025-09-18 23:54:59 +03:00
50cc09887f expose depends (#2606) Awni Hannun 2025-09-18 10:06:15 -07:00
3f730e77aa Update export function example for array input (#2598) Umberto Mignozzetti 2025-09-16 14:38:05 -07:00
caecbe876a no copy batch rope (#2595) Awni Hannun 2025-09-15 14:23:48 -07:00
8afb6d62f2 Fix typo in average_gradients function call (#2594) Umberto Mignozzetti 2025-09-15 11:29:21 -07:00
6ccfa603cd fix metal scan (#2591) Awni Hannun 2025-09-15 11:01:57 -07:00
36cad99a11 Refactor code examples to use 'gelu' (#2592) Umberto Mignozzetti 2025-09-15 09:47:02 -07:00
ee18e1cbf0 patch bump (#2588) (tag: v0.29.1) Awni Hannun 2025-09-11 17:10:09 -07:00
af120c2bc0 set nccl ABI version (#2587) Awni Hannun 2025-09-11 16:55:53 -07:00
6a3acf2301 [CUDA] Set bias as input when using bias epilogue (#2584) Cheng 2025-09-11 15:31:09 +09:00
d6977f2a57 Add sdpa with sinks (#2558) Awni Hannun 2025-09-10 14:53:00 -07:00
db5443e831 Adding Relu2 (#2582) Gökdeniz Gülmez 2025-09-10 16:24:30 +02:00
52b8384d10 Fix flaky addmm tests (#2581) Cheng 2025-09-10 14:22:22 +09:00
44cc5da4bc [CUDA] Fix alpha not respected when using bias epilogue (#2578) Cheng 2025-09-10 09:08:01 +09:00
dde3682b69 [CUDA] Use GEMM with epilogue instead of AddMM (#2569) Cheng 2025-09-09 13:18:49 +09:00
17310d91a6 Add batch offsets for mx.fast.rope (#2564) Awni Hannun 2025-09-08 17:35:07 -07:00
b194d65a6a Some tweaks in cmake files (#2574) Cheng 2025-09-09 08:27:18 +09:00
a44b27f5f8 Fix a few ccache cache miss (#2573) Cheng 2025-09-09 07:41:05 +09:00
e5a33f2223 faster depthwise 1D conv (#2567) Awni Hannun 2025-09-08 11:37:23 -07:00
c1e3340b23 Set ccache size before building (#2570) Cheng 2025-09-07 09:00:31 +09:00
8f163a367d typing: add type hints to mlx.core.array, linalg, distributed, and random (#2565) XXXXRT666 2025-09-05 00:08:11 +08:00
89a3df9014 Fixed several type annotations in the MLX stubs which degraded to Unknown/Any (#2560) Manuel Villanueva 2025-09-03 14:52:08 -05:00
c5d2937aa5 chore: Update Docs With Slice Copy Example (#2559) Krishi Saripalli 2025-09-02 22:07:02 -07:00
b61a65e313 fix copies in sdpa (#2563) Awni Hannun 2025-09-02 11:00:36 -07:00
04cbb4191c Fix dequantize python sig (#2562) wrmsr 2025-09-01 11:50:20 -07:00
c5460762e7 Fix AdamW weight_decay default value in docstring (#2557) Artur Antonov 2025-09-01 07:29:30 +03:00
8ce49cd39e fix quantized vjp for mxfp4 (#2555) (tag: v0.29.0) Awni Hannun 2025-08-29 10:06:15 -07:00
9c68b50853 version bump (#2554) Awni Hannun 2025-08-29 06:54:17 -07:00
111f1e71af Faster contiguous gather for indices in the first axis (#2552) Awni Hannun 2025-08-28 21:26:30 -07:00
827003d568 fix METAL quantization in JIT (#2553) Awni Hannun 2025-08-28 18:26:25 -07:00
d363a76aa4 Bump xcode in circle (#2551) Awni Hannun 2025-08-28 13:13:34 -07:00
70560b6bd5 Add mode parameter for quantization (#2499) Awni Hannun 2025-08-28 06:45:26 -07:00
7ef8a6f2d5 [CUDA] fix sort (#2550) Awni Hannun 2025-08-27 19:48:43 -07:00
31c6f6e33f [CUDA] Use ConcurrentContext in concatenate_gpu (#2549) Cheng 2025-08-28 09:30:08 +09:00
584d48458e link with nccl (#2546) Awni Hannun 2025-08-27 10:01:07 -07:00
5cf984ca87 Separate cpu compilation cache by versions (#2548) Cheng 2025-08-27 11:25:15 +09:00
a9bac3d9e5 Run CPP tests for CUDA build in CI (#2544) Cheng 2025-08-27 08:06:46 +09:00
5458d43247 add load with path tests (#2543) Awni Hannun 2025-08-26 14:24:47 -07:00
a4dba65220 Enable cuda graph toggle (#2545) Awni Hannun 2025-08-26 12:50:38 -07:00
4987e7615a Improve the cutlass gemm (branch: simple-gemm) Angelos Katharopoulos 2025-08-25 18:18:19 -07:00
3dcb286baf Remove stream from average grads so it uses default (#2532) Awni Hannun 2025-08-25 15:56:29 -07:00
4822c3dbe9 [CUDA] Implement DynamicSlice/DynamicSliceUpdate (#2533) Cheng 2025-08-26 07:31:39 +09:00
2ca75bb529 Remove nccl install in release (#2542) Awni Hannun 2025-08-25 15:20:18 -07:00
db14e29a0b allow pathlib.Path to save/load functions (#2541) Awni Hannun 2025-08-25 14:58:49 -07:00
d2f540f4e0 Use nccl header only when nccl is not present (#2539) Awni Hannun 2025-08-25 14:17:25 -07:00
333ffea273 [CUDA] Remove thrust in arange (#2535) Cheng 2025-08-24 16:22:36 +09:00
f55b6f1f2f Enable COMPILE_WARNING_AS_ERROR for linux builds in CI (#2534) Cheng 2025-08-24 15:33:08 +09:00
30561229c7 Fix allocation bug in NCCL (#2530) Awni Hannun 2025-08-22 14:39:43 -07:00
068a4612e9 nccl default for backend=any (#2528) Awni Hannun 2025-08-22 12:24:27 -07:00
5722c147de [CUDA] Update calls to cudaMemAdvise and cudaGraphAddDependencies for CUDA 13 (#2525) Andrey Portnoy 2025-08-21 22:57:20 -04:00
f6819a1f26 Fix warning 186-D from nvcc (#2527) Cheng 2025-08-22 10:29:55 +09:00
f93f87c802 nccl dep + default for cuda (#2526) Awni Hannun 2025-08-21 17:57:49 -07:00
9392fc3f88 NCCL backend (#2476) Anastasiia Filippova 2025-08-21 20:56:15 +02:00
e843c4d8d5 fix power (#2523) Awni Hannun 2025-08-21 06:46:01 -07:00
e1303f6160 Reset cutlass gemm to working state again Angelos Katharopoulos 2025-08-21 01:29:43 -07:00
cf5eef095d tmp Angelos Katharopoulos 2025-08-14 12:29:53 -07:00
395d582719 Add a cutlass gemm Angelos Katharopoulos 2025-08-09 22:47:14 -07:00
05583bcd10 More pipelining for the sm_80 gemm Angelos Katharopoulos 2025-08-09 22:46:31 -07:00
6fce01593a Improve gemm Angelos Katharopoulos 2025-08-07 16:13:18 -07:00
97afe40b7b Remove duplicate register tile Angelos Katharopoulos 2025-08-07 00:55:08 -07:00
f70c62d69c Simple gemm example Angelos Katharopoulos 2025-07-29 18:23:40 -07:00
0c5fc63a36 Fix docs omission (#2524) Angelos Katharopoulos 2025-08-20 17:56:06 -07:00
e397177f6e Custom cuda kernel (#2517) Angelos Katharopoulos 2025-08-20 17:20:22 -07:00
f4c8888cbe [CUDA] Fix stride of singleton dims before passing to cuDNN (#2521) Cheng 2025-08-21 08:55:26 +09:00
25c1e03205 Fix overflow in large filter small channels (#2520) Angelos Katharopoulos 2025-08-20 08:03:29 -07:00
512281781c Remove state return from function example in compile documentation (#2518) russellizadi 2025-08-20 03:45:05 -04:00
ac85ddfdb7 [CUDA] Add GEMM-based fallback convolution kernels (#2511) Cheng 2025-08-20 10:06:22 +09:00
65d0d40232 Split cuDNN helpers into a separate header (#2491) Cheng 2025-08-20 09:29:28 +09:00
cea9369610 fix lapack svd (#2515) Awni Hannun 2025-08-18 15:07:59 -07:00
e7c6e1db82 no segfault with uninitialized array.at (#2514) Awni Hannun 2025-08-18 08:33:38 -07:00
c5fcd5b61b fix custom kernel test (#2510) Awni Hannun 2025-08-18 06:45:59 -07:00
1df9887998 Ensure no oob read in gemv_masked (#2508) Angelos Katharopoulos 2025-08-17 08:42:33 -07:00
73f22d6226 Ensure small sort doesn't use indices if not argsort (#2506) Angelos Katharopoulos 2025-08-17 08:42:20 -07:00
c422050ca7 Update cuDNN Frontend to v1.14 (#2505) Cheng 2025-08-17 19:13:01 +09:00
1ba18ff7d9 [CUDA] Fix conv grads with groups (#2495) Cheng 2025-08-16 10:09:18 +09:00
37b440faa8 Clean up code handling both std::vector and SmallVector (#2493) Cheng 2025-08-16 09:01:10 +09:00
888b13ed63 Remove the hack around SmallVector in cpu compile (#2494) Cheng 2025-08-16 08:17:24 +09:00
4abb218d21 The naive_conv_2d is no longer used (#2496) Cheng 2025-08-16 07:57:30 +09:00
6441c21a94 Faster general unary op (#2472) Awni Hannun 2025-08-15 15:04:12 -07:00
400f8457ea Experimenting with a gemm based on the cuda steel utils (branch: jagrit06/cuda-gemm-experiment) Jagrit Digani 2025-08-14 11:27:50 -07:00
dfb5022eab Rename cu::Matmul to CublasGemm (#2488) Cheng 2025-08-13 09:37:40 +09:00
ac207ce7aa make code blocks copyable (#2480) Daniel Yeh 2025-08-12 21:29:02 +02:00
fce53b61d6 Fix reduce sum/prod overflow (#2477) Abe Leininger 2025-08-12 02:05:33 -05:00
8ae4a76308 Use CMake <4.1 to avoid the nvpl error (#2489) Angelos Katharopoulos 2025-08-12 00:03:42 -07:00
7fde1b6a1e Fix logsumexp/softmax not fused for some cases (#2474) Cheng 2025-08-09 06:07:17 +09:00
aa7b47481a [CUDA] Optimize set_mm_device_pointers for small ndim (#2473) Cheng 2025-08-08 15:23:30 +09:00
56be773610 version (#2470) (tag: v0.28.0) Awni Hannun 2025-08-07 00:36:04 -07:00
a9bdd67baa Add CUDA sdpa vector (#2468) Jagrit Digani 2025-08-06 21:40:26 -07:00
a22d0bf273 Add stricter condition to matrix sdpa (branch: sdpav-backup) Angelos Katharopoulos 2025-08-06 19:51:14 -07:00
f2adb5638d Fix typo in metal command encoder (#2471) Angelos Katharopoulos 2025-08-06 16:58:23 -07:00
