38 Commits

Author SHA1 Message Date
Ronan Collobert
b48d298205 WIP (distributed) 2025-10-31 13:20:09 -07:00
Ronan Collobert
a1212b4e44 WIP (distributed) 2025-10-30 16:25:11 -07:00
Ronan Collobert
8f8af61a37 fix warnings showing up with -Wall (#2692) 2025-10-24 11:43:35 -07:00
Anastasiia Filippova
e9eab527eb Nccl timeout (#2673)
* print the error & delete nccl group

* timeout for nccl binding

* typo

* revert error

* fixed a typo
2025-10-14 12:29:54 -07:00
Awni Hannun
343e33b6d5 fix all_gather vjp (#2654) 2025-10-07 06:05:23 -07:00
Awni Hannun
af120c2bc0 set nccl ABI version (#2587) 2025-09-11 16:55:53 -07:00
Awni Hannun
584d48458e link with nccl (#2546) 2025-08-27 10:01:07 -07:00
Awni Hannun
d2f540f4e0 Use nccl header only when nccl is not present (#2539)
* use nccl header only when nccl is not present

* larger machine for cuda build
2025-08-25 14:17:25 -07:00
Awni Hannun
068a4612e9 nccl default for backend=any (#2528)
* nccl default for backend=any

* check num gpus + ensure row contiguous for all reduce

* comment
2025-08-22 12:24:27 -07:00
Anastasiia Filippova
9392fc3f88 NCCL backend (#2476) 2025-08-21 11:56:15 -07:00
Cheng
d34f887abc Add Primitive::name and remove Primitive::print (#2365) 2025-07-14 14:06:35 -07:00
Awni Hannun
f5f65ef48c Make sliceUpdate general (#2282)
* Make sliceUpdate general

* fix
2025-06-12 16:48:54 -07:00
Angelos Katharopoulos
ddaa4b7dcb Fix the test and add custom min/max reductions for uncommon MPI types (#2060) 2025-04-10 17:01:17 -07:00
Anastasiia Filippova
515f104926 Min / max reductions (#2041) 2025-04-09 23:22:20 -07:00
Jesper Stemann Andersen
9307b2ab8b Fixed 32-bit platform support for distributed/ring implementation (#1996)
Replaced unsigned long integer literals with size_t literals in ring implementation, e.g., 1UL with size_t(1).
2025-03-24 08:08:40 -07:00
Jesper Stemann Andersen
522d8d3917 Added missing netinet/in.h include that fixes build on FreeBSD (#1997)
Defines IPPROTO_TCP.
2025-03-24 08:07:34 -07:00
Angelos Katharopoulos
69e4dd506b Add a ring all gather (#1985) 2025-03-21 13:36:51 -07:00
Angelos Katharopoulos
25814a9458 Disable mpi on version mismatch (#1989) 2025-03-21 13:36:26 -07:00
Angelos Katharopoulos
d343782c8b Cross platform libmpi loading (#1975) 2025-03-21 11:23:10 -07:00
Awni Hannun
c4230747a1 redesign for faster cpu/gpu synch (#1869)
* redesign for faster cpu/gpu synch

* load + more async CPU

* use command encoder API and move more ops to use it

* make fence back-end generic + CPU only fence

* faster build

* fix async eval

* fixes + handle temporaries

* fix / improve cpu conv

* remove unused status, fix siblings

* fix extensions

* fix

* fix no cpu build

* format

* comments

* fix perf regression, remove unnecessary abort

* fix events, task limit cpu

* fix waiting

* fix donation / temporaries in normalization
2025-03-06 19:23:38 -08:00
Angelos Katharopoulos
0792ff02ff Only fail when 10 consecutive socket errors occur (#1928) 2025-03-05 13:16:19 -08:00
Angelos Katharopoulos
6bf00ef631 Fix ring of 2 and allow scalars in API (#1906) 2025-02-25 17:03:01 -08:00
Angelos Katharopoulos
10b271d963 Ring update (#1885) 2025-02-20 14:32:31 -08:00
Awni Hannun
1c0c118f7c Fp64 on the CPU (#1843)
* add fp64 data type

* clean build

* update docs

* fix bug
2025-02-07 15:52:22 -08:00
Awni Hannun
1156c84e86 Refactor common into cpu specific and truly common (#1817)
* refactor

* fix extension example

* fix no-cpu
2025-02-03 15:58:02 -08:00
Jesper Stemann Andersen
2d8e667400 MinGW support (#1806)
* Changed /bin/bash to bash for generating compiling preamble

* Fix wrt jit_compiler mingw like msvc wrt. WEXITSTATUS

* Solved ambiguity wrt. bernoulli test shape

* Disabled distributed/ring on Windows

* Fixed jit_compiler command wrt. MinGW

* Extended jit_compiler patch wrt. WEXITSTATUS to FreeBSD
2025-02-01 12:40:06 -08:00
Angelos Katharopoulos
ccb61d7aae Ring distributed backend (#1784) 2025-01-27 22:15:01 -08:00
Angelos Katharopoulos
545f84d905 Refactor distributed backend (#1752) 2025-01-06 17:33:15 -08:00
Awni Hannun
058d6ce683 mpi send use input as output (#1750)
* mpi send use input as output

* move earlier
2025-01-06 06:08:43 -08:00
Awni Hannun
e03f0372b1 More shape type (#1705)
* more shape type

* fix
2024-12-19 08:08:20 -08:00
Angelos Katharopoulos
914409fef9 Data parallel helper (#1407) 2024-09-16 18:17:21 -07:00
Nripesh Niketan
669c27140d Chore: add pre-commit hook for cmake (#1362)
* reset and lint

* format

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-16 12:53:01 -07:00
Awni Hannun
b3f52c9fbe ensure io/comm streams are active before eval (#1412) 2024-09-14 06:17:36 -07:00
Angelos Katharopoulos
cdb59faea6 Adds send/recv ops in distributed (#1366) 2024-08-26 23:01:37 -07:00
Awni Hannun
5f7d19d1f5 MPI ops in GPU stream for faster comms (#1356) 2024-08-26 15:12:50 -07:00
Angelos Katharopoulos
0163a8e57a Add docs for the distributed namespace (#1184) 2024-06-06 11:37:00 -07:00
Angelos Katharopoulos
3de8ce3f3c In place all-reduce and forgiving init (#1178) 2024-06-03 16:47:47 -07:00
Angelos Katharopoulos
50dfb664db Comms (#1097)
* Start the communications branch using MPI
* Add ops and primitives
* Add python bindings for distributed
2024-05-23 17:04:02 -07:00