Faster contiguous gather for indices in the first axis (#2552)

* faster contiguous gather for indices in the first axis

* work per thread > 1

* Angelos's suggestion for scales / biases
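
For readers unfamiliar with the operation, here is a minimal CPU reference sketch of what a contiguous gather along the first axis computes. It is not the Metal kernel added by this commit. The name `gather_front_reference`, the `float` element type, and the NumPy-style wrapping of negative indices are assumptions made for illustration only.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical reference helper, not part of MLX.
// `src` is a row-major (contiguous) array viewed as [num_rows, slice_size].
// Each entry of `indices` selects one whole row, so every output row is a
// single contiguous copy of `slice_size` elements.
std::vector<float> gather_front_reference(
    const std::vector<float>& src,
    std::size_t slice_size,
    const std::vector<std::int64_t>& indices) {
  if (slice_size == 0 || src.size() % slice_size != 0) {
    throw std::invalid_argument("src size must be a multiple of slice_size");
  }
  const auto num_rows = static_cast<std::int64_t>(src.size() / slice_size);

  std::vector<float> out(indices.size() * slice_size);
  for (std::size_t i = 0; i < indices.size(); ++i) {
    std::int64_t idx = indices[i];
    if (idx < 0) {
      idx += num_rows; // assumption: negative indices wrap, NumPy-style
    }
    if (idx < 0 || idx >= num_rows) {
      throw std::out_of_range("gather index out of range");
    }
    // Source and destination rows are both contiguous, so one block copy
    // per index is enough.
    std::memcpy(
        out.data() + i * slice_size,
        src.data() + static_cast<std::size_t>(idx) * slice_size,
        slice_size * sizeof(float));
  }
  return out;
}
```

Calling `gather_front_reference(src, 3, {2, 0})` on a 3x3 row-major array returns row 2 followed by row 0. Because each gathered row is one contiguous run, a GPU version can plausibly assign several adjacent elements of a row to each thread, which may be what "work per thread > 1" refers to.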
Awni Hannun
2025-08-28 21:26:30 -07:00
committed by GitHub
parent 827003d568
commit 111f1e71af
10 changed files with 97 additions and 33 deletions


@@ -19,6 +19,7 @@ const char* binary_two();
 const char* copy();
 const char* fft();
 const char* gather_axis();
+const char* gather_front();
 const char* hadamard();
 const char* logsumexp();
 const char* quantized_utils();