RoPE for CUDA (#2293)

* First working CUDA rope

* Fix random
Angelos Katharopoulos
2025-06-15 06:08:07 -07:00
committed by GitHub
parent a14aaa7c9d
commit 580776559b
8 changed files with 443 additions and 29 deletions

@@ -121,6 +121,7 @@ dim3 get_2d_grid_dims(
const Shape& shape,
const Strides& strides,
size_t divisor);
std::pair<dim3, dim3> get_grid_and_block(int dim0, int dim1, int dim2);
// Return a block size that achieves maximum potential occupancy for kernel.
template <typename T>