CRTP base class for kernels. More...
#include <Kernel.hpp>
Public Types | |
using | Grid = GTraits |
Public Member Functions | |
__device__ void | operator() () |
__device__ bool | isLastBlockHere (__device int *atomicCounter) |
A handy mechanism for determining if a block is the last one to reach a certain point. More... | |
template<typename LaunchBoundsT = sp::LaunchBounds<MAX_THREADS_PER_BLOCK, 1>> | |
__host__ void | launch (sp::Vec< int, Grid::GridRank > numBlocks, sp::Vec< int, Grid::BlockRank > numThreads, sp::Stream &stream, int dynamicSMem=0) |
Do the kernel launch. More... | |
template<typename LaunchBoundsT = sp::LaunchBounds<MAX_THREADS_PER_BLOCK, 1>> | |
__host__ void | launch (sp::Vec< int, Grid::GridRank > numBlocks, sp::Vec< int, Grid::BlockRank > numThreads, cudaStream_t stream, int dynamicSMem=0) |
Evil legacy wrapper that allows using bare streams. More... | |
CRTP base class for kernels.
By implementing your kernel as an object that inherits from this one, instead of more conventionally as a function, you can:
Additionally, it provides:
GridTraits
block assumption cache, which lets you specify assumptions about the size and dimensionality of your grid in a way that is accessible to the compiler optimiser. It then can constant-propagate arithmetic involving known constant values (instead of either codegenning math with 1s or 0s, or you jumping through elaborate hoops per-kernel to avoid accessing the unused elements of blockIdx and friends). __launch_bounds__
exposed to template metaprogramming.To use this, simply inherit from this (or one of the other kernel base classes such as BlockChain1DKernel
and implement the appropriate function (usually run()
, but some subclasses provide a different and higher level interface).
__device__ bool sp::Kernel< Subclass, GTraits >::isLastBlockHere | ( | __device int * | atomicCounter | ) |
A handy mechanism for determining if a block is the last one to reach a certain point.
Note that if you have multiple barriers using this mechanism, each has to use a different counter, or the behaviour is undefined. This mechanism does not synchronise threads between the blocks, it simply keeps track of the race condition so the "last" block to pass a certain point can take some special action.
atomicCounter | A pointer to an integer in global memory. This must be zero on entry to this function, or the behaviour is undefined. |
__host__ void sp::Kernel< Subclass, GTraits >::launch | ( | sp::Vec< int, Grid::GridRank > | numBlocks, |
sp::Vec< int, Grid::BlockRank > | numThreads, | ||
cudaStream_t | stream, | ||
int | dynamicSMem = 0 |
||
) |
Evil legacy wrapper that allows using bare streams.
This is required for libraries such as spRand that have to provide the C-like interface, so can't route everything through libSCALE.
Do not use this API if you can use the other one. It will not give you the nice error specialisation that is provided by libSCALE streams.
__host__ void sp::Kernel< Subclass, GTraits >::launch | ( | sp::Vec< int, Grid::GridRank > | numBlocks, |
sp::Vec< int, Grid::BlockRank > | numThreads, | ||
sp::Stream & | stream, | ||
int | dynamicSMem = 0 |
||
) |
Do the kernel launch.
Slightly more convenient than the regular way because it supports Vecs and puts the rarely-used dynamic smem option last, defaulted to zero.