CRTP base kernel that provides software scheduling of virtual blocks, reducing loading overhead.
| BlockChain1DKernel (int stepsPerBlock, int length, uint16_t remainderBlocks) |
__device__ int | getStepsThisBlock () | Query the number of times this block will run the main loop.
__device__ void | run () |
__device__ void | init () | Called before any calls to processBlock(). Overload if you want to do something then.
__device__ void | finish () | Called after all calls to processBlock(). Overload if you want to do something then.
__device__ void | operator() () |
__device__ bool | isLastBlockHere (__device int *atomicCounter) | A handy mechanism for determining if a block is the last one to reach a certain point.
__host__ void | launch (sp::Vec<int, Grid::GridRank> numBlocks, sp::Vec<int, Grid::BlockRank> numThreads, sp::Stream &stream, int dynamicSMem=0) | Do the kernel launch.
__host__ void | launch (sp::Vec<int, Grid::GridRank> numBlocks, sp::Vec<int, Grid::BlockRank> numThreads, cudaStream_t stream, int dynamicSMem=0) | Evil legacy wrapper that allows using bare streams.
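isLastBlockHere() is documented only as a way to tell whether a block is the last to reach a point. A common CUDA pattern for this (used, for example, in the threadFenceReduction sample) is a ticket counter: each block issues __threadfence(), one thread atomically increments a counter in global memory, and the block that draws ticket gridDim.x - 1 knows it arrived last. The sketch below is a host-side C++ model of that counting logic, not the library's implementation: std::thread stands in for blocks, and isLastToArrive, reduceWithLastBlock and LastBlockResult are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical host-side model of the "last block" ticket pattern.
// Each std::thread plays the role of one CUDA block.
bool isLastToArrive(std::atomic<int>& counter, int numBlocks) {
    // fetch_add returns the value before the increment: this block's ticket.
    // acq_rel publishes this block's earlier writes (release) and lets the
    // last arriver observe every other block's writes (acquire).
    int ticket = counter.fetch_add(1, std::memory_order_acq_rel);
    return ticket == numBlocks - 1;
}

struct LastBlockResult {
    int lastCount;  // how many blocks saw isLastToArrive() == true
    long total;     // sum of all partial results, computed by the last block
};

// Each block writes a partial result; the last block to arrive combines them.
LastBlockResult reduceWithLastBlock(int numBlocks) {
    std::atomic<int> counter{0};
    std::atomic<int> lastCount{0};
    std::vector<long> partial(numBlocks, 0);
    long total = 0;  // written only by the last block; read after join()
    std::vector<std::thread> blocks;
    for (int b = 0; b < numBlocks; ++b) {
        blocks.emplace_back([&, b] {
            partial[b] = b + 1;  // this block's partial result
            if (isLastToArrive(counter, numBlocks)) {
                lastCount.fetch_add(1);
                long sum = 0;
                for (long p : partial) sum += p;
                total = sum;
            }
        });
    }
    for (std::thread& t : blocks) t.join();
    return {lastCount.load(), total};
}
```

Exactly one block sees the final ticket, however the threads interleave, which is what makes the pattern useful for "finish the reduction in whichever block comes last". On a GPU the counter also has to be reset (commonly by that last block) before the kernel is launched again.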
template<typename Subclass, typename KGrid, int blockStride>
class sp::BlockChain1DKernel< Subclass, KGrid, blockStride >
Typically used to apply an operation to a number of chunks of data. Instead of launching a number of blocks proportional to the input size, this kernel launches a number of blocks suited to the hardware and has each one work through a chain of virtual blocks, while presenting a very similar interface to the user code:
Subclass should contain a member function template with this signature:
template<bool BoundsCheck = false>
__device__ void processBlock(int blockOffset);
Here blockOffset is an abstract 1D offset to the start of the "region" to be processed by the block, replacing the typical blockIdx, and BoundsCheck is true iff the region of interest contains the end of the array. It is passed as a template parameter so that efficient code can be generated for the blocks that need no bounds checking, which are typically the majority.
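To make the CRTP contract concrete, here is a host-side C++ model; it is a sketch, not the library's code. BlockChain1DModel, DoubleKernel, doubleAll and kStride are hypothetical names. Several details are assumptions: run() takes the block index as an argument instead of reading blockIdx.x; blockStride is taken to be the number of elements per virtual block; blockOffset is taken to be an element offset; and the constructor derives stepsPerBlock and remainderBlocks itself, assuming they are the quotient and remainder of the virtual-block count divided by the physical-block count. The user class supplies processBlock<BoundsCheck>, and the base picks the instantiation per virtual block, using BoundsCheck == true only for a region that extends past the end of the array.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host-side model of a CRTP "block chain" base class. It mirrors the
// documented interface, but the scheduling is an assumption:
//   * blockStride = elements covered by one virtual block;
//   * virtual blocks are dealt round-robin to numBlocks physical blocks, and the
//     first remainderBlocks physical blocks run one extra step.
template <typename Subclass, int blockStride>
class BlockChain1DModel {
public:
    BlockChain1DModel(int numBlocks, int length) : numBlocks_(numBlocks), length_(length) {
        int numVirtual = (length + blockStride - 1) / blockStride;  // ceiling division
        stepsPerBlock_ = numVirtual / numBlocks;
        remainderBlocks_ = numVirtual % numBlocks;
    }

    // Number of virtual blocks that physical block `blockIdx` processes.
    int getStepsThisBlock(int blockIdx) const {
        return stepsPerBlock_ + (blockIdx < remainderBlocks_ ? 1 : 0);
    }

    // Body of one physical block; on a GPU blockIdx would come from blockIdx.x.
    void run(int blockIdx) {
        Subclass& self = static_cast<Subclass&>(*this);
        self.init();
        for (int step = 0; step < getStepsThisBlock(blockIdx); ++step) {
            int offset = (step * numBlocks_ + blockIdx) * blockStride;
            if (offset + blockStride <= length_) {
                self.template processBlock<false>(offset);  // interior region: no checks
            } else {
                self.template processBlock<true>(offset);   // region containing the array end
            }
        }
        self.finish();
    }

    // Default no-op hooks; a Subclass that defines its own hides these.
    void init() {}
    void finish() {}

private:
    int numBlocks_, length_, stepsPerBlock_, remainderBlocks_;
};

constexpr int kStride = 4;  // elements per virtual block in this example

// Example user kernel: doubles every element of a buffer.
class DoubleKernel : public BlockChain1DModel<DoubleKernel, kStride> {
public:
    DoubleKernel(std::vector<int>& data, int numBlocks)
        : BlockChain1DModel<DoubleKernel, kStride>(numBlocks, static_cast<int>(data.size())),
          data_(data) {}

    template <bool BoundsCheck = false>
    void processBlock(int blockOffset) {
        if (BoundsCheck) ++checkedCalls;
        for (int i = 0; i < kStride; ++i) {
            int idx = blockOffset + i;
            // BoundsCheck is a compile-time constant, so the compiler can drop
            // this test entirely in the <false> instantiation.
            if (BoundsCheck && idx >= static_cast<int>(data_.size())) break;
            data_[idx] *= 2;
        }
    }

    int checkedCalls = 0;  // how many times the bounds-checked variant ran

private:
    std::vector<int>& data_;
};

// Runs the physical blocks one after another (a GPU would run them concurrently).
std::vector<int> doubleAll(std::vector<int> data, int numBlocks, int* checkedCalls) {
    DoubleKernel kernel(data, numBlocks);
    for (int b = 0; b < numBlocks; ++b) kernel.run(b);
    *checkedCalls = kernel.checkedCalls;
    return data;
}
```

With 10 elements, a stride of 4 and 2 physical blocks there are three virtual blocks: block 0 handles offsets 0 and 8, block 1 handles offset 4, and only the region at offset 8 is processed with BoundsCheck == true. Dealing virtual blocks round-robin is just one choice for the model; giving each physical block a contiguous range would exercise the same interface.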