CRTP-base kernel providing software scheduling of virtual blocks, reducing loading overhead.
|
| | BlockChain1DKernel (int stepsPerBlock, int length, uint16_t remainderBlocks) |
| |
| __device__ int | getStepsThisBlock () |
| | Query the number of times this block will run the main loop.
|
| |
|
__device__ void | run () |
| |
| __device__ void | init () |
| | Called before any calls to processBlock(). Override in the subclass to run setup code at that point.
|
| |
| __device__ void | finish () |
| | Called after all calls to processBlock(). Override in the subclass to run teardown code at that point.
|
| |
|
__device__ void | operator() () |
| |
| __device__ bool | isLastBlockHere (__device int *atomicCounter) |
| | A handy mechanism for determining if a block is the last one to reach a certain point.
|
| |
| __host__ void | launch (sp::Vec< int, Grid::GridRank > numBlocks, sp::Vec< int, Grid::BlockRank > numThreads, sp::Stream &stream, int dynamicSMem=0) |
| | Do the kernel launch.
|
| |
| __host__ void | launch (sp::Vec< int, Grid::GridRank > numBlocks, sp::Vec< int, Grid::BlockRank > numThreads, cudaStream_t stream, int dynamicSMem=0) |
| | Evil legacy wrapper that allows using bare streams.
|
| |
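The "last block to arrive" idea behind isLastBlockHere() is commonly built on an atomic counter: every block increments it once, and the block that draws the final ticket knows all the others have already passed that point. The following is only a host-side sketch of that general pattern, not this class's implementation. The function name `countLastArrivals` is hypothetical, arrivals are serialized in a plain loop, and a device version would use `atomicAdd` on the counter together with a memory fence.

```cpp
#include <atomic>

// Hypothetical host-side model: each loop iteration plays the role of one
// block reaching the checkpoint. Returns how many "blocks" concluded that
// they were the last to arrive (exactly one is expected).
int countLastArrivals(int numBlocks) {
    std::atomic<int> counter{0};  // stands in for the device-side atomicCounter
    int lastCount = 0;
    for (int block = 0; block < numBlocks; ++block) {
        // fetch_add returns the value held *before* the increment, so the
        // final arrival is the one that observes numBlocks - 1.
        const int ticket = counter.fetch_add(1);
        if (ticket == numBlocks - 1) {
            ++lastCount;
        }
    }
    return lastCount;
}
```

Because each block draws a distinct ticket, exactly one block sees `numBlocks - 1`, whatever order the blocks arrive in.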
template<typename Subclass, typename KGrid, int blockStride>
class sp::BlockChain1DKernel< Subclass, KGrid, blockStride >
Typically used to apply an operation to a number of chunks of data. Instead of launching a number of blocks linear in the input size, this kernel can launch the best number of blocks for the hardware, while presenting a very similar interface to the user code.
Subclass should define a member function template with this signature:
template<bool BoundsCheck = false>
__device__ void processBlock(int blockOffset);
Here blockOffset is an abstract 1D offset to the start of the "region" to be processed by the block, replacing the typical blockIdx, and BoundsCheck is true iff the region of interest contains the end of the array. BoundsCheck is passed as a template parameter to allow efficient code generation for non-bounds-checked blocks, which are typically the majority.
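To make the processBlock() contract concrete, here is a minimal host-side sketch in plain C++ with no GPU required. It is not the library's scheduler: `IncrementKernel`, `regionSize`, `runChain`, and `incrementAll` are hypothetical names, and the strided walk over virtual blocks is an assumed schedule chosen for illustration. Only the `processBlock<BoundsCheck>(int blockOffset)` signature comes from the documentation above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical subclass stand-in. Only the processBlock signature is taken
// from the documentation; the fields and the increment are illustrative.
struct IncrementKernel {
    std::vector<int>& data;
    int regionSize;  // elements covered by one virtual block (assumed)

    template<bool BoundsCheck = false>
    void processBlock(int blockOffset) {
        const int length = static_cast<int>(data.size());
        for (int i = 0; i < regionSize; ++i) {
            const int idx = blockOffset + i;
            // Only the BoundsCheck=true instantiation pays for this test.
            if (BoundsCheck && idx >= length) break;
            data[idx] += 1;
        }
    }
};

// Hypothetical model of the chaining loop: each of `physicalBlocks` launched
// blocks walks several virtual blocks. Returns how many calls used the
// bounds-checked instantiation.
template<typename Kernel>
int runChain(Kernel& kernel, int length, int physicalBlocks) {
    assert(physicalBlocks > 0 && kernel.regionSize > 0);
    const int virtualBlocks = (length + kernel.regionSize - 1) / kernel.regionSize;
    int checkedCalls = 0;
    for (int pb = 0; pb < physicalBlocks; ++pb) {
        for (int vb = pb; vb < virtualBlocks; vb += physicalBlocks) {
            const int offset = vb * kernel.regionSize;
            if (offset + kernel.regionSize > length) {
                // Region runs past the end of the array: checked variant.
                kernel.template processBlock<true>(offset);
                ++checkedCalls;
            } else {
                kernel.template processBlock<false>(offset);
            }
        }
    }
    return checkedCalls;
}

// Convenience wrapper: increments every element of a zeroed array once and
// optionally reports how many bounds-checked calls were needed.
std::vector<int> incrementAll(int length, int regionSize, int physicalBlocks,
                              int* checkedCalls = nullptr) {
    std::vector<int> data(length, 0);
    IncrementKernel kernel{data, regionSize};
    const int checked = runChain(kernel, length, physicalBlocks);
    if (checkedCalls != nullptr) *checkedCalls = checked;
    return data;
}
```

With 10 elements, a region size of 4, and 2 physical blocks, only the region starting at offset 8 overruns the array, so a single call takes the bounds-checked path and the other two run the cheaper unchecked instantiation.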