Speclib  0.1.2
The library for writing better CUDA libraries
sp::BlockChain1DKernel< Subclass, KGrid, blockStride > Class Template Reference

CRTP-base kernel providing software scheduling of virtual blocks, reducing loading overhead.

#include <BlockChain1DKernel.hpp>

Inherits sp::Kernel< Subclass, KGrid >.

Public Member Functions

 BlockChain1DKernel (int stepsPerBlock, int length, uint16_t remainderBlocks)
 
__device__ int getStepsThisBlock ()
 Query the number of times this block will run the main loop.
 
__device__ void run ()
 
__device__ void init ()
 Called before any calls to processBlock(). Override in Subclass to run code at that point.
 
__device__ void finish ()
 Called after all calls to processBlock(). Override in Subclass to run code at that point.
 
- Public Member Functions inherited from sp::Kernel< Subclass, KGrid >
__device__ void operator() ()
 
__device__ bool isLastBlockHere (__device int *atomicCounter)
 A handy mechanism for determining if a block is the last one to reach a certain point.
 
__host__ void launch (sp::Vec< int, Grid::GridRank > numBlocks, sp::Vec< int, Grid::BlockRank > numThreads, sp::Stream &stream, int dynamicSMem=0)
 Do the kernel launch.
 
__host__ void launch (sp::Vec< int, Grid::GridRank > numBlocks, sp::Vec< int, Grid::BlockRank > numThreads, cudaStream_t stream, int dynamicSMem=0)
 Evil legacy wrapper that allows using bare streams.
 

Static Public Member Functions

constexpr static bool needsBoundaryLogic ()
 Override in Subclass to turn off the bounds-checked version and reduce overhead, if appropriate.
 

Protected Attributes

const int stepsPerBlock
 
const int length
 
const uint16_t remainderBlocks
 

Additional Inherited Members

- Public Types inherited from sp::Kernel< Subclass, KGrid >
using Grid = KGrid
 

Detailed Description

template<typename Subclass, typename KGrid, int blockStride>
class sp::BlockChain1DKernel< Subclass, KGrid, blockStride >

CRTP-base kernel providing software scheduling of virtual blocks, reducing loading overhead.

Typically used to apply an operation to a number of chunks of data. Instead of launching a number of blocks linear in the input size, this kernel can launch the best number for the hardware, while providing a very similar interface to the user code.

Subclass should contain a member function template with this signature:

template<bool BoundsCheck = false>
__device__ void processBlock(int blockOffset);

Here blockOffset is an abstract 1D offset to the start of the "region" to be processed by the block, replacing the typical blockIdx, and BoundsCheck is true iff the region of interest contains the end of the array. It is passed as a template parameter to allow efficient code generation for the non-bounds-checked blocks, which are typically the majority.
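
As a rough illustration of the processBlock contract described above, the following host-side C++ sketch uses a hypothetical stand-in base (MockBlockChain, not part of Speclib) that walks the regions sequentially and selects the BoundsCheck variant for each one. The Doubler subclass, the doubleAll helper, the stride of 4, and the rule used to decide when a region needs checking are all assumptions made for the example. A real kernel would mark these functions __device__ and derive from sp::BlockChain1DKernel instead.

```cpp
#include <cassert>
#include <vector>

constexpr int kStride = 4;  // assumed region size, chosen only for this example

// Hypothetical stand-in for the CRTP base; NOT Speclib's implementation.
// It visits the regions one after another on the host to show the call pattern.
template <typename Subclass, int blockStride>
struct MockBlockChain {
    int length;
    explicit MockBlockChain(int length) : length(length) {}

    void run() {
        auto& self = static_cast<Subclass&>(*this);
        for (int offset = 0; offset < length; offset += blockStride) {
            // Assumption: only a region reaching past the array end needs checks.
            if (offset + blockStride > length) {
                self.template processBlock<true>(offset);
            } else {
                self.template processBlock<false>(offset);
            }
        }
    }
};

// User code: doubles every element, one region of kStride elements per call.
struct Doubler : MockBlockChain<Doubler, kStride> {
    std::vector<int>& data;
    int checkedCalls = 0;  // counts bounds-checked invocations, for illustration

    explicit Doubler(std::vector<int>& d)
        : MockBlockChain<Doubler, kStride>(static_cast<int>(d.size())), data(d) {}

    template <bool BoundsCheck = false>
    void processBlock(int blockOffset) {
        if constexpr (BoundsCheck) ++checkedCalls;
        for (int i = 0; i < kStride; ++i) {
            const int idx = blockOffset + i;
            if constexpr (BoundsCheck) {
                if (idx >= length) break;  // compiled only into the checked variant
            }
            data[idx] *= 2;
        }
    }
};

// Runs the mock kernel and returns how many calls used the checked variant.
inline int doubleAll(std::vector<int>& v) {
    Doubler k(v);
    k.run();
    return k.checkedCalls;
}
```

In this sketch the index comparison exists only in the processBlock<true> instantiation, so the regions that lie fully inside the array run a loop with no per-element branch.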

Constructor & Destructor Documentation

◆ BlockChain1DKernel()

template<typename Subclass , typename KGrid , int blockStride>
sp::BlockChain1DKernel< Subclass, KGrid, blockStride >::BlockChain1DKernel ( int stepsPerBlock, int length, uint16_t remainderBlocks )
Parameters
    stepsPerBlock     How many chunks each block must consume.
    length            Input array length.
    remainderBlocks   How many leftover blocks don't fit into the tiling described by the other parameters.
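
The page does not say how these three values are derived, so the following is only one plausible reading, sketched in host-side C++. It assumes the input is cut into ceil(length / blockStride) regions, that the regions are divided evenly among the launched blocks, and that the first remainderBlocks blocks each take one extra region. ChainTiling, makeTiling, and stepsForBlock are hypothetical names and not part of Speclib.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type; the field names mirror the constructor parameters.
struct ChainTiling {
    int stepsPerBlock;
    std::uint16_t remainderBlocks;
};

// Assumed rule: split ceil(length / blockStride) regions across numBlocks blocks.
inline ChainTiling makeTiling(int length, int blockStride, int numBlocks) {
    assert(length >= 0 && blockStride > 0);
    assert(numBlocks > 0 && numBlocks <= 65536);  // keeps the remainder within uint16_t
    const int regions = (length + blockStride - 1) / blockStride;  // ceiling division
    return ChainTiling{regions / numBlocks,
                       static_cast<std::uint16_t>(regions % numBlocks)};
}

// Assumed rule: the first remainderBlocks blocks each process one extra region.
inline int stepsForBlock(const ChainTiling& t, int blockIndex) {
    return t.stepsPerBlock + (blockIndex < t.remainderBlocks ? 1 : 0);
}
```

For example, under these assumptions a length of 1000 with a blockStride of 128 gives 8 regions. With 3 launched blocks that is 2 steps per block and 2 remainder blocks, so the blocks run 3, 3, and 2 steps respectively.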

Member Function Documentation

◆ finish()

template<typename Subclass , typename KGrid , int blockStride>
__device__ void sp::BlockChain1DKernel< Subclass, KGrid, blockStride >::finish ( )

Called after all calls to processBlock(). Override in Subclass to run code at that point.

◆ getStepsThisBlock()

template<typename Subclass , typename KGrid , int blockStride>
__device__ int sp::BlockChain1DKernel< Subclass, KGrid, blockStride >::getStepsThisBlock ( )

Query the number of times this block will run the main loop.

◆ init()

template<typename Subclass , typename KGrid , int blockStride>
__device__ void sp::BlockChain1DKernel< Subclass, KGrid, blockStride >::init ( )

Called before any calls to processBlock(). Override in Subclass to run code at that point.
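
The ordering of init(), processBlock(), and finish() can be pictured with a small host-side C++ sketch. MockChain is a hypothetical stand-in for the base class (not Speclib's implementation), processBlock is simplified to a non-template taking a step index, and Summer and runSummer are invented for the example.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the CRTP base; NOT Speclib's implementation.
template <typename Subclass>
struct MockChain {
    int steps;
    explicit MockChain(int steps) : steps(steps) {}

    // Default hooks do nothing; a subclass may shadow either of them.
    void init() {}
    void finish() {}

    void run() {
        auto& self = static_cast<Subclass&>(*this);
        self.init();  // once, before the first step
        for (int s = 0; s < steps; ++s) {
            self.processBlock(s);
        }
        self.finish();  // once, after the last step
    }
};

// User code: accumulates a sum across steps and publishes it at the end.
struct Summer : MockChain<Summer> {
    using MockChain<Summer>::MockChain;  // inherit the (int steps) constructor

    int partial = -1;
    int result = -1;
    std::string trace;  // records the call order, for illustration

    void init() { partial = 0; trace += "I"; }
    void processBlock(int step) { partial += step; trace += "P"; }
    void finish() { result = partial; trace += "F"; }
};

// Runs the mock kernel for the given number of steps and returns its state.
inline Summer runSummer(int steps) {
    Summer s(steps);
    s.run();
    return s;
}
```

Because the base calls the hooks through a cast to Subclass, a subclass that defines neither hook falls back to the empty defaults with no virtual dispatch.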

◆ needsBoundaryLogic()

template<typename Subclass , typename KGrid , int blockStride>
constexpr static bool sp::BlockChain1DKernel< Subclass, KGrid, blockStride >::needsBoundaryLogic ( )

Override in Subclass to turn off the bounds-checked version and reduce overhead, if appropriate.
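
One way such a compile-time switch could work is sketched below in host-side C++. This is an assumption about the mechanism, not Speclib's code. MockChainBase is a hypothetical stand-in whose run() consults Subclass::needsBoundaryLogic() with if constexpr, so a subclass returning false never instantiates processBlock<true>. AlignedKernel and its stride of 4 are invented for the example.

```cpp
#include <cassert>

// Hypothetical stand-in for the CRTP base; NOT Speclib's implementation.
template <typename Subclass, int blockStride>
struct MockChainBase {
    int length;
    explicit MockChainBase(int length) : length(length) {}

    // Assumed default: the final region may be partial, so keep the checked path.
    constexpr static bool needsBoundaryLogic() { return true; }

    void run() {
        auto& self = static_cast<Subclass&>(*this);
        for (int offset = 0; offset < length; offset += blockStride) {
            // Discarded at compile time when the subclass returns false.
            if constexpr (Subclass::needsBoundaryLogic()) {
                if (offset + blockStride > length) {
                    self.template processBlock<true>(offset);
                    continue;
                }
            }
            self.template processBlock<false>(offset);
        }
    }
};

// User code: the caller guarantees that length is a multiple of 4.
struct AlignedKernel : MockChainBase<AlignedKernel, 4> {
    using MockChainBase<AlignedKernel, 4>::MockChainBase;
    int uncheckedCalls = 0;

    constexpr static bool needsBoundaryLogic() { return false; }

    template <bool BoundsCheck = false>
    void processBlock(int /*blockOffset*/) {
        // Fails to compile if the checked variant is ever instantiated.
        static_assert(!BoundsCheck, "checked variant should not be instantiated");
        ++uncheckedCalls;
    }
};
```

The static_assert documents the intent: with needsBoundaryLogic() returning false, only the unchecked variant is ever generated, which is where the reduction in overhead would come from in this reading.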