Speclib  0.1.2
The library for writing better CUDA libraries
sp::Kernel< Subclass, GTraits > Class Template Reference

CRTP base class for kernels. More...

#include <Kernel.hpp>

Inheritance diagram for sp::Kernel< Subclass, GTraits >:
[legend]

Public Types

using Grid = GTraits
 

Public Member Functions

__device__ void operator() ()
 
__device__ bool isLastBlockHere (__device int *atomicCounter)
 A handy mechanism for determining if a block is the last one to reach a certain point. More...
 
template<typename LaunchBoundsT = sp::LaunchBounds<MAX_THREADS_PER_BLOCK, 1>>
__host__ void launch (sp::Vec< int, Grid::GridRank > numBlocks, sp::Vec< int, Grid::BlockRank > numThreads, sp::Stream &stream, int dynamicSMem=0)
 Do the kernel launch. More...
 
template<typename LaunchBoundsT = sp::LaunchBounds<MAX_THREADS_PER_BLOCK, 1>>
__host__ void launch (sp::Vec< int, Grid::GridRank > numBlocks, sp::Vec< int, Grid::BlockRank > numThreads, cudaStream_t stream, int dynamicSMem=0)
 Evil legacy wrapper that allows using bare streams. More...
 

Detailed Description

template<typename Subclass, typename GTraits>
class sp::Kernel< Subclass, GTraits >

CRTP base class for kernels.

By implementing your kernel as an object that inherits from this one, instead of more conventionally as a function, you can:

  • Declare shared (and constant) memory objects as members of the kernel class, giving them a lexical scope that matches the lifetime of the kernel.
  • Pass Kernel-global values (such as kernel parameters) as fields. By setting these once in a constructor, you avoid endlessly passing kernel parameters around through the utility functions that comprise your kernel.

Additionally, it provides:

  • Slightly faster launches- as this object is the only argument to the generated kernel.
  • The GridTraits block assumption cache, which lets you specify assumptions about the size and dimensionality of your grid in a way that is accessible to the compiler optimiser. It then can constant-propagate arithmetic involving known constant values (instead of either codegenning math with 1s or 0s, or you jumping through elaborate hoops per-kernel to avoid accessing the unused elements of blockIdx and friends).
    See also
    sp::Grid
  • __launch_bounds__ exposed to template metaprogramming.

To use this, simply inherit from this (or one of the other kernel base classes such as BlockChain1DKernel and implement the appropriate function (usually run(), but some subclasses provide a different and higher level interface).

Member Function Documentation

◆ isLastBlockHere()

template<typename Subclass , typename GTraits >
__device__ bool sp::Kernel< Subclass, GTraits >::isLastBlockHere ( __device int *  atomicCounter)

A handy mechanism for determining if a block is the last one to reach a certain point.

Note that if you have multiple barriers using this mechanism, each has to use a different counter, or the behaviour is undefined. This mechanism does not synchronise threads between the blocks, it simply keeps track of the race condition so the "last" block to pass a certain point can take some special action.

Parameters
atomicCounterA pointer to an integer in global memory. This must be zero on entry to this function, or the behaviour is undefined.

◆ launch() [1/2]

template<typename Subclass , typename GTraits >
template<typename LaunchBoundsT = sp::LaunchBounds<MAX_THREADS_PER_BLOCK, 1>>
__host__ void sp::Kernel< Subclass, GTraits >::launch ( sp::Vec< int, Grid::GridRank >  numBlocks,
sp::Vec< int, Grid::BlockRank >  numThreads,
cudaStream_t  stream,
int  dynamicSMem = 0 
)

Evil legacy wrapper that allows using bare streams.

This is required for libraries such as spRand that have to provide the C-like interface, so can't route everything through libSCALE.

Do not use this API if you can use the other one. It will not give you the nice error specialisation that is provided by libSCALE streams.

◆ launch() [2/2]

template<typename Subclass , typename GTraits >
template<typename LaunchBoundsT = sp::LaunchBounds<MAX_THREADS_PER_BLOCK, 1>>
__host__ void sp::Kernel< Subclass, GTraits >::launch ( sp::Vec< int, Grid::GridRank >  numBlocks,
sp::Vec< int, Grid::BlockRank >  numThreads,
sp::Stream stream,
int  dynamicSMem = 0 
)

Do the kernel launch.

Slightly more convenient than the regular way because it supports Vecs and puts the rarely-used dynamic smem option last, defaulted to zero.