CRTP base class for kernels. More...

#include <Kernel.hpp>

Inheritance diagram for sp::Kernel< Subclass, GTraits >:

Public Types
using	Grid = GTraits

Public Member Functions
__device__ void	operator() ()

__device__ bool	isLastBlockHere (__device int *atomicCounter)
	A handy mechanism for determining if a block is the last one to reach a certain point. More...

template<typename LaunchBoundsT = sp::LaunchBounds<MAX_THREADS_PER_BLOCK, 1>>
__host__ void	launch (sp::Vec< int, Grid::GridRank > numBlocks, sp::Vec< int, Grid::BlockRank > numThreads, sp::Stream &stream, int dynamicSMem=0)
	Do the kernel launch. More...

template<typename LaunchBoundsT = sp::LaunchBounds<MAX_THREADS_PER_BLOCK, 1>>
__host__ void	launch (sp::Vec< int, Grid::GridRank > numBlocks, sp::Vec< int, Grid::BlockRank > numThreads, cudaStream_t stream, int dynamicSMem=0)
	Evil legacy wrapper that allows using bare streams. More...

Detailed Description

template<typename Subclass, typename GTraits>
class sp::Kernel< Subclass, GTraits >

CRTP base class for kernels.

By implementing your kernel as an object that inherits from this one, instead of more conventionally as a function, you can:

Declare shared (and constant) memory objects as members of the kernel class, giving them a lexical scope that matches the lifetime of the kernel.
Pass Kernel-global values (such as kernel parameters) as fields. By setting these once in a constructor, you avoid endlessly passing kernel parameters around through the utility functions that comprise your kernel.

Additionally, it provides:

Slightly faster launches- as this object is the only argument to the generated kernel.
The GridTraits block assumption cache, which lets you specify assumptions about the size and dimensionality of your grid in a way that is accessible to the compiler optimiser. It then can constant-propagate arithmetic involving known constant values (instead of either codegenning math with 1s or 0s, or you jumping through elaborate hoops per-kernel to avoid accessing the unused elements of blockIdx and friends).
See also
sp::Grid
__launch_bounds__ exposed to template metaprogramming.

To use this, simply inherit from this (or one of the other kernel base classes such as BlockChain1DKernel and implement the appropriate function (usually run(), but some subclasses provide a different and higher level interface).

Member Function Documentation

◆ isLastBlockHere()

template<typename Subclass , typename GTraits >

__device__ bool sp::Kernel< Subclass, GTraits >::isLastBlockHere ( __device int * atomicCounter )

A handy mechanism for determining if a block is the last one to reach a certain point.

Note that if you have multiple barriers using this mechanism, each has to use a different counter, or the behaviour is undefined. This mechanism does not synchronise threads between the blocks, it simply keeps track of the race condition so the "last" block to pass a certain point can take some special action.

Parameters

atomicCounter A pointer to an integer in global memory. This must be zero on entry to this function, or the behaviour is undefined.

◆ launch() [1/2]

template<typename Subclass , typename GTraits >

template<typename LaunchBoundsT = sp::LaunchBounds<MAX_THREADS_PER_BLOCK, 1>>

__host__ void sp::Kernel< Subclass, GTraits >::launch	(	sp::Vec< int, Grid::GridRank >	numBlocks,
		sp::Vec< int, Grid::BlockRank >	numThreads,
		cudaStream_t	stream,
		int	dynamicSMem = `0`
	)

Evil legacy wrapper that allows using bare streams.

This is required for libraries such as spRand that have to provide the C-like interface, so can't route everything through libSCALE.

Do not use this API if you can use the other one. It will not give you the nice error specialisation that is provided by libSCALE streams.

◆ launch() [2/2]

template<typename Subclass , typename GTraits >

template<typename LaunchBoundsT = sp::LaunchBounds<MAX_THREADS_PER_BLOCK, 1>>

__host__ void sp::Kernel< Subclass, GTraits >::launch	(	sp::Vec< int, Grid::GridRank >	numBlocks,
		sp::Vec< int, Grid::BlockRank >	numThreads,
		sp::Stream &	stream,
		int	dynamicSMem = `0`
	)

Do the kernel launch.

Slightly more convenient than the regular way because it supports Vecs and puts the rarely-used dynamic smem option last, defaulted to zero.

Public Types

Public Member Functions

Detailed Description

Member Function Documentation

◆ isLastBlockHere()

◆ launch() [1/2]

◆ launch() [2/2]