Facilities for reduction.

Classes
struct	sp::OrderedSegment< T, I >
	Summarizes a contiguous interval of items in an array: its first item, its last item, whether the interval is sorted, and the index one past the end of its longest sorted initial segment.

struct	sp::IndexConclude
	A Conclude functor which extracts the index from an IndPair.

struct	sp::ValueConclude
	A Conclude functor which extracts the value from an IndPair.

struct	sp::IdentityConclude
	A Conclude functor that returns its argument unchanged.
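Summaries of the kind OrderedSegment describes can be merged pairwise, so a sortedness check can be phrased as a reduction. The host-side C++ sketch below assumes a hypothetical `Segment` struct with fields `first`, `last`, `sorted` and `sortedEnd`; these names and the `combine` rule are illustrative assumptions, not the library's definition of sp::OrderedSegment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-in for sp::OrderedSegment<T, I>.
// Field names and the combine rule are assumptions for illustration.
template <typename T, typename I>
struct Segment {
    T first;      // first item in the interval
    T last;       // last item in the interval
    bool sorted;  // true if the whole interval is non-decreasing
    I sortedEnd;  // index one past the longest sorted initial run
};

// Summary of the single item `x` at index `i`: trivially sorted.
template <typename T, typename I>
Segment<T, I> leaf(T x, I i) {
    return {x, x, true, static_cast<I>(i + 1)};
}

// Combine two summaries, where `a` covers the interval immediately left of `b`.
// The operation is associative but NOT commutative.
template <typename T, typename I>
Segment<T, I> combine(const Segment<T, I>& a, const Segment<T, I>& b) {
    const bool joins = a.last <= b.first;  // no descent at the seam
    Segment<T, I> r;
    r.first = a.first;
    r.last = b.last;
    r.sorted = a.sorted && b.sorted && joins;
    // The sorted prefix extends into `b` only if all of `a` is sorted and the
    // seam is ordered; otherwise it ends where `a`'s sorted prefix ends.
    r.sortedEnd = (a.sorted && joins) ? b.sortedEnd : a.sortedEnd;
    return r;
}

// Left-to-right fold over xs[begin, end); requires begin < end.
template <typename T>
Segment<T, std::size_t> summarize(const std::vector<T>& xs,
                                  std::size_t begin, std::size_t end) {
    Segment<T, std::size_t> acc = leaf(xs[begin], begin);
    for (std::size_t i = begin + 1; i < end; ++i) {
        acc = combine(acc, leaf(xs[i], i));
    }
    return acc;
}
```

Because `combine` is associative but not commutative, a reduction built on it must keep segments in array order; splitting the array and combining the halves gives the same summary as one sequential fold.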

Typedefs
using	sp::ArgMinLeftmost = ArgCompare< false, false >
	Functor that determines the smallest element of an array, breaking ties by choosing the leftmost element.

using	sp::ArgMinRightmost = ArgCompare< false, true >
	Like ArgMinLeftmost, but breaks ties by choosing the rightmost element.

using	sp::ArgMaxLeftmost = ArgCompare< true, false >
	Like ArgMinLeftmost, but determines the largest element.

using	sp::ArgMaxRightmost = ArgCompare< true, true >
	Like ArgMinLeftmost, but determines the largest element and breaks ties by choosing the rightmost element.

Functions
template<typename Fn , typename Conclude = sp::IdentityConclude, bool Inplace = false, typename OutTensor , typename InTensor , typename... Args>
void	sp::reduce (sp::Stream &s, OutTensor output, InTensor input, Args &&... args)
	Perform a reduction (i.e. fold) across a one-dimensional array.

template<typename Fn , typename T , int VectorSize>
__device__ auto	sp::warpValueReduce (sp::Vec< T, VectorSize > contribution, int laneId, int logicalWarpSize=WARP_SIZE)
	Apply a value-reduction across a warp.

template<typename KGrid , typename Fn , typename T , int VectorSize, int CacheSize>
__device__ auto	sp::blockValueReduce (sp::Vec< T, VectorSize > contribution, sp::StaticTensor< T, CacheSize > &cache)
	Reduce the given contributions across the entire block.

Detailed Description

Facilities for reduction.

Warp value/index reduction
Block value/index reduction
The reduction kernel.

Typedef Documentation

◆ ArgMaxLeftmost

using sp::ArgMaxLeftmost = ArgCompare<true, false>

Functor that determines the largest element of an array, breaking ties by choosing the leftmost element. See documentation for ArgMinLeftmost.

◆ ArgMaxRightmost

using sp::ArgMaxRightmost = ArgCompare<true, true>

Functor that determines the largest element of an array, breaking ties by choosing the rightmost element. See documentation for ArgMinLeftmost.

◆ ArgMinLeftmost

using sp::ArgMinLeftmost = ArgCompare<false, false>

Functor that determines the smallest element of an array, breaking ties by choosing the leftmost element.

This is designed to be called using:

sp::reduce<sp::ArgMinLeftmost, sp::IndexConclude>(s, output, input);

to compute the index (argmin) or:

sp::reduce<sp::ArgMinLeftmost, sp::ValueConclude>(s, output, input);

to compute the value (min).
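To make the split between the ArgCompare functor and the Conclude functor concrete, here is a sequential host-side C++ sketch. The `IndPair` fields (`value`, `index`), the body of `sketch::ArgCompare`, and the helper `reduceRef` are assumptions modelled on the names above, not the library's definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Hypothetical stand-in for the library's IndPair; field names are assumed.
template <typename T>
struct IndPair {
    T value;
    std::size_t index;
};

// Assumed behaviour of ArgCompare<IsMax, Rightmost>: keep the better pair and,
// on ties, keep the smaller (leftmost) or larger (rightmost) index.
template <bool IsMax, bool Rightmost>
struct ArgCompare {
    template <typename T>
    IndPair<T> operator()(const IndPair<T>& a, const IndPair<T>& b) const {
        if (a.value != b.value) {
            const bool aWins = IsMax ? (a.value > b.value) : (a.value < b.value);
            return aWins ? a : b;
        }
        // Tie: decide by index, so the result does not depend on operand order.
        const bool pickA = Rightmost ? (a.index > b.index) : (a.index < b.index);
        return pickA ? a : b;
    }
};

// Hypothetical Conclude functors: project the final pair to index or value.
struct IndexConclude {
    template <typename T>
    std::size_t operator()(const IndPair<T>& p) const { return p.index; }
};

struct ValueConclude {
    template <typename T>
    T operator()(const IndPair<T>& p) const { return p.value; }
};

// Sequential reference: wrap each element in an IndPair, fold, then conclude.
// Requires a non-empty input.
template <typename Fn, typename Conclude, typename T>
auto reduceRef(const std::vector<T>& xs) {
    IndPair<T> acc{xs[0], 0};
    for (std::size_t i = 1; i < xs.size(); ++i) {
        acc = Fn{}(acc, IndPair<T>{xs[i], i});
    }
    return Conclude{}(acc);
}

using ArgMinLeftmost = ArgCompare<false, false>;
using ArgMinRightmost = ArgCompare<false, true>;
using ArgMaxLeftmost = ArgCompare<true, false>;

}  // namespace sketch
```

Breaking ties on the index, rather than always keeping one fixed operand, makes the operator commutative as well as associative, which matters because warpValueReduce and blockValueReduce both require a commutative Fn.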

◆ ArgMinRightmost

using sp::ArgMinRightmost = ArgCompare<false, true>

Functor that determines the smallest element of an array, breaking ties by choosing the rightmost element. See documentation for ArgMinLeftmost.

Function Documentation

◆ blockValueReduce()

template<typename KGrid , typename Fn , typename T , int VectorSize, int CacheSize>

__device__ auto sp::blockValueReduce(sp::Vec<T, VectorSize> contribution, sp::StaticTensor<T, CacheSize> &cache)

Reduce the given contributions across the entire block.

Template Parameters

Fn	Binary functor to reduce with; must be associative and commutative.
T	Type of the inputs to reduce.
VectorSize	Number of elements contributed per thread.
CacheSize	Size of the shared-memory cache used for intermediate results.

Parameters

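contribution	Elements this thread contributes to the reduction.
cache	Shared-memory buffer for intermediate results.

Returns: The Fn-wise reduction of all contributions. The result is valid in one thread only, and the return value carries a flag indicating which thread holds it.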
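The documentation does not say how the block-wide reduction is organized. A common pattern is to reduce within each warp, store one partial per warp in the shared-memory cache, then reduce those partials in a single warp. The host-side C++ sketch below simulates that assumed pattern; `blockReduceSim`, `reduceRange` and `kWarpSize` are hypothetical names, and the `KGrid` parameter is not modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Host-side simulation of an assumed two-level block reduction pattern.
// All names are hypothetical; this is not sp::blockValueReduce.
constexpr std::size_t kWarpSize = 32;

// Sequentially fold xs[begin, end) with fn; requires begin < end.
template <typename T, typename Fn>
T reduceRange(const std::vector<T>& xs, std::size_t begin, std::size_t end, Fn fn) {
    T acc = xs[begin];
    for (std::size_t i = begin + 1; i < end; ++i) acc = fn(acc, xs[i]);
    return acc;
}

// Thread t contributes contributions[t * VectorSize, (t + 1) * VectorSize).
// Requires contributions.size() to be a non-zero multiple of VectorSize.
template <std::size_t VectorSize, typename T, typename Fn>
T blockReduceSim(const std::vector<T>& contributions, Fn fn) {
    const std::size_t threads = contributions.size() / VectorSize;

    // Stage 0: each thread folds its own VectorSize elements.
    std::vector<T> perThread;
    for (std::size_t t = 0; t < threads; ++t) {
        perThread.push_back(
            reduceRange(contributions, t * VectorSize, (t + 1) * VectorSize, fn));
    }

    // Stage 1: each warp reduces its lanes; one partial per warp goes into
    // `cache`, which stands in for the shared-memory buffer.
    std::vector<T> cache;
    for (std::size_t w = 0; w < threads; w += kWarpSize) {
        const std::size_t end = std::min(w + kWarpSize, threads);
        cache.push_back(reduceRange(perThread, w, end, fn));
    }

    // On a GPU a barrier (__syncthreads()) would separate stages 1 and 2.
    // Stage 2: a single warp reduces the per-warp partials, so the final
    // value ends up in one thread only.
    return reduceRange(cache, 0, cache.size(), fn);
}
```

In this pattern the cache needs at least one slot per warp in the block, and the grouping of operations differs from a single left-to-right fold, which is why Fn must be associative.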

◆ reduce()

template<typename Fn , typename Conclude = sp::IdentityConclude, bool Inplace = false, typename OutTensor , typename InTensor , typename... Args>

void sp::reduce(sp::Stream &s, OutTensor output, InTensor input, Args &&... args)

Perform a reduction (i.e. fold) across a one-dimensional array.

Template Parameters

Fn	A binary functor to be folded across the input. Depending on the choice of Conclude functor, this either accepts values or IndPairs.
Conclude	A unary functor to be applied to the final reduction result. Recommended choices are `sp::IdentityConclude` (for regular reduction), `sp::ValueConclude` or `sp::IndexConclude`.
Inplace	Whether to include the current value of output as an additional operand of the reduction.

Parameters

s	The CUDA stream.
output	The output tensor to which the result will be written. The underlying datatype must match the output of the Conclude functor.
input	The one-dimensional array (or variable user) over which the reduction is applied.
args	Arguments to bind to input. Must be empty if input is not a variable user.
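A sequential reference model can pin down what reduce computes, in particular the effect of Inplace. The host-side sketch below is hypothetical (`reduceModel` is not a library function), assumes a non-empty input, and omits the Conclude step for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical sequential model of reduce's semantics (not a library API).
// Requires a non-empty input; the Conclude step is omitted.
template <typename Fn, bool Inplace = false, typename T>
void reduceModel(T& output, const std::vector<T>& input) {
    Fn fn{};
    T acc = input[0];
    for (std::size_t i = 1; i < input.size(); ++i) {
        acc = fn(acc, input[i]);
    }
    if (Inplace) {
        // The existing output value joins the reduction as one more operand.
        acc = fn(output, acc);
    }
    output = acc;
}
```

With Inplace = false the previous contents of output are simply overwritten; with Inplace = true they are folded into the result, which allows accumulating across several calls.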

◆ warpValueReduce()

template<typename Fn , typename T , int VectorSize>

__device__ auto sp::warpValueReduce(sp::Vec<T, VectorSize> contribution, int laneId, int logicalWarpSize = WARP_SIZE)

Apply a value-reduction across a warp.

Every thread returns the reduced result.

Template Parameters

Fn	The binary functor to use for the reduction. Must be associative and commutative, else the behavior is unspecified.

Parameters

contribution	A vector of values this thread is contributing to the reduction.
laneId	The lane ID of this thread.
logicalWarpSize	The number of lanes in each logical warp over which the reduction is performed; defaults to WARP_SIZE.
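One standard way to leave the result in every lane is an XOR butterfly, in which each lane repeatedly combines its value with that of lane `laneId ^ offset`; on a GPU this exchange is typically done with the CUDA intrinsic `__shfl_xor_sync`. Whether warpValueReduce uses exactly this scheme is an assumption. The host-side simulation below treats each lane's contribution as a single scalar (so it does not model how VectorSize elements are combined) and uses the hypothetical name `butterflyReduce`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side simulation of an XOR-butterfly warp reduction; an assumed
// strategy for illustration, not the library's implementation.
// lanes[i] is the scalar contribution of lane i. lanes.size() must be a
// multiple of logicalWarpSize, which must be a power of two.
template <typename T, typename Fn>
std::vector<T> butterflyReduce(std::vector<T> lanes, int logicalWarpSize, Fn fn) {
    for (int offset = logicalWarpSize / 2; offset > 0; offset /= 2) {
        std::vector<T> next(lanes.size());
        for (std::size_t lane = 0; lane < lanes.size(); ++lane) {
            // On a GPU the partner's value would come from
            // __shfl_xor_sync(mask, value, offset, logicalWarpSize).
            const std::size_t partner = lane ^ static_cast<std::size_t>(offset);
            next[lane] = fn(lanes[lane], lanes[partner]);
        }
        lanes.swap(next);  // all lanes advance in lockstep, like a warp
    }
    return lanes;  // every lane now holds its logical warp's result
}
```

Lanes `i` and `i ^ offset` evaluate `fn(a, b)` and `fn(b, a)` respectively, so with a non-commutative Fn they could end up with different values; this is one reason Fn must be commutative.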
