Speclib  0.1.2
The library for writing better CUDA libraries
Reduction

Facilities for reduction.

Classes

struct  sp::OrderedSegment< T, I >
 Represents information about a contiguous interval of items in an array, tracking the first item, last item, whether the interval is sorted, and the index of the first element beyond the end of the maximum sorted initial segment.
 
struct  sp::IndexConclude
 A Conclude functor which extracts the index from an IndPair.
 
struct  sp::ValueConclude
 A Conclude functor which extracts the value from an IndPair.
 
struct  sp::IdentityConclude
 A Conclude functor which does absolutely nothing.
 

Typedefs

using sp::ArgMinLeftmost = ArgCompare< false, false >
 Functor that determines the smallest element of an array, breaking ties by choosing the leftmost element.
 
using sp::ArgMinRightmost = ArgCompare< false, true >
 See documentation for ArgMinLeftmost.
 
using sp::ArgMaxLeftmost = ArgCompare< true, false >
 See documentation for ArgMinLeftmost.
 
using sp::ArgMaxRightmost = ArgCompare< true, true >
 See documentation for ArgMinLeftmost.
 

Functions

template<typename Fn , typename Conclude = sp::IdentityConclude, bool Inplace = false, typename OutTensor , typename InTensor , typename... Args>
void sp::reduce (sp::Stream &s, OutTensor output, InTensor input, Args &&... args)
 Perform a reduction (i.e. a fold) across a one-dimensional array.
 
template<typename Fn , typename T , int VectorSize>
__device__ auto sp::warpValueReduce (sp::Vec< T, VectorSize > contribution, int laneId, int logicalWarpSize=WARP_SIZE)
 Apply a value-reduction across a warp.
 
template<typename KGrid , typename Fn , typename T , int VectorSize, int CacheSize>
__device__ auto sp::blockValueReduce (sp::Vec< T, VectorSize > contribution, sp::StaticTensor< T, CacheSize > &cache)
 Reduce the given contributions across the entire block.
 

Detailed Description

Facilities for reduction.

Typedef Documentation

◆ ArgMaxLeftmost

using sp::ArgMaxLeftmost = ArgCompare<true, false>

See documentation for ArgMinLeftmost.

◆ ArgMaxRightmost

using sp::ArgMaxRightmost = ArgCompare<true, true>

See documentation for ArgMinLeftmost.

◆ ArgMinLeftmost

using sp::ArgMinLeftmost = ArgCompare<false, false>

Functor that determines the smallest element of an array, breaking ties by choosing the leftmost element.

This is designed to be called using:

sp::reduce<sp::ArgMinLeftmost, sp::IndexConclude>(s, output, input);

to compute the index (argmin) or:

sp::reduce<sp::ArgMinLeftmost, sp::ValueConclude>(s, output, input);

to compute the value (min).
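
The relationship between the functor and the two Conclude choices can be illustrated with a host-side sketch. This is not Speclib code: IndPairSketch, ArgMinLeftmostSketch, IndexConcludeSketch, ValueConcludeSketch and argReduceSketch are hypothetical stand-ins written for illustration, and the sequential loop only mimics the semantics described above, not how the library schedules work on the GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an IndPair: a value tagged with its index.
template <typename T>
struct IndPairSketch {
    T value;
    std::size_t index;
};

// Hypothetical functor: keep the smaller value; on a tie keep the pair
// with the smaller index. The result does not depend on argument order,
// so the functor is commutative as well as associative.
struct ArgMinLeftmostSketch {
    template <typename T>
    IndPairSketch<T> operator()(const IndPairSketch<T>& a,
                                const IndPairSketch<T>& b) const {
        if (b.value < a.value) return b;
        if (a.value < b.value) return a;
        return (a.index <= b.index) ? a : b;
    }
};

// Conclude step that keeps only the index (argmin).
struct IndexConcludeSketch {
    template <typename T>
    std::size_t operator()(const IndPairSketch<T>& p) const { return p.index; }
};

// Conclude step that keeps only the value (min).
struct ValueConcludeSketch {
    template <typename T>
    T operator()(const IndPairSketch<T>& p) const { return p.value; }
};

// Sequential fold over (value, index) pairs, followed by the Conclude step.
// Requires a non-empty input.
template <typename Fn, typename Conclude, typename T>
auto argReduceSketch(const std::vector<T>& input) {
    assert(!input.empty());
    IndPairSketch<T> acc{input[0], 0};
    for (std::size_t i = 1; i < input.size(); ++i) {
        acc = Fn{}(acc, IndPairSketch<T>{input[i], i});
    }
    return Conclude{}(acc);
}
```

With the input {4, 1, 7, 1, 9}, the index variant selects position 1 rather than position 3, because the tie between the two 1s is broken toward the left; the value variant yields 1 either way.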

◆ ArgMinRightmost

using sp::ArgMinRightmost = ArgCompare<false, true>

See documentation for ArgMinLeftmost.

Function Documentation

◆ blockValueReduce()

template<typename KGrid , typename Fn , typename T , int VectorSize, int CacheSize>
__device__ auto sp::blockValueReduce ( sp::Vec< T, VectorSize >  contribution,
sp::StaticTensor< T, CacheSize > &  cache 
)

Reduce the given contributions across the entire block.

Template Parameters
Fn  Binary functor to reduce with; must be associative and commutative.
T  Type of the inputs to reduce.
VectorSize  Number of contributed elements per thread.
CacheSize  Size of the shared-memory cache used for intermediates.
Parameters
contribution  Elements to reduce.
cache  Shared-memory buffer for intermediates.
Returns
The Fn-wise reduction of the contributions, valid in one thread only, together with a flag indicating which thread holds it.
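
The return contract (one valid result plus a flag) can be pictured with a host-side emulation. The sketch below is an assumption-laden illustration, not the library's implementation: BlockResultSketch and blockReduceSketch are hypothetical names, the staging (per-thread fold, one partial per warp in a cache, final fold by thread 0) is a common pattern that the documentation does not confirm, and the choice of thread 0 as owner is arbitrary.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>  // std::plus etc. as example functors
#include <vector>

// Hypothetical result type: a value plus a flag saying whether it is valid.
template <typename T>
struct BlockResultSketch {
    T value;
    bool isOwner;
};

// Emulates a block of contributions.size() threads on the host.
template <typename Fn, typename T, std::size_t VectorSize>
std::vector<BlockResultSketch<T>> blockReduceSketch(
        const std::vector<std::array<T, VectorSize>>& contributions,
        std::size_t warpSize) {
    static_assert(VectorSize > 0, "each thread must contribute something");
    assert(!contributions.empty() && warpSize > 0);
    const std::size_t numThreads = contributions.size();

    // Stage 1: every thread folds its own contributed vector.
    std::vector<T> perThread;
    perThread.reserve(numThreads);
    for (const auto& c : contributions) {
        T acc = c[0];
        for (std::size_t i = 1; i < VectorSize; ++i) acc = Fn{}(acc, c[i]);
        perThread.push_back(acc);
    }

    // Stage 2: one partial per warp is written to the cache
    // (the role played by shared memory on the device).
    std::vector<T> cache;
    for (std::size_t base = 0; base < numThreads; base += warpSize) {
        T acc = perThread[base];
        for (std::size_t t = base + 1; t < base + warpSize && t < numThreads; ++t) {
            acc = Fn{}(acc, perThread[t]);
        }
        cache.push_back(acc);
    }

    // Stage 3: a single thread (thread 0 in this sketch) folds the cache.
    T total = cache[0];
    for (std::size_t i = 1; i < cache.size(); ++i) total = Fn{}(total, cache[i]);

    // Only the owner's value is meaningful; the others carry partials.
    std::vector<BlockResultSketch<T>> results;
    results.reserve(numThreads);
    for (std::size_t t = 0; t < numThreads; ++t) {
        results.push_back(BlockResultSketch<T>{perThread[t], false});
    }
    results[0] = BlockResultSketch<T>{total, true};
    return results;
}
```

A caller following this contract would check the flag before writing the value anywhere, since every other thread holds an incomplete result.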

◆ reduce()

template<typename Fn , typename Conclude = sp::IdentityConclude, bool Inplace = false, typename OutTensor , typename InTensor , typename... Args>
void sp::reduce ( sp::Stream & s,
OutTensor  output,
InTensor  input,
Args &&...  args 
)

Perform a reduction (i.e. a fold) across a one-dimensional array.

Template Parameters
Fn  A binary functor to be folded across the input. Depending on the choice of Conclude functor, this either accepts values or IndPairs.
Conclude  A unary functor to be applied to the final reduction result. Recommended choices are sp::IdentityConclude (for regular reduction), sp::ValueConclude or sp::IndexConclude.
Inplace  Whether to include the current value of the output in the set of reductands.
Parameters
s  The CUDA stream.
output  The output tensor to which the result will be written. The underlying datatype must match the output of the Conclude functor.
input  A one-dimensional array variable user over which the reduction will be applied.
args  Arguments to bind to input. Must be empty if input is not a variable user.
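
The interplay of Fn, Conclude and Inplace can be sketched on the host. The code below is an illustration under stated assumptions rather than the library's behavior: hostReduceSketch and IdentityConcludeSketch are hypothetical names, a plain reference and a std::vector stand in for the output and input tensors, and the stream and args parameters are left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>  // std::plus etc. as example functors
#include <vector>

// Hypothetical Conclude step that returns its argument unchanged.
struct IdentityConcludeSketch {
    template <typename T>
    T operator()(const T& x) const { return x; }
};

// Sequential fold of `input` with Fn, then Conclude, written to `output`.
// When Inplace is true, the current value of `output` is treated as one
// more reductand; otherwise it is simply overwritten.
template <typename Fn, typename Conclude = IdentityConcludeSketch,
          bool Inplace = false, typename T>
void hostReduceSketch(T& output, const std::vector<T>& input) {
    // Without Inplace there must be at least one element to start from.
    assert(Inplace || !input.empty());
    T acc = Inplace ? output : input[0];
    const std::size_t start = Inplace ? 0 : 1;
    for (std::size_t i = start; i < input.size(); ++i) {
        acc = Fn{}(acc, input[i]);
    }
    output = Conclude{}(acc);
}
```

Summing {1, 2, 3, 4} into an output that currently holds 100 gives 10 with Inplace = false and 110 with Inplace = true, which is the distinction the Inplace parameter describes.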

◆ warpValueReduce()

template<typename Fn , typename T , int VectorSize>
__device__ auto sp::warpValueReduce ( sp::Vec< T, VectorSize >  contribution,
int  laneId,
int  logicalWarpSize = WARP_SIZE 
)

Apply a value-reduction across a warp.

Every thread returns the reduced result.

Template Parameters
Fn  The binary functor to use for the reduction. Must be associative and commutative, else the behavior is unspecified.
Parameters
contribution  A vector of values this thread is contributing to the reduction.
laneId  The lane ID of this thread.
logicalWarpSize  The logical warp size to use.
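
One common way to obtain a reduction in which every lane ends up with the result is a butterfly (XOR) exchange, and a host-side emulation makes the role of logicalWarpSize visible. Whether Speclib uses this scheme is not stated here, so treat the sketch as an assumption: warpAllReduceSketch is a hypothetical name, each lane contributes a single scalar instead of an sp::Vec, and the logical warp size is assumed to be a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>  // std::plus etc. as example functors
#include <vector>

// Emulates an all-lanes reduction over consecutive groups of
// logicalWarpSize lanes. lanes[i] is the contribution of lane i.
template <typename Fn, typename T>
std::vector<T> warpAllReduceSketch(std::vector<T> lanes, int logicalWarpSize) {
    // Assumption of this sketch: power-of-two group size, full groups only.
    assert(logicalWarpSize > 0 &&
           (logicalWarpSize & (logicalWarpSize - 1)) == 0);
    assert(lanes.size() % static_cast<std::size_t>(logicalWarpSize) == 0);

    for (int offset = logicalWarpSize / 2; offset > 0; offset /= 2) {
        // All lanes read their partner's old value, then update together,
        // which is what a shuffle-based exchange does on the device.
        std::vector<T> next = lanes;
        for (std::size_t lane = 0; lane < lanes.size(); ++lane) {
            // Flipping one low bit keeps the partner inside the same group.
            const std::size_t partner = lane ^ static_cast<std::size_t>(offset);
            next[lane] = Fn{}(lanes[lane], lanes[partner]);
        }
        lanes = next;
    }
    return lanes;
}
```

With eight lanes holding 1 through 8 and a logical warp size of 4, addition leaves 10 in each of the first four lanes and 26 in each of the last four: the two logical warps reduce independently. Different lanes combine the same operands in different orders, which is one reason associativity and commutativity matter.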