Facilities for reduction.
template<typename Fn, typename Conclude = sp::IdentityConclude, bool Inplace = false, typename OutTensor, typename InTensor, typename... Args>
void sp::reduce(sp::Stream &s, OutTensor output, InTensor input, Args &&... args)
    Perform a reduction (i.e. fold) across a one-dimensional array.

template<typename Fn, typename T, int VectorSize>
__device__ auto sp::warpValueReduce(sp::Vec<T, VectorSize> contribution, int laneId, int logicalWarpSize = WARP_SIZE)
    Apply a value-reduction across a warp.

template<typename KGrid, typename Fn, typename T, int VectorSize, int CacheSize>
__device__ auto sp::blockValueReduce(sp::Vec<T, VectorSize> contribution, sp::StaticTensor<T, CacheSize> &cache)
    Reduce the given contributions across the entire block.

This group provides:
- Warp value/index reduction
- Block value/index reduction
- The reduction kernel.
◆ ArgMaxLeftmost
See documentation for ArgMinLeftmost.
◆ ArgMaxRightmost
See documentation for ArgMinLeftmost.
◆ ArgMinLeftmost
Functor that determines the smallest element of an array, breaking ties by choosing the leftmost element.
This is designed to be called using:
sp::reduce<sp::ArgMinLeftmost, sp::IndexConclude>(s, output, input);
to compute the index (argmin) or:
sp::reduce<sp::ArgMinLeftmost, sp::ValueConclude>(s, output, input);
to compute the value (min).
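The host-side C++ sketch below illustrates how leftmost tie-breaking can work in an argmin fold. It is not the library's implementation: `IndPair` (with its `value`/`index` fields), `ArgMinLeftmostSketch`, `ValueConcludeSketch`, `IndexConcludeSketch` and `foldArg` are hypothetical stand-ins, because the layout of `sp::IndPair` and the functor interfaces are not documented here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for sp::IndPair; the real layout is not documented here.
template <typename T>
struct IndPair {
    T value;
    std::size_t index;
};

// Sketch of an ArgMinLeftmost-style binary functor. Comparing indices on ties
// makes the result independent of operand order, so the functor is commutative
// as well as associative.
struct ArgMinLeftmostSketch {
    template <typename T>
    IndPair<T> operator()(const IndPair<T>& a, const IndPair<T>& b) const {
        if (a.value < b.value) return a;
        if (b.value < a.value) return b;
        return a.index <= b.index ? a : b;  // tie: keep the leftmost element
    }
};

// Hypothetical analogues of sp::ValueConclude / sp::IndexConclude.
struct ValueConcludeSketch {
    template <typename T>
    T operator()(const IndPair<T>& p) const { return p.value; }
};
struct IndexConcludeSketch {
    template <typename T>
    std::size_t operator()(const IndPair<T>& p) const { return p.index; }
};

// Sequential fold of Fn over (value, index) pairs, followed by Conclude.
template <typename Fn, typename Conclude, typename T>
auto foldArg(const std::vector<T>& xs) {
    assert(!xs.empty());
    IndPair<T> acc{xs[0], 0};
    for (std::size_t i = 1; i < xs.size(); ++i) {
        acc = Fn{}(acc, IndPair<T>{xs[i], i});
    }
    return Conclude{}(acc);
}
```

For the input `{3, 1, 4, 1, 5}`, the index conclusion yields `1` (the first of the two minima) and the value conclusion yields `1`. A rightmost variant would keep the larger index on ties instead.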
◆ ArgMinRightmost
See documentation for ArgMinLeftmost.
◆ blockValueReduce()
template<typename KGrid, typename Fn, typename T, int VectorSize, int CacheSize>
__device__ auto sp::blockValueReduce(sp::Vec<T, VectorSize> contribution,
                                     sp::StaticTensor<T, CacheSize> &cache)
Reduce the given contributions across the entire block.
- Template Parameters
Fn | Binary functor to reduce with; must satisfy associativity and commutativity. |
T | Type of the inputs to reduce. |
VectorSize | Number of elements contributed per thread. |
CacheSize | Size of the shared-memory cache used for intermediates. |
- Parameters
contribution | Elements to reduce. |
cache | Shared-memory buffer for intermediates. |
- Returns
The Fn-wise reduction of the inputs. The result is valid in only one thread, and a returned flag specifies which thread holds it.
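As a rough model of how a block-wide reduction with a shared-memory cache is commonly organised, the host-side sketch below folds each thread's vector, reduces each 32-lane warp into one cache slot, then reduces the cache. The staging, the 32-lane warp width and the name `blockReduceModel` are assumptions, not the documented implementation; under this model the cache needs at least one slot per warp in the block.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

constexpr int kWarpSize = 32;  // assumed warp width

// Hypothetical host-side model of a two-stage block reduction.
// contributions[t] holds the VectorSize values owned by thread t.
template <typename Fn, typename T, std::size_t VectorSize>
T blockReduceModel(const std::vector<std::array<T, VectorSize>>& contributions) {
    static_assert(VectorSize > 0, "each thread must contribute at least one value");
    Fn fn;
    const std::size_t numThreads = contributions.size();
    assert(numThreads > 0 && numThreads % kWarpSize == 0);

    // Stage 0: each thread folds its own vector into a single value.
    std::vector<T> perThread(numThreads);
    for (std::size_t t = 0; t < numThreads; ++t) {
        T acc = contributions[t][0];
        for (std::size_t k = 1; k < VectorSize; ++k) acc = fn(acc, contributions[t][k]);
        perThread[t] = acc;
    }

    // Stage 1: each warp reduces its values; one lane writes the warp's result
    // into the shared-memory cache (modelled as one vector slot per warp).
    std::vector<T> cache(numThreads / kWarpSize);
    for (std::size_t w = 0; w < cache.size(); ++w) {
        T acc = perThread[w * kWarpSize];
        for (int lane = 1; lane < kWarpSize; ++lane) {
            acc = fn(acc, perThread[w * kWarpSize + lane]);
        }
        cache[w] = acc;
    }

    // Stage 2: after a barrier on the device, a single warp reduces the cache.
    T result = cache[0];
    for (std::size_t w = 1; w < cache.size(); ++w) result = fn(result, cache[w]);
    return result;
}
```

On the device, the final stage runs after a barrier and leaves the full result in one thread only, which is consistent with the single-thread return value described above.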
◆ reduce()
template<typename Fn, typename Conclude = sp::IdentityConclude, bool Inplace = false, typename OutTensor, typename InTensor, typename... Args>
void sp::reduce(sp::Stream &s,
                OutTensor output,
                InTensor input,
                Args &&... args)
Perform a reduction (i.e. fold) across a one-dimensional array.
- Template Parameters
Fn | A binary functor to be folded across the input. Depending on the choice of Conclude functor, it accepts either values or IndPairs. |
Conclude | A unary functor applied to the final reduction result. Recommended choices are sp::IdentityConclude (for regular reduction), sp::ValueConclude or sp::IndexConclude. |
Inplace | Whether to include the current value of the output in the set of reductands. |
- Parameters
s | The CUDA stream. |
output | The output tensor to which the result will be written. Its underlying datatype must match the output type of the Conclude functor. |
input | A one-dimensional array variable user over which the reduction is applied. |
args | Arguments to bind to input. Must be empty if input is not a variable user. |
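A sequential model of the documented semantics may help when reading the template parameters: fold Fn across the input, optionally include the existing output when Inplace is true, and write the Conclude result. `reduceModel` and `IdentityConcludeSketch` are hypothetical names; the model ignores streams, tensors and variable-user argument binding, and assumes the Conclude functor returns the element type.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical analogue of sp::IdentityConclude: returns its argument unchanged.
struct IdentityConcludeSketch {
    template <typename T>
    T operator()(const T& x) const { return x; }
};

// Sequential model of the documented semantics of sp::reduce (not its
// implementation). With Inplace, the current output joins the reductands.
template <typename Fn, typename Conclude = IdentityConcludeSketch, bool Inplace = false, typename T>
void reduceModel(T& output, const std::vector<T>& input) {
    assert(!input.empty());
    Fn fn;
    T acc = input[0];
    for (std::size_t i = 1; i < input.size(); ++i) acc = fn(acc, input[i]);
    if (Inplace) acc = fn(output, acc);  // existing output value is folded in
    output = Conclude{}(acc);
}
```

With `std::plus<int>` over `{1, 2, 3, 4}` and an output initially holding `100`, the default call writes `10`, while the Inplace variant writes `110`.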
◆ warpValueReduce()
template<typename Fn, typename T, int VectorSize>
__device__ auto sp::warpValueReduce(sp::Vec<T, VectorSize> contribution,
                                    int laneId,
                                    int logicalWarpSize = WARP_SIZE)
Apply a value-reduction across a warp.
Every thread returns the reduced result.
- Template Parameters
Fn | The binary functor to use for the reduction. Must be associative and commutative; otherwise the behavior is unspecified. |
- Parameters
contribution | A vector of values this thread contributes to the reduction. |
laneId | The lane ID of this thread. |
logicalWarpSize | The logical warp size to use (defaults to WARP_SIZE). |
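Because every thread returns the reduced result, a butterfly (XOR-shuffle) pattern fits this contract; whether warpValueReduce uses exactly this pattern is an assumption. The host-side model below simulates the lane exchange with plain vectors. Each lane is assumed to have already folded its own VectorSize elements, and `warpButterflyModel` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical host-side model of an XOR-butterfly warp reduction.
// lanes[i] is the value lane i contributes. On the device, each step would be
//   v = fn(v, __shfl_xor_sync(0xffffffff, v, offset, logicalWarpSize));
template <typename Fn, typename T>
std::vector<T> warpButterflyModel(std::vector<T> lanes, int logicalWarpSize) {
    Fn fn;
    // The logical warp size must be a power of two that divides the lane count.
    assert(logicalWarpSize > 0 && (logicalWarpSize & (logicalWarpSize - 1)) == 0);
    assert(lanes.size() % logicalWarpSize == 0);
    for (int offset = logicalWarpSize / 2; offset > 0; offset /= 2) {
        std::vector<T> next(lanes.size());
        for (std::size_t lane = 0; lane < lanes.size(); ++lane) {
            // XOR with offset < logicalWarpSize stays inside the lane's logical warp.
            next[lane] = fn(lanes[lane], lanes[lane ^ offset]);
        }
        lanes = next;  // all lanes exchange simultaneously, as a shuffle does
    }
    return lanes;  // every lane now holds its logical warp's reduction
}
```

Lane `i` combines its value with that of lane `i ^ offset` while its partner computes the same pair in the opposite order, which is why Fn must be commutative as well as associative for all lanes to agree on the result.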