Things related to sorting. More...

Classes
struct	sp::DefaultKeyFunctor
	The default KeyFunctor used for radix sorts. More...

Functions
template<typename T , int N>
__device__ sp::Vec< T, N >	sp::warpBitonicSort (sp::Vec< T, N > x, int laneID)
	Perform a bitonic sort across the warp. More...

template<typename T , int ThreadsPerBlock, int N, int EffectiveBlockSize = ThreadsPerBlock>
__device__ sp::Vec< T, N >	sp::blockBitonicSort (sp::Vec< T, N > x, int threadId, sp::StaticTensor< T, ThreadsPerBlock N 2 > &cache)
	Perform a bitonic sort of ThreadsPerBlock * N elements across an entire block. More...

template<typename KeyFunctor = DefaultKeyFunctor, typename SrcTensor , typename TmpTensor >
void	sp::radixSort (sp::Stream &s, SrcTensor subject, TmpTensor tmp)
	Radix sort. More...

Detailed Description

Things related to sorting.

Warp-level sorting
Block-level sorting
The sort kernel

Function Documentation

◆ blockBitonicSort()

template<typename T , int ThreadsPerBlock, int N, int EffectiveBlockSize = ThreadsPerBlock>

__device__ sp::Vec< T, N > sp::blockBitonicSort	(	sp::Vec< T, N >	x,
		int	threadId,
		sp::StaticTensor< T, ThreadsPerBlock N 2 > &	cache
	)

Perform a bitonic sort of ThreadsPerBlock * N elements across an entire block.

The input and output per thread is a vector containing N contiguous elements of the array to be sorted.

This may be used with a proper initial segment of the full thread block, by setting ThreadsPerBlock accordingly. Thread indices are assumed to range over {0, 1, ..., ThreadsPerBlock - 1}, and ThreadsPerBlock must be divisible by WARP_SIZE.

The parameters ThreadsPerBlock and N must both be powers of two.

◆ radixSort()

template<typename KeyFunctor = DefaultKeyFunctor, typename SrcTensor , typename TmpTensor >

void sp::radixSort	(	sp::Stream &	s,
		SrcTensor	subject,
		TmpTensor	tmp
	)

Radix sort.

Performs a radix sort on the elements of subject, using tmp as a temporary workspace. The result is written into subject.

Since radix sort is not a comparison sort, it cannot use operator< to determine the ordering for user defined types. Instead, you must provide a KeyFunctor that describes how to map the input elements to unsigned integers (of any length). You might find __uint<...> useful for representing "strange" integer lengths for this purpose. Shorter keys are faster to sort.

If no KeyFunctor is specified, DefaultKeyFunctor is used. DefaultKeyFunctor knows how to handle all integral and floating point types, meaning if you just want to sort numbers, you can ignore this issue entirely.

An example KeyFunctor for a custom type might look like the following:

struct ThingToSort {
    char c;
    uint32_t i;
}
 
struct ExampleKeyFunctor {
    _hd_ auto operator()(const ThingToSort& value) {
         // Let's sort first by the char, and then by the integer.
         __uint<40> out = __uint<40>{0};
 
         out = sp::bitfieldInsert(
    }
}

Template Parameters

KeyFunctor A functor object that can be used to convert elements of the input TensorLike into integral keys for sorting.

Parameters

s	The stream on which to enqueue the work.
subject	A 1D `TensorLike` to sort.
tmp	A 1D `TensorLike` that must be at least as large as `subject` and is used for temporary storage.

◆ warpBitonicSort()

template<typename T , int N>

__device__ sp::Vec< T, N > sp::warpBitonicSort	(	sp::Vec< T, N >	x,
		int	laneID
	)

Perform a bitonic sort across the warp.

Each thread contributes a Vec of N elements to the sort. Each thread is considered to be holding N adjacent elements of the input list (formed by concatenating the contributions from each thread in order of lane ID). After the sort, each thread gets back the corresponding elements of the sorted list, such that concatenating all of the returned Vecs in laneID order would produce the sorted list.