The specialisation system.
Modules | |
Specialisers | |
Specialisation operators. | |
Classes | |
struct | sp::RunSpecialisedWrapper< Function > |
A functor to wrap the call to Function::run(). | |
class | sp::Specialiser |
The interface for a Specialiser. | |
Typedefs | |
using | sp::DefaultSpecialisers = sp::TypeList< VariableBindingSpecialiser, ScalarSpecialiser, TensorSpecialiser, BroadcastingTensorSpecialiser, OutputExprSpecialiser, TensorExprSpecialiser, VariantScalarSpecialiser, VariantOutputPtrSpecialiser, PtrScalarSpecialiser<>, TensorDescriptorSpecialiser > |
The default list of Specialiser types to use, i.e. all the ones that are valid in all cases. | |
Functions | |
template<typename... OutArgs, typename Functor , typename... Specialisers, typename InArg , typename... InArgs> | |
auto | sp::specialiseNextArgument (Functor &fn, OutArgs &&... outArgs, sp::TypeList< Specialisers... > specialisers, InArg &&inArg, InArgs &&... args) |
template<typename OutArgs , typename NewArg , typename Functor , typename... Specialisers, typename... Args> | |
auto | sp::specialiseMore (NewArg &&newArg, Functor &fn, OutArgs &&outArgs, sp::TypeList< Specialisers... > specialisers, Args &&... args) |
Called when the Specialisers have finished with the next argument. | |
template<typename InArg , typename... Specialisers> | |
constexpr int | sp::findAcceptingSpecialiser () |
template<typename OutArgs , typename Functor , typename... Specialisers> | |
auto | sp::specialiseNextArgument (Functor &fn, OutArgs outArgs, sp::TypeList< Specialisers... >) |
template<typename OutArgs , typename Functor , typename... Specialisers, typename InArg , typename... InArgs> | |
auto | sp::specialiseNextArgument (Functor &fn, OutArgs outArgs, sp::TypeList< Specialisers... > specialisers, InArg &&inArg, InArgs &&... args) |
template<typename Functor , typename... Specialisers, typename... Args> | |
auto | sp::startSpecialisation (Functor &fn, sp::TypeList< Specialisers... > specialisers, Args &&... args) |
template<typename Functor , typename... Args> | |
auto | sp::runSpecialised (Functor &fn, Args &&... args) |
template<typename Functor , typename... Args> | |
auto | sp::runSpecialisedLambda (Functor fn, Args &&... args) |
Do a specialised call to a lambda. | |
template<typename Function , typename... Specs, typename... Args> | |
auto | sp::runWithSpecialisers (sp::TypeList< Specs... > specialisers, Args &&... args) |
Perform a specialising call to static function Function::run(), with the given specialisers. | |
template<typename Function , typename... Args> | |
auto | sp::runSpecialised (Args &&... args) |
Perform a specialising call to static function Function::run(), with the default specialisers. | |
The specialisation system.
CUDA kernels frequently come in multiple versions, specialised for different special cases (a constant is zero? The input is aligned? The input is small? etc.). It's common for the CPU to check these optimisable conditions itself and pick an appropriate CUDA kernel for the situation: a small amount of CPU-side branching can save a lot of GPU time.
C++ templates allow us to perform that kind of specialisation, but the resulting code gets ugly fast:
The above example - although a simple case - shows how quickly this gets annoying:
Speclib provides a mechanism to take care of the details, allowing the above to be rephrased as:
The above example will optimise for the case where a == 0, too, and we can add arbitrary "values of interest" by adding them to the ScalarsToSpecialise integer sequence.
sp::runSpecialisedKernel generates a compile-time branch tree to select the optimal combination of arguments to pass to the target kernel function template.
This generates exponentially many specialisations in the number of arguments, because it has to enumerate all combinations. Specialisation targets can therefore provide compile-time functions that tell the specialiser which ones to consider, avoiding expensive generation of uninteresting or unused cases.
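The branch-tree idea can be illustrated without Speclib or CUDA. The following is only a sketch of the general technique in plain C++17, not Speclib's implementation. `ScaleTarget`, `dispatchScale`, `Result`, `Path` and `ValuesOfInterest` are hypothetical names invented for this illustration. A fold expression over an integer sequence expands into one runtime comparison per "value of interest", and each comparison forwards to a separately instantiated specialisation.

```cpp
#include <cassert>
#include <utility>

// Records which code path a call landed on, so the sketch is observable.
enum class Path { Generic, KnownZero, KnownOther };

struct Result {
    int value;
    Path path;
};

// Hypothetical target: a stand-in for a kernel template. IsKnown/Value are
// compile-time facts about the scalar `a`.
template <bool IsKnown, int Value>
struct ScaleTarget {
    static Result run(int a, int x) {
        if constexpr (IsKnown && Value == 0) {
            return {0, Path::KnownZero};           // a == 0: no multiply at all
        } else if constexpr (IsKnown) {
            return {Value * x, Path::KnownOther};  // a is a compile-time constant
        } else {
            return {a * x, Path::Generic};         // nothing known about a
        }
    }
};

// Expands to one runtime comparison per value of interest. The first match
// calls the matching specialisation; otherwise the generic one is used.
template <int... Values>
Result dispatchScale(std::integer_sequence<int, Values...>, int a, int x) {
    Result result{};
    const bool matched =
        ((a == Values ? (result = ScaleTarget<true, Values>::run(a, x), true)
                      : false) || ...);
    if (!matched) {
        result = ScaleTarget<false, 0>::run(a, x);
    }
    return result;
}

// Adding a value here adds one more branch and one more instantiation.
using ValuesOfInterest = std::integer_sequence<int, 0, 1>;
```

With one specialised argument the number of instantiations grows linearly in the list length. Applying the same trick independently to every argument multiplies the counts together, which is the exponential growth described above.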
You can also define new transformations for the specialiser to perform, or run any subset of the existing specialisers on a per-call basis. The built-in set of specialisers mostly relates to optimising Tensor flags for packedness, alignment, __restrict__-ness, etc.
using sp::DefaultSpecialisers = sp::TypeList< VariableBindingSpecialiser, ScalarSpecialiser, TensorSpecialiser, BroadcastingTensorSpecialiser, OutputExprSpecialiser, TensorExprSpecialiser, VariantScalarSpecialiser, VariantOutputPtrSpecialiser, PtrScalarSpecialiser<>, TensorDescriptorSpecialiser >
The default list of Specialiser types to use, i.e. all the ones that are valid in all cases.
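As a rough illustration of how a list of specialiser types might be searched for one that handles a given argument type, here is a self-contained sketch. It is loosely modelled on the signature of `sp::findAcceptingSpecialiser`, but everything in it is an assumption. `TypeListSketch`, `IntSpecSketch`, `FloatSpecSketch`, `findAcceptingSketch`, the `accepts<T>()` protocol and the `-1` "not found" convention are all invented here and may not match Speclib.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for sp::TypeList: an empty type carrying a pack.
template <typename... Ts>
struct TypeListSketch {};

// Hypothetical specialisers. Each one reports, at compile time, whether it
// wants to handle arguments of type T.
struct IntSpecSketch {
    template <typename T>
    static constexpr bool accepts() { return std::is_integral_v<T>; }
};

struct FloatSpecSketch {
    template <typename T>
    static constexpr bool accepts() { return std::is_floating_point_v<T>; }
};

// Returns the index of the first specialiser that accepts InArg, or -1 if
// none does (the -1 convention is an assumption of this sketch).
template <typename InArg, typename... Specialisers>
constexpr int findAcceptingSketch() {
    // The trailing `false` keeps the array non-empty for an empty pack.
    const bool accepted[] = {Specialisers::template accepts<InArg>()..., false};
    for (int i = 0; i < static_cast<int>(sizeof...(Specialisers)); ++i) {
        if (accepted[i]) {
            return i;
        }
    }
    return -1;
}

// A per-call list, analogous in spirit to choosing a subset of specialisers.
using SketchSpecialisers = TypeListSketch<IntSpecSketch, FloatSpecSketch>;
```

Because the search is `constexpr`, the choice of specialiser for each argument type costs nothing at run time. Only the checks made by the chosen specialiser itself become runtime branches.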
auto sp::runSpecialised(Args &&... args)
Perform a specialising call to static function Function::run(), with the default specialisers.
Function | The type to call run() on. |
args | Forwarding references to the arguments to pass to Function::run(). |
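A static member function template such as `Function::run()` is an overload set rather than an object, so it cannot be passed around directly as a functor. The class list mentions `sp::RunSpecialisedWrapper` as "a functor to wrap the call to Function::run()". The sketch below shows what such a wrapper could look like in plain C++17. `RunWrapperSketch` and `Axpy` are hypothetical names, and Speclib's actual wrapper may differ.

```cpp
#include <cassert>
#include <utility>

// Hypothetical wrapper: turns the static overload set Function::run into an
// ordinary callable object that forwards its arguments unchanged.
template <typename Function>
struct RunWrapperSketch {
    template <typename... Args>
    auto operator()(Args&&... args) const {
        return Function::run(std::forward<Args>(args)...);
    }
};

// Hypothetical target type. run() is a template, so each distinct argument
// type (for example a compile-time constant wrapper) gets its own
// instantiation.
struct Axpy {
    template <typename A>
    static double run(A a, double x, double y) {
        return a * x + y;
    }
};
```

For example, `RunWrapperSketch<Axpy>{}(2, 3.0, 1.0)` instantiates `Axpy::run<int>` and evaluates `2 * 3.0 + 1.0`.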
auto sp::runSpecialisedLambda(Functor fn, Args &&... args)
Do a specialised call to a lambda.
auto sp::runWithSpecialisers(sp::TypeList< Specs... > specialisers, Args &&... args)
Perform a specialising call to static function Function::run(), with the given specialisers.
auto sp::specialiseMore(NewArg && newArg, Functor & fn, OutArgs && outArgs, sp::TypeList< Specialisers... > specialisers, Args &&... args)
Called when the Specialisers have finished with the next argument.
The continuation function to be called by a specialiser when it's finished specialising.
This makes it easier to write Specialisers, because they don't have to keep track of the two argument packs being assembled by specialiseNextArgument, but it means we get passed the new argument at the start, followed by a blind forwarding of all the other stuff we passed into the Specialiser (which is just the arguments that specialiseNextArgument had at that point).
Note that an argument was removed from args when the Specialiser was called, so we need not do it now.
newArg | The argument, after specialisation. Append this to outArgs and call specialiseNextArgument again. |
outArgs | The same outArgs that specialiseNextArgument received right before it called the Specialiser that just called this function. |
fn | The target functor. |
specialisers | The Specialisers being applied. |
args | The not-yet-specialised arguments. |
Most of the arguments are just forwarded blindly by the specialiser, with its output (the newly-specialised argument) passed as the first.
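The continuation pattern described for specialiseMore and specialiseNextArgument can be sketched in plain C++17. This is a simplified illustration under stated assumptions, not Speclib's code. It uses a `std::tuple` for the accumulated output arguments, passes everything by value, and has a single hard-wired specialiser instead of a `TypeList`. All names in `namespace sketch` (`Constant`, `specialiseNext`, `ZeroSpecialiser`, `runSpecialisedSketch`, `Scale`) are hypothetical, except `specialiseMore`, which is reused only to mirror the role documented above.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>
#include <utility>

namespace sketch {

// Compile-time tag meaning "this integer argument is known to equal V".
template <int V>
struct Constant {};

// Base case: no inputs left, so call the target with the accumulated outputs.
template <typename Functor, typename OutTuple>
auto specialiseNext(Functor& fn, OutTuple out) {
    return std::apply(fn, out);
}

// Recursive case, declared early so the continuation below can name it.
template <typename Functor, typename OutTuple, typename InArg, typename... InArgs>
auto specialiseNext(Functor& fn, OutTuple out, InArg in, InArgs... rest);

// Continuation: receives the finished argument first, then everything the
// specialiser was given, forwarded blindly. Appends and carries on.
template <typename NewArg, typename Functor, typename OutTuple, typename... Args>
auto specialiseMore(NewArg newArg, Functor& fn, OutTuple out, Args... rest) {
    return specialiseNext(fn, std::tuple_cat(out, std::make_tuple(newArg)), rest...);
}

// A specialiser: turns a runtime check into a compile-time type. It never
// inspects `out` or `rest`, it only passes them on to the continuation.
struct ZeroSpecialiser {
    template <typename Functor, typename OutTuple, typename... Rest>
    static auto run(int in, Functor& fn, OutTuple out, Rest... rest) {
        if (in == 0) {
            return specialiseMore(Constant<0>{}, fn, out, rest...);
        }
        return specialiseMore(in, fn, out, rest...);
    }
};

// Peel one input off the front. Integers go to the specialiser, anything
// else is passed through unchanged.
template <typename Functor, typename OutTuple, typename InArg, typename... InArgs>
auto specialiseNext(Functor& fn, OutTuple out, InArg in, InArgs... rest) {
    if constexpr (std::is_same_v<InArg, int>) {
        return ZeroSpecialiser::run(in, fn, out, rest...);
    } else {
        return specialiseMore(in, fn, out, rest...);
    }
}

// Entry point: start with an empty output tuple.
template <typename Functor, typename... Args>
auto runSpecialisedSketch(Functor& fn, Args... args) {
    return specialiseNext(fn, std::tuple<>{}, args...);
}

// Hypothetical target functor with a specialised and a generic overload.
struct Scale {
    bool usedZeroPath = false;
    double operator()(Constant<0>, double) { usedZeroPath = true; return 0.0; }
    double operator()(int a, double x) { return a * x; }
};

}  // namespace sketch
```

Calling `sketch::runSpecialisedSketch(s, 0, 2.5)` on a `sketch::Scale s` reaches the `Constant<0>` overload, while a non-zero first argument reaches the generic one. Both branches inside the specialiser must yield the same return type, because the choice between them is made at run time.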