A non-default Specialiser that flattens N-dimensional packed tensors into 1-dimensional tensors that cover the same memory.
#include <TensorFlatten.hpp>
Static Public Member Functions

template<typename T>
static constexpr bool test ()
    Is the given type one we know to specialise?

template<typename... Args>
constexpr static bool canOptimise (sp::TypeList< Args... >)
    Determine if the argument list being considered should be subjected to the optimisation.

template<typename T, int Rank, typename Opts, typename... Stuff>
static auto run (const Tensor< T, Rank, Opts > &t, Stuff &&... stuff)

Static Public Member Functions inherited from sp::Specialiser

template<typename T>
static constexpr bool test ()
    Return true iff the specialiser is able to specialise arguments of type T.

template<typename... O, typename ArgType, typename... Stuff>
static auto run (const ArgType &thisArg, Stuff &&... stuff)
    Specialise an argument.
Detailed Description

A non-default Specialiser that flattens N-dimensional packed tensors into 1-dimensional tensors that cover the same memory.

This is useful for reducing address calculations and binary size for operations where this transformation is safe. Anything that relies on higher-dimensional addressing for correctness won't survive this optimisation.
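As an illustration of the saving, consider an elementwise scale over a packed, row-major 2-D buffer. This sketch uses a plain std::vector rather than the library's Tensor type, and the row-major packed layout is an assumption made for the example.

#include <cstddef>
#include <vector>

// N-dimensional form: an index calculation (r * cols + c) per element.
void scale2D(std::vector<float>& data, std::size_t rows, std::size_t cols, float k)
{
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            data[r * cols + c] *= k;
}

// Flattened form: the same memory walked as a single 1-D range. This is only
// equivalent because the data is packed (contiguous, no padding between rows);
// a strided or padded layout relies on the higher-dimensional addressing above.
void scaleFlat(std::vector<float>& data, float k)
{
    for (float& x : data) x *= k;
}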
This optimisation is currently only supported when it can be applied to all the input Tensors being passed. It might be worth considering a variation on this optimisation which flattens things to BroadcastingTensors in a way that reduces the total number of kernels that get generated.
This Specialiser is typically run in a second call to runSpecialised, after the first one has run TensorSpecialiser. That is usually achieved by calling runSpecialised once, but targeting TensorFlatteningTrampoline<T>. This chaining is necessary to make sure all Tensors have been specialised (and the packedness flags set) before this one attempts to check for packedness across all input Tensors.
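The chaining might look roughly like the sketch below. The template-parameter layout of runSpecialised and the kernel type MyKernel are assumptions made for illustration; only the names runSpecialised, TensorSpecialiser, TensorFlatten, and TensorFlatteningTrampoline come from this page, and the real interface may differ.

// Sketch only: not the library's confirmed signature.
template <typename... Args>
auto dispatch(Args&&... args)
{
    // The single visible call runs TensorSpecialiser over each argument
    // (setting packedness flags). Its target, TensorFlatteningTrampoline<MyKernel>,
    // then issues the second runSpecialised call, in which TensorFlatten sees a
    // fully specialised argument list and can test packedness across all of it.
    return sp::runSpecialised<TensorFlatteningTrampoline<MyKernel>>(
        std::forward<Args>(args)...);
}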
template<typename... Args>
constexpr static bool canOptimise (sp::TypeList< Args... >)
Determine if the argument list being considered should be subjected to the optimisation.
Unlike most specialisers, this one requires that a condition be satisfied by the entire argument list (instead of just the "current" argument). This function checks that all tensors in the argument list are packed.
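A minimal sketch of what an all-packed check over a type list can look like, using a C++17 fold expression. The TypeList and is_packed_tensor names below are stand-ins written for illustration, not the library's actual definitions.

#include <type_traits>

template <typename... Ts>
struct TypeList { };                            // stand-in for sp::TypeList

template <typename T>
struct is_packed_tensor : std::false_type { };  // illustrative trait; the real
                                                // packedness test is library-specific

template <typename... Args>
constexpr bool allTensorsPacked(TypeList<Args...>)
{
    // True only if every type in the list satisfies the trait; one non-packed
    // tensor is enough to disable the flattening optimisation.
    return (is_packed_tensor<std::decay_t<Args>>::value && ...);
}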
template<typename T>
static constexpr bool test ()
Is the given type one we know to specialise?
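For illustration, a check like this is often written by matching any instantiation of the tensor template. The Tensor stand-in and trait below are hypothetical and not the library's implementation.

#include <type_traits>

template <typename T, int Rank, typename Opts>
struct Tensor { };                              // stand-in for the library's Tensor

template <typename T>
struct is_tensor : std::false_type { };

template <typename T, int Rank, typename Opts>
struct is_tensor<Tensor<T, Rank, Opts>> : std::true_type { };

template <typename T>
constexpr bool testSketch()
{
    // "A type we know to specialise" here means "some instantiation of Tensor".
    return is_tensor<std::remove_cv_t<std::remove_reference_t<T>>>::value;
}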