Speclib  0.1.2
The library for writing better CUDA libraries
sp::TensorFlattenSpecialiser Struct Reference

A non-default Specialiser that flattens N-dimensional packed tensors into 1-dimensional tensors that cover the same memory.

#include <TensorFlatten.hpp>

Inheritance diagram for sp::TensorFlattenSpecialiser: inherits from sp::Specialiser.

Static Public Member Functions

template<typename T >
static constexpr bool test ()
 Is the given type one we know to specialise?
 
template<typename... Args>
constexpr static bool canOptimise (sp::TypeList< Args... >)
 Determine if the argument list being considered should be subjected to the optimisation.
 
template<typename T , int Rank, typename Opts , typename... Stuff>
static auto run (const Tensor< T, Rank, Opts > &t, Stuff &&... stuff)
 
- Static Public Member Functions inherited from sp::Specialiser
template<typename T >
static constexpr bool test ()
 Return true iff the specialiser is able to specialise arguments of type T.
 
template<typename... O, typename ArgType , typename... Stuff>
static auto run (const ArgType &thisArg, Stuff &&... stuff)
 Specialise an argument.
 

Detailed Description

A non-default Specialiser that flattens N-dimensional packed tensors into 1-dimensional tensors that cover the same memory.

This is useful for reducing address calculations and binary size in operations where the transformation is safe. Anything that relies on higher-dimensional addressing for correctness will not survive this optimisation.

See also
TensorFlatteningTrampoline

This optimisation is currently only supported when it can be applied to all of the input Tensors being passed. It might be worth considering a variation on this optimisation which flattens tensors to BroadcastingTensors in a way that reduces the total number of kernels that get generated.

This Specialiser is typically run in a second call to runSpecialised, after the first one has run TensorSpecialiser. That is usually achieved by calling runSpecialised once, but targeting TensorFlatteningTrampoline<T>. This chaining is necessary to make sure all Tensors have been specialised (and the packedness flags set) before this one attempts to check for packedness across all input Tensors.

Member Function Documentation

◆ canOptimise()

template<typename... Args>
constexpr static bool sp::TensorFlattenSpecialiser::canOptimise ( sp::TypeList< Args... >  )
static constexpr

Determine if the argument list being considered should be subjected to the optimisation.

Unlike most specialisers, this one requires that a condition be satisfied by the entire argument list (instead of just the "current" argument). This function checks that all tensors in the argument list are packed.
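A whole-argument-list check like this can be expressed as a fold over the type list. The sketch below uses hypothetical stand-in types (a simplified `TypeList` and a `Tensor` carrying a packedness flag, not Speclib's real declarations) to show the shape of an all-tensors-packed predicate; non-tensor arguments are simply ignored.

```cpp
#include <type_traits>

// Hypothetical stand-ins, not Speclib's real types.
template <typename... Args> struct TypeList {};

template <typename T, int Rank, bool Packed>
struct Tensor { static constexpr bool packed = Packed; };

// Per-type check: non-tensor arguments don't block the optimisation.
template <typename T> struct IsPackedOrNonTensor : std::true_type {};
template <typename T, int Rank, bool Packed>
struct IsPackedOrNonTensor<Tensor<T, Rank, Packed>>
    : std::bool_constant<Packed> {};

// An all-of check across the entire argument list, in the spirit of
// canOptimise(): true only if every tensor in the list is packed.
template <typename... Args>
constexpr bool allTensorsPacked(TypeList<Args...>) {
    return (IsPackedOrNonTensor<Args>::value && ...);
}

static_assert( allTensorsPacked(TypeList<Tensor<float, 3, true>, int,
                                         Tensor<float, 2, true>>{}));
static_assert(!allTensorsPacked(TypeList<Tensor<float, 3, true>,
                                         Tensor<float, 2, false>>{}));
```

The fold expression short-circuits at compile time, so a single non-packed tensor anywhere in the list disables the optimisation for the whole call, which matches the all-or-nothing behaviour described above.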

◆ test()

template<typename T >
static constexpr bool sp::TensorFlattenSpecialiser::test ( )
static constexpr

Is the given type one we know to specialise?
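A `test()` of this kind is typically a type-level membership check: accept exactly the `Tensor` instantiations this specialiser knows how to rewrite. The sketch below uses hypothetical stand-in declarations (a simplified `Tensor` and `DefaultOpts`, not Speclib's real ones) and a partial specialisation to recognise them.

```cpp
#include <type_traits>

// Hypothetical stand-ins, not Speclib's real declarations.
template <typename T, int Rank, typename Opts> struct Tensor {};
struct DefaultOpts {};

// Recognise Tensor instantiations via partial specialisation.
template <typename T> struct IsTensor : std::false_type {};
template <typename T, int Rank, typename Opts>
struct IsTensor<Tensor<T, Rank, Opts>> : std::true_type {};

// A test() in the spirit of this specialiser: true only for types the
// specialisation machinery knows how to flatten.
template <typename T>
constexpr bool test() { return IsTensor<std::decay_t<T>>::value; }

static_assert( test<Tensor<float, 3, DefaultOpts>>());
static_assert(!test<int>());
```

`std::decay_t` strips references and cv-qualifiers first, so `const Tensor<...>&` arguments are recognised the same as plain values.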