Speclib  0.1.2
The library for writing better CUDA libraries
sp::LUTSpec Class Reference

The base class for the LUT Spec template parameter.

#include <LUT.hpp>


Classes

struct  ThreeValue
 Three-value logic integer.
 

Static Public Member Functions

template<int Dummy = 0>
static constexpr uint32_t getMaxInput ()
 Get the value of the largest input in the lookup table.
 
template<int Dummy = 0>
static constexpr ThreeValue getOutput (uint32_t input)
 Get the output that corresponds to a given input.
 
static constexpr bool assumeInputInRange ()
 Assume the input is in range.
 
static constexpr bool allowBitwiseLUT ()
 Allow using a large number of bitwise operations to build a LUT.
 
static constexpr int maxMemoryLUTBytes ()
 Allow generating a lookup table in memory in flat or constant address space.
 
static constexpr int memoryLUTPackWidth ()
 The width, in bits, to use for packed memory lookup tables.
 
static constexpr int getVectorizationWidth ()
 Get the width in bits of the integer-vector type to use for vectorized bitwise operations.
 

Detailed Description

The base class for the LUT Spec template parameter.

This class (and its library- and user-defined subclasses) serves two functions: 1) It defines the lookup table from input to output (see getMaxInput() and getOutput()). 2) It defines how the lookup table should be implemented (see the other methods in this class).

A subclass of this class should define the functions that it wishes to override. Note that the final int Dummy = 0 template parameter exists only to make the static_assert conditional and is not required in overrides. All methods are intended to be overridden unless otherwise specified.
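As an illustration of the override pattern described above, the following is a minimal sketch. It does not include the real LUT.hpp: BaseSpecSketch is a hypothetical stand-in for sp::LUTSpec, getOutput() is omitted because the ThreeValue interface is not shown on this page, and the default returned by assumeInputInRange() here is an assumption.

```cpp
#include <cstdint>

// Hypothetical stand-in for sp::LUTSpec. This is NOT the real Speclib class.
struct BaseSpecSketch {
    // The assertion depends on Dummy, so it is only evaluated if this template
    // is instantiated, i.e. if a subclass did not provide its own getMaxInput().
    template <int Dummy = 0>
    static constexpr uint32_t getMaxInput() {
        static_assert(Dummy != 0, "A subclass must define getMaxInput()");
        return 0;
    }

    // Assumed default, for illustration only.
    static constexpr bool assumeInputInRange() { return false; }
};

// A user-defined spec. The override is a plain static function:
// the Dummy template parameter is not repeated.
struct MySpecSketch : BaseSpecSketch {
    static constexpr uint32_t getMaxInput() { return 15; }
};

// The override hides the base template; the non-overridden method is inherited.
static_assert(MySpecSketch::getMaxInput() == 15, "override is used");
static_assert(!MySpecSketch::assumeInputInRange(), "inherited default is used");
```

Because the subclass function hides the base template by name, the static_assert in the base is never instantiated for MySpecSketch.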

Member Function Documentation

◆ allowBitwiseLUT()

static constexpr bool sp::LUTSpec::allowBitwiseLUT ( )
static constexpr

Allow using a large number of bitwise operations to build a LUT.

This is useful on GPUs to avoid memory accesses and thread divergence, but on CPUs it is probably only useful if the LUT is large and very sparse.
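To illustrate what replacing a table with bitwise operations means, here is a hand-derived sketch for a small, hypothetical four-entry table. How Speclib derives such expressions is not described on this page; the names below are illustrative only.

```cpp
#include <cstdint>

// Hypothetical four-entry table: inputs 0..3 map to outputs 0, 1, 1, 0.
constexpr uint32_t kXorTableSketch[4] = {0u, 1u, 1u, 0u};

// Memory-based lookup: one load per lookup.
constexpr uint32_t lookupMemorySketch(uint32_t input) {
    return kXorTableSketch[input];
}

// Equivalent bitwise form: the output is the XOR of the two input bits,
// so no memory access and no branch is needed.
constexpr uint32_t lookupBitwiseSketch(uint32_t input) {
    return (input ^ (input >> 1)) & 1u;
}
```

Both functions agree for every input from 0 to 3.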

◆ assumeInputInRange()

static constexpr bool sp::LUTSpec::assumeInputInRange ( )
static constexpr

Assume the input is in range.

Returning true produces fewer instructions (and potentially lower memory usage), but out-of-range inputs then cause completely undefined behaviour. Returning false produces an undefined result when the LUT is accessed out of range, but behaviour is otherwise defined.
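The trade-off can be pictured with a hand-written sketch of the two variants. This is one possible way to obtain the stated guarantees, not necessarily what Speclib generates; the table contents and all names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical table with getMaxInput() == 3.
constexpr uint32_t kRangeTableSketch[4] = {3u, 1u, 4u, 1u};
constexpr uint32_t kRangeMaxInputSketch = 3;

// Corresponds to assumeInputInRange() == true: no check at all.
// An out-of-range input reads past the array, which is undefined behaviour.
constexpr uint32_t lookupUncheckedSketch(uint32_t input) {
    return kRangeTableSketch[input];
}

// Corresponds to assumeInputInRange() == false: the index is forced into
// range, so the access is always valid. The value returned for an
// out-of-range input is arbitrary and must not be relied upon.
constexpr uint32_t lookupCheckedSketch(uint32_t input) {
    return kRangeTableSketch[input <= kRangeMaxInputSketch ? input : 0u];
}
```

The checked variant costs an extra comparison and select per lookup, which is the instruction overhead the true setting avoids.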

◆ getMaxInput()

template<int Dummy = 0>
static constexpr uint32_t sp::LUTSpec::getMaxInput ( )
static constexpr

Get the value of the largest input in the lookup table.

Lookups with a value larger than this (inputs are unsigned, so no negative numbers are possible) result in undefined behaviour if assumeInputInRange() is true, and an undefined return value otherwise.

◆ getOutput()

template<int Dummy = 0>
static constexpr ThreeValue sp::LUTSpec::getOutput ( uint32_t  input)
static constexpr

Get the output that corresponds to a given input.

This defines the lookup table that is to be implemented. The returned value is an integer represented with three-value-logic bits (see ThreeValue). This method will be called for each integer between 0 and getMaxInput() inclusive.
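The enumeration over inputs can be sketched as follows. This is an assumption-laden illustration, not Speclib code: ParitySpecSketch, TableSketch and buildTableSketch are hypothetical names, and the output type is simplified to uint32_t because the ThreeValue interface is not shown on this page. It requires C++14 for the loop in a constexpr function.

```cpp
#include <cstdint>

// Hypothetical spec: the output is the parity of a 3-bit input.
struct ParitySpecSketch {
    static constexpr uint32_t getMaxInput() { return 7; }
    static constexpr uint32_t getOutput(uint32_t input) {
        return (input ^ (input >> 1) ^ (input >> 2)) & 1u;
    }
};

// Holds one output per input in 0..getMaxInput().
template <typename Spec>
struct TableSketch {
    uint32_t values[Spec::getMaxInput() + 1];
};

// Calls getOutput() once for every integer from 0 to getMaxInput() inclusive.
template <typename Spec>
constexpr TableSketch<Spec> buildTableSketch() {
    TableSketch<Spec> table{};
    for (uint32_t i = 0; i <= Spec::getMaxInput(); ++i) {
        table.values[i] = Spec::getOutput(i);
    }
    return table;
}
```

Since everything is constexpr, the table can be materialised at compile time, for example with constexpr auto table = buildTableSketch<ParitySpecSketch>();.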

◆ getVectorizationWidth()

static constexpr int sp::LUTSpec::getVectorizationWidth ( )
static constexpr

Get the width in bits of the integer-vector type to use for vectorized bitwise operations.

Some LUT implementations, such as that enabled by allowBitwiseLUT(), work by converting the lookup table to an equivalent function using bitwise operations. This function specifies how wide those bitwise operations can be.
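One common way to exploit wide bitwise operations is bit-slicing, where each bit position of a machine word carries an independent lookup. Whether Speclib lays out its data this way is an assumption; the sketch below uses a hypothetical table that maps inputs 0..3 to outputs 0, 1, 1, 0 (the XOR of the two input bits) and a width of 32 bits.

```cpp
#include <cstdint>

// Bit-sliced layout (assumed for illustration): bit i of `bit0` holds input
// bit 0 of lookup number i, and bit i of `bit1` holds input bit 1 of that
// same lookup. One 32-bit XOR therefore performs 32 lookups at once.
constexpr uint32_t lookup32LanesSketch(uint32_t bit0, uint32_t bit1) {
    return bit0 ^ bit1;
}

// Extracts the one-bit result of a single lane from the combined word.
constexpr uint32_t laneResultSketch(uint32_t results, unsigned lane) {
    return (results >> lane) & 1u;
}
```

For example, with lanes 0 to 3 holding the inputs 0, 1, 2 and 3, bit0 is 0xA and bit1 is 0xC, and the result word has lanes 0 to 3 equal to 0, 1, 1, 0. A wider vectorization width processes proportionally more lanes per operation.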

◆ maxMemoryLUTBytes()

static constexpr int sp::LUTSpec::maxMemoryLUTBytes ( )
static constexpr

Allow generating a lookup table in memory in flat or constant address space.

This is useful on CPUs, where memory caching is very good. On GPUs, the table is placed in constant memory.

◆ memoryLUTPackWidth()

static constexpr int sp::LUTSpec::memoryLUTPackWidth ( )
static constexpr

The width, in bits, to use for packed memory lookup tables.

When a memory lookup table is packed, multiple elements can be stored in the same value loaded from memory. The return value of this method specifies how wide the value loaded from memory should be. If this is zero, then packing is not used. Packing is also not used if it is not possible to fit more than one output in an integer of the returned width.

Memory LUT packing is useful on GPUs to reduce the probability that multiple addresses in the constant cache have to be accessed. It is also useful in general because it makes the lookup table smaller. The downside is that extra bitwise operations (some of which are dependencies of the load address) have to be performed to extract a single value. On CPUs, this is probably only worthwhile if the LUT is causing pressure on the L1 data cache.
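The extra work that packing introduces can be seen in a small hand-written sketch. It assumes a pack width of 32 bits and 2-bit outputs, so 16 outputs share one loaded word; the table contents (output = input mod 4 for inputs 0..19) and all names are hypothetical, and the bit layout Speclib actually uses may differ.

```cpp
#include <cstdint>

constexpr unsigned kOutputBitsSketch = 2;   // bits per table entry (assumed)
constexpr unsigned kPackWidthSketch = 32;   // width of each loaded word
constexpr unsigned kPerWordSketch = kPackWidthSketch / kOutputBitsSketch;  // 16

// Entries 0..15 live in word 0 and entries 16..19 in word 1.
// Entry k of a word occupies bits [2k, 2k+1]; each 0xE4 byte encodes 0, 1, 2, 3.
constexpr uint32_t kPackedTableSketch[2] = {0xE4E4E4E4u, 0x000000E4u};

constexpr uint32_t lookupPackedSketch(uint32_t input) {
    // The division feeds the load address; the shift and mask extract the entry.
    return (kPackedTableSketch[input / kPerWordSketch]
            >> ((input % kPerWordSketch) * kOutputBitsSketch))
           & ((1u << kOutputBitsSketch) - 1u);
}
```

Unpacked, the same 20 entries would occupy 20 separate elements; packed, they fit in two 32-bit words, at the cost of one division, one shift and one mask per lookup.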