The base class for the LUT Spec template parameter.

#include <LUT.hpp>

Classes
struct	ThreeValue
	Three-value logic integer.

Static Public Member Functions
template<int Dummy = 0>
static constexpr uint32_t	getMaxInput ()
	Get the value of the largest input in the lookup table.

template<int Dummy = 0>
static constexpr ThreeValue	getOutput (uint32_t input)
	Get the output that corresponds to a given input.

static constexpr bool	assumeInputInRange ()
	Assume the input is in range.

static constexpr bool	allowBitwiseLUT ()
	Allow using a large number of bitwise operations to build a LUT.

static constexpr int	maxMemoryLUTBytes ()
	Allow generating a lookup table in memory in flat or constant address space.

static constexpr int	memoryLUTPackWidth ()
	The width, in bits, to use for packed memory lookup tables.

static constexpr int	getVectorizationWidth ()
	Get the width in bits of the integer-vector type to use for vectorized bitwise operations.

Detailed Description

The base class for the LUT Spec template parameter.

This class (and its library- and user-defined subclasses) serves two functions: 1) It defines the lookup table from input to output (see getMaxInput() and getOutput()). 2) It defines how the lookup table should be implemented (see the other methods in this class).

Subclasses of this class should define the functions that they wish to override. Note that the final int Dummy = 0 template parameter exists only to make the static_assert conditional and is not required in overrides. All methods are intended to be overridden unless otherwise specified.
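The override pattern described above can be sketched as follows. This is a minimal illustration, not the library's code: the `sketch` namespace, `LUTSpecBase`, `SquareSpec`, the single `value` member of the stand-in `ThreeValue`, the exact `static_assert` condition, and the default returned by `assumeInputInRange()` are all assumptions made for the example. It only shows how a static member function in a subclass hides the base-class default, and how a dependent `static_assert` fires only when the default is actually instantiated.

```cpp
#include <cstdint>

namespace sketch {

// Stand-in base class; NOT the real sp::LUTSpec.
struct LUTSpecBase {
    // Hypothetical stand-in for sp::LUTSpec::ThreeValue. The real type's
    // members are not documented here, so only a plain value is modelled.
    struct ThreeValue {
        uint32_t value;
    };

    // The assertion depends on Dummy, so it is only evaluated if this
    // default is instantiated, i.e. if a subclass did not provide its own.
    template <int Dummy = 0>
    static constexpr uint32_t getMaxInput() {
        static_assert(Dummy != 0, "LUT spec must define getMaxInput()");
        return 0;
    }

    template <int Dummy = 0>
    static constexpr ThreeValue getOutput(uint32_t) {
        static_assert(Dummy != 0, "LUT spec must define getOutput()");
        return ThreeValue{0};
    }

    // An implementation choice with an (assumed) default.
    static constexpr bool assumeInputInRange() { return false; }
};

// A user-defined spec: inputs 0..3 map to their squares.
// The overrides are plain functions; no Dummy parameter is needed.
struct SquareSpec : LUTSpecBase {
    static constexpr uint32_t getMaxInput() { return 3; }
    static constexpr ThreeValue getOutput(uint32_t input) {
        return ThreeValue{input * input};
    }
    static constexpr bool assumeInputInRange() { return true; }
};

} // namespace sketch
```

In this sketch, `sketch::SquareSpec::getMaxInput()` resolves to the subclass version, while calling `sketch::LUTSpecBase::getMaxInput()` would fail to compile with the assertion message.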

Member Function Documentation

◆ allowBitwiseLUT()

static constexpr bool sp::LUTSpec::allowBitwiseLUT ( )

Allow using a large number of bitwise operations to build a LUT.

This is useful on GPUs to avoid memory accesses and thread divergence, but on CPUs it is probably only useful if the LUT is large and very sparse.

◆ assumeInputInRange()

static constexpr bool sp::LUTSpec::assumeInputInRange ( )

Assume the input is in range.

Returning true produces fewer instructions (and potentially lower memory usage), but an out-of-range input then results in completely undefined behaviour. Returning false yields an undefined result when the LUT is accessed out of range, but behaviour is otherwise defined.
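The trade-off can be illustrated with a small sketch. The names `kTable`, `kMaxInput`, and `lookup` are hypothetical, and the library's generated code may look quite different; in particular, returning 0 for an out-of-range input is just one possible "undefined result".

```cpp
#include <cstdint>

namespace sketch {

// A hypothetical 4-entry table; inputs 0..3 are in range.
constexpr uint32_t kTable[4] = {10, 11, 12, 13};
constexpr uint32_t kMaxInput = 3;

// AssumeInRange plays the role of the spec's assumeInputInRange().
template <bool AssumeInRange>
constexpr uint32_t lookup(uint32_t input) {
    // With AssumeInRange == true the comparison is skipped entirely, so an
    // out-of-range input indexes past the table: undefined behaviour.
    // With AssumeInRange == false the extra comparison keeps behaviour
    // defined; the value returned for a bad input (0 here) is arbitrary.
    return (AssumeInRange || input <= kMaxInput) ? kTable[input] : 0;
}

} // namespace sketch
```

Both variants agree for in-range inputs, for example `sketch::lookup<true>(2)` and `sketch::lookup<false>(2)` both give 12; only `sketch::lookup<false>` may safely be called with an input such as 100.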

◆ getMaxInput()

template<int Dummy = 0>

static constexpr uint32_t sp::LUTSpec::getMaxInput ( )

Get the value of the largest input in the lookup table.

Lookups with a value larger than this (inputs are unsigned, so no negative numbers are possible) result in undefined behaviour if assumeInputInRange() is true, and an undefined return value otherwise.

◆ getOutput()

template<int Dummy = 0>

static constexpr ThreeValue sp::LUTSpec::getOutput ( uint32_t input )

Get the output that corresponds to a given input.

This defines the lookup table that is to be implemented. The returned value is an integer represented with three-value-logic bits.

This method will be called for each integer between 0 and getMaxInput() inclusive.

See also: ThreeValue
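As an illustration of how a table can be derived from these two functions, the sketch below calls `getOutput()` once for every input from 0 to `getMaxInput()` inclusive at compile time. `ParitySpec`, `Table`, `buildTable`, and `kParityTable` are hypothetical names, and the outputs are plain integers rather than `ThreeValue`, whose interface is not described in this section. It assumes C++14 or later.

```cpp
#include <cstdint>

namespace sketch {

// Hypothetical spec: the output is the parity of a 3-bit input.
struct ParitySpec {
    static constexpr uint32_t getMaxInput() { return 7; }
    static constexpr uint32_t getOutput(uint32_t input) {
        return (input ^ (input >> 1) ^ (input >> 2)) & 1u;
    }
};

// One entry per input in [0, getMaxInput()].
template <typename Spec>
struct Table {
    uint32_t values[Spec::getMaxInput() + 1];
};

// Evaluate getOutput() for every input, 0 and getMaxInput() included.
template <typename Spec>
constexpr Table<Spec> buildTable() {
    Table<Spec> table{};
    for (uint32_t i = 0; i <= Spec::getMaxInput(); ++i) {
        table.values[i] = Spec::getOutput(i);
    }
    return table;
}

constexpr Table<ParitySpec> kParityTable = buildTable<ParitySpec>();

} // namespace sketch
```

Because the table has `getMaxInput() + 1` entries, the largest valid input is itself a valid index.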

◆ getVectorizationWidth()

static constexpr int sp::LUTSpec::getVectorizationWidth ( )

Get the width in bits of the integer-vector type to use for vectorized bitwise operations.

Some LUT implementations, such as that enabled by allowBitwiseLUT(), work by converting the lookup table to an equivalent function using bitwise operations. This function specifies how wide those bitwise operations can be.
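To make the idea of a bitwise-operation LUT concrete, here is one generic way such a conversion can be done: a sum of minterms over bit-sliced inputs. This is only an illustrative technique, and nothing here is taken from the library: `kTable` and `bitwiseLookup` are hypothetical, and the actual construction used may differ. Each 32-bit word holds one input bit for 32 independent lookups, so the word width plays a role similar to the vectorization width.

```cpp
#include <cstdint>

namespace sketch {

// A hypothetical table with 2-bit inputs and 1-bit outputs (this one is XOR).
constexpr uint32_t kTable[4] = {0, 1, 1, 0};

// b0 holds bit 0 of the input for each of 32 lanes, b1 holds bit 1.
// Bit j of the result is kTable[input of lane j].
constexpr uint32_t bitwiseLookup(uint32_t b0, uint32_t b1) {
    uint32_t out = 0;
    for (uint32_t i = 0; i < 4; ++i) {
        if (kTable[i] != 0) {
            // Mask of lanes whose input equals i: each input bit must
            // match the corresponding bit of i.
            uint32_t m0 = (i & 1u) ? b0 : ~b0;
            uint32_t m1 = (i & 2u) ? b1 : ~b1;
            out |= m0 & m1;
        }
    }
    return out;
}

} // namespace sketch
```

For example, if lanes 0 to 3 carry the inputs 0, 1, 2, 3 and all other lanes carry 0, then `b0 = 0xA`, `b1 = 0xC`, and `sketch::bitwiseLookup(0xA, 0xC)` yields `0x6`: lanes 1 and 2 output 1. No memory is read and no lane takes a different branch from another.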

◆ maxMemoryLUTBytes()

static constexpr int sp::LUTSpec::maxMemoryLUTBytes ( )

Allow generating a lookup table in memory in flat or constant address space.

This is useful on CPUs, where memory caching is very good. On GPUs, the table is put in constant memory.

◆ memoryLUTPackWidth()

static constexpr int sp::LUTSpec::memoryLUTPackWidth ( )

The width, in bits, to use for packed memory lookup tables.

When a memory lookup table is packed, multiple elements can be stored in the same value loaded from memory. The return value of this method specifies how wide the value loaded from memory should be. If this is zero, then packing is not used. Packing is also not used if it is not possible to fit more than one output in an integer of the returned width.

Memory LUT packing is useful on GPUs to reduce the probability that multiple addresses in the constant cache will have to be accessed. It is also useful in general to make the lookup table smaller. The downside is that extra bitwise operations (some of which are dependencies of the load address) have to be performed in order to extract a single value. On CPUs, this is probably only worthwhile if the LUT is causing pressure on the L1 data cache.
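The extraction cost mentioned above can be seen in a small sketch of a packed lookup. The constants and `packedLookup` are hypothetical, as is the layout (lowest-numbered input in the least significant bits); the library's actual packing scheme is not specified in this section.

```cpp
#include <cstdint>

namespace sketch {

constexpr uint32_t kPackWidth = 32;  // plays the role of memoryLUTPackWidth()
constexpr uint32_t kOutputBits = 4;  // assumed width of each output
constexpr uint32_t kPerWord = kPackWidth / kOutputBits;  // 8 outputs per load
constexpr uint32_t kMask = (1u << kOutputBits) - 1u;

// 16 outputs (input i maps to 15 - i) packed into two 32-bit words instead
// of sixteen separate entries.
constexpr uint32_t kPacked[2] = {0x89ABCDEFu, 0x01234567u};

constexpr uint32_t packedLookup(uint32_t input) {
    // The division selects which word to load (it feeds the load address);
    // the remainder selects the field inside the loaded word, which is
    // then extracted with a shift and a mask.
    return (kPacked[input / kPerWord] >> ((input % kPerWord) * kOutputBits)) & kMask;
}

} // namespace sketch
```

Here the whole table occupies 8 bytes, so any two lookups touch at most two distinct addresses, at the price of a divide (a shift for power-of-two counts), a remainder, a shift, and a mask per lookup.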
