Speclib  0.1.2
The library for writing better CUDA libraries
Floating Point Math with Rounding Modes

These functions provide convenient access to the floating point instructions with special rounding modes.

Enumerations

enum  sp::RoundingMode { sp::NEAREST_EVEN , sp::ZERO , sp::NEG_INF , sp::POS_INF }
 Identifies a rounding mode.
 

Functions

template<RoundingMode RM, bool FTZ = true>
__device__ float sp::add (float x, float y)
 FP add with configurable rounding mode and denormal handling.
 
template<RoundingMode RM, bool FTZ = true>
__device__ double sp::add (double x, double y)
 FP add with configurable rounding mode and denormal handling.
 
template<RoundingMode RM, bool FTZ = true, typename T >
__device__ T sp::sub (T x, T y)
 FP subtract with configurable rounding mode and denormal handling.
 
template<RoundingMode RM, bool FTZ = true>
__device__ float sp::mul (float x, float y)
 FP multiply with configurable rounding mode and denorm handling.
 
template<bool FTZ = true>
__device__ float sp::divApprox (float x, float y)
 FP divide, computed as d = x * (1/y).
 
template<RoundingMode RM, bool FTZ = true>
__device__ float sp::fma (float x, float y, float z)
 FP multiply-add with configurable rounding mode and denorm handling.
 
template<RoundingMode RM, bool FTZ = true>
__device__ double sp::fma (double x, double y, double z)
 FP multiply-add with configurable rounding mode and denorm handling.
 
template<RoundingMode RM, bool FTZ = true>
__device__ double sp::mul (double x, double y)
 FP multiply with configurable rounding mode and denorm handling.
 
template<RoundingMode RM, bool FTZ = true>
__device__ float sp::approxRcp (float x)
 FP approximate reciprocal with configurable rounding mode and denorm handling.
 
template<bool FTZ = true>
__device__ float sp::approxSqrt (float x)
 
template<bool FTZ = true>
__device__ float sp::approxRSqrt (float x)
 
template<bool FTZ = true>
__device__ float sp::approxExp2 (float x)
 Compute a fast approximation to exp2().
 
template<bool FTZ = true>
__device__ float sp::approxLog2 (float x)
 Compute a fast approximation to log2().
 
template<bool FTZ = true>
__device__ float sp::approxSin (float x)
 Compute a fast approximation to sin().
 
template<bool FTZ = true>
__device__ float sp::approxCos (float x)
 Compute a fast approximation to cos().
 

Variables

constexpr bool sp::FTZ = true
 Allows you to write e.g. sp::approxSqrt<FTZ>(x);
 
constexpr bool sp::NoFTZ = false
 

Detailed Description

These functions provide convenient access to the floating point instructions with special rounding modes.

While these operations are clunkier to use than the regular FP operators, they can occasionally help when writing particularly accuracy-sensitive code.

Typically, there is no performance penalty for using these functions over their ordinary counterparts; only the computed value differs.

Enumeration Type Documentation

◆ RoundingMode

Identifies a rounding mode.

Enumerator
NEAREST_EVEN 

Round to nearest, ties to even (default).

ZERO 

Round towards zero.

NEG_INF 

Round towards negative infinity.

POS_INF 

Round towards positive infinity.
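
As an illustration of why the directed modes are useful, the sketch below brackets a floating-point sum between a lower and an upper bound. It uses the CUDA built-ins `__fadd_rd` and `__fadd_ru` so that it stays self-contained; with Speclib the two calls would presumably be written `sp::add<sp::NEG_INF>(lo, v)` and `sp::add<sp::POS_INF>(hi, v)`. The kernel name `boundSum` and the host helper `runBoundSum` are hypothetical and not part of the library.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Accumulates a lower and an upper bound on the sum of n floats.
__global__ void boundSum(const float* in, int n, float* out) {
    float lo = 0.0f, hi = 0.0f;
    for (int i = 0; i < n; ++i) {
        lo = __fadd_rd(lo, in[i]);  // each step rounds towards negative infinity
        hi = __fadd_ru(hi, in[i]);  // each step rounds towards positive infinity
    }
    out[0] = lo;
    out[1] = hi;
}

// Host driver (hypothetical helper): sums n copies of `value` on the GPU.
// Returns false if any CUDA call fails.
bool runBoundSum(float value, int n, float* lo, float* hi) {
    std::vector<float> host(n, value);
    float* dIn = nullptr;
    float* dOut = nullptr;
    if (cudaMalloc(&dIn, n * sizeof(float)) != cudaSuccess) return false;
    if (cudaMalloc(&dOut, 2 * sizeof(float)) != cudaSuccess) {
        cudaFree(dIn);
        return false;
    }
    cudaMemcpy(dIn, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    boundSum<<<1, 1>>>(dIn, n, dOut);
    float out[2] = {0.0f, 0.0f};
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(out, dOut, sizeof(out), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    *lo = out[0];
    *hi = out[1];
    return err == cudaSuccess;
}
```

Because every partial sum is rounded outwards, the exact sum of the inputs lies between the two results, and the width of the interval gives a rough measure of the accumulated rounding error.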

Function Documentation

◆ add() [1/2]

template<RoundingMode RM, bool FTZ = true>
__device__ double sp::add ( double  x,
double  y 
)

FP add with configurable rounding mode and denormal handling.

Template Parameters
RM   Rounding mode to use.
FTZ  If true, denormal inputs may be flushed to zero.

◆ add() [2/2]

template<RoundingMode RM, bool FTZ = true>
__device__ float sp::add ( float  x,
float  y 
)

FP add with configurable rounding mode and denormal handling.

TODO: Saturation is supported by this instruction.

Template Parameters
RM   Rounding mode to use.
FTZ  If true, denormal inputs may be flushed to zero.
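
To make the effect of the FTZ parameter concrete, here is a minimal sketch of how a template flag of this kind could select between the flushing and non-flushing forms of the PTX `add` instruction. The function `addNearest`, the kernel `ftzDemo` and the host helper `runFtzDemo` are hypothetical; this is not Speclib's implementation, only an illustration built on inline PTX.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for an FTZ-templated add (round to nearest even).
// FTZ is a compile-time constant, so the unused branch is eliminated.
template <bool FTZ>
__device__ float addNearest(float x, float y) {
    float r;
    if (FTZ) {
        asm("add.rn.ftz.f32 %0, %1, %2;" : "=f"(r) : "f"(x), "f"(y));
    } else {
        asm("add.rn.f32 %0, %1, %2;" : "=f"(r) : "f"(x), "f"(y));
    }
    return r;
}

// Adds zero to a (possibly denormal) value with and without flushing.
__global__ void ftzDemo(float tiny, float* out) {
    out[0] = addNearest<true>(tiny, 0.0f);   // denormal input is flushed to zero
    out[1] = addNearest<false>(tiny, 0.0f);  // denormal input is preserved
}

// Host driver (hypothetical helper). Returns false if any CUDA call fails.
bool runFtzDemo(float tiny, float* flushed, float* kept) {
    float* dOut = nullptr;
    if (cudaMalloc(&dOut, 2 * sizeof(float)) != cudaSuccess) return false;
    ftzDemo<<<1, 1>>>(tiny, dOut);
    float out[2] = {0.0f, 0.0f};
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(out, dOut, sizeof(out), cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    *flushed = out[0];
    *kept = out[1];
    return err == cudaSuccess;
}
```

Passing a denormal such as `1e-40f` shows the difference: the flushing variant returns zero, while the non-flushing variant returns the input unchanged.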

◆ approxCos()

template<bool FTZ = true>
__device__ float sp::approxCos ( float  x)

Compute a fast approximation to cos().

Template Parameters
FTZ  If true, denormal inputs may be flushed to sign-preserving zero.

◆ approxExp2()

template<bool FTZ = true>
__device__ float sp::approxExp2 ( float  x)

Compute a fast approximation to exp2().

Template Parameters
FTZ  If true, denormal inputs may be flushed to sign-preserving zero.

◆ approxLog2()

template<bool FTZ = true>
__device__ float sp::approxLog2 ( float  x)

Compute a fast approximation to log2().

Template Parameters
FTZ  If true, denormal inputs may be flushed to sign-preserving zero.
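
The following sketch gives a feel for the accuracy of the fast exp2 and log2 approximations by taking a value through log2 and back through exp2. It assumes that functions such as sp::approxLog2 and sp::approxExp2 correspond to the PTX `lg2.approx` and `ex2.approx` instructions, which this page does not state; the names `fastLog2`, `fastExp2`, `roundTrip` and `runRoundTrip` are hypothetical.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-ins built on inline PTX (flush-to-zero variants).
__device__ float fastLog2(float x) {
    float r;
    asm("lg2.approx.ftz.f32 %0, %1;" : "=f"(r) : "f"(x));
    return r;
}

__device__ float fastExp2(float x) {
    float r;
    asm("ex2.approx.ftz.f32 %0, %1;" : "=f"(r) : "f"(x));
    return r;
}

// Computes log2(x) and then 2^(log2(x)), which should be close to x.
__global__ void roundTrip(float x, float* out) {
    float l = fastLog2(x);
    out[0] = l;
    out[1] = fastExp2(l);
}

// Host driver (hypothetical helper). Returns false if any CUDA call fails.
bool runRoundTrip(float x, float* log2x, float* back) {
    float* dOut = nullptr;
    if (cudaMalloc(&dOut, 2 * sizeof(float)) != cudaSuccess) return false;
    roundTrip<<<1, 1>>>(x, dOut);
    float out[2] = {0.0f, 0.0f};
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(out, dOut, sizeof(out), cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    *log2x = out[0];
    *back = out[1];
    return err == cudaSuccess;
}
```

The round trip does not reproduce the input bit for bit, which is the trade-off these approximations make in exchange for speed.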

◆ approxRcp()

template<RoundingMode RM, bool FTZ = true>
__device__ float sp::approxRcp ( float  x)

FP approximate reciprocal with configurable rounding mode and denorm handling.

Template Parameters
RM   Rounding mode to use.
FTZ  If true, denormal inputs may be flushed to zero.

◆ approxSin()

template<bool FTZ = true>
__device__ float sp::approxSin ( float  x)

Compute a fast approximation to sin().

Template Parameters
FTZ  If true, denormal inputs may be flushed to sign-preserving zero.

◆ divApprox()

template<bool FTZ = true>
__device__ float sp::divApprox ( float  x,
float  y 
)

FP divide, computed as d = x * (1/y).

For divisors y whose magnitude lies between $$2^{-126}$$ and $$2^{126}$$, the maximum error is 2 ULP.
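
The error bound can be spot-checked by comparing an approximate quotient against a correctly rounded one. The sketch below uses the CUDA built-ins `__fdividef` (fast approximate divide) and `__fdiv_rn` (IEEE round-to-nearest divide); that sp::divApprox behaves like `__fdividef` is an assumption, and the names `divideBothWays`, `ulpDistance` and `approxDivUlpError` are hypothetical.

```cuda
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Computes the same quotient with the fast and the IEEE-rounded divide.
__global__ void divideBothWays(float x, float y, float* out) {
    out[0] = __fdividef(x, y);  // fast approximate divide
    out[1] = __fdiv_rn(x, y);   // correctly rounded reference
}

// Distance in units in the last place between two finite floats of the
// same sign, obtained by comparing their bit patterns as integers.
int ulpDistance(float a, float b) {
    int32_t ia, ib;
    std::memcpy(&ia, &a, sizeof ia);
    std::memcpy(&ib, &b, sizeof ib);
    return std::abs(ia - ib);
}

// Host driver (hypothetical helper): returns the ULP distance between the
// two quotients, or -1 if a CUDA call fails.
int approxDivUlpError(float x, float y) {
    float* dOut = nullptr;
    if (cudaMalloc(&dOut, 2 * sizeof(float)) != cudaSuccess) return -1;
    divideBothWays<<<1, 1>>>(x, y, dOut);
    float out[2] = {0.0f, 0.0f};
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(out, dOut, sizeof(out), cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    return err == cudaSuccess ? ulpDistance(out[0], out[1]) : -1;
}
```

For example, `approxDivUlpError(1.0f, 3.0f)` reports how many representable floats separate the two results for that single pair of inputs; it is a spot check, not a proof of the bound.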

◆ fma() [1/2]

template<RoundingMode RM, bool FTZ = true>
__device__ double sp::fma ( double  x,
double  y,
double  z 
)

FP multiply-add with configurable rounding mode and denorm handling.

TODO: Saturation is supported by this instruction.

Computes (x * y) + z with no intermediate rounding.

Template Parameters
RM   Rounding mode to use for the final result.
FTZ  If true, denormal inputs may be flushed to zero.

◆ fma() [2/2]

template<RoundingMode RM, bool FTZ = true>
__device__ float sp::fma ( float  x,
float  y,
float  z 
)

FP multiply-add with configurable rounding mode and denorm handling.

TODO: Saturation is supported by this instruction.

Computes (x * y) + z with no intermediate rounding.

Template Parameters
RM   Rounding mode to use for the final result.
FTZ  If true, denormal inputs may be flushed to zero.
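
One practical consequence of "no intermediate rounding" is that a fused multiply-add can recover the rounding error of a product exactly. The sketch below shows this well-known technique using the CUDA built-ins `__fmul_rn` and `__fmaf_rn`; with Speclib the calls would presumably be `sp::mul<sp::NEAREST_EVEN>` and `sp::fma<sp::NEAREST_EVEN>`. The kernel `twoProduct` and the host helper `runTwoProduct` are hypothetical names.

```cuda
#include <cuda_runtime.h>

// Splits x * y into a rounded product p and its rounding error e,
// so that p + e equals x * y exactly (barring overflow and underflow).
__global__ void twoProduct(float x, float y, float* out) {
    float p = __fmul_rn(x, y);      // rounded product, never contracted into an FMA
    float e = __fmaf_rn(x, y, -p);  // x * y - p, rounded only once at the end
    out[0] = p;
    out[1] = e;
}

// Host driver (hypothetical helper). Returns false if any CUDA call fails.
bool runTwoProduct(float x, float y, float* p, float* e) {
    float* dOut = nullptr;
    if (cudaMalloc(&dOut, 2 * sizeof(float)) != cudaSuccess) return false;
    twoProduct<<<1, 1>>>(x, y, dOut);
    float out[2] = {0.0f, 0.0f};
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(out, dOut, sizeof(out), cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    *p = out[0];
    *e = out[1];
    return err == cudaSuccess;
}
```

With x = y = 1 + 2^-12, the exact product is 1 + 2^-11 + 2^-24. The rounded product drops the last term, and the FMA returns it as the error, which a separate multiply followed by a subtract could not do.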

◆ mul() [1/2]

template<RoundingMode RM, bool FTZ = true>
__device__ double sp::mul ( double  x,
double  y 
)

FP multiply with configurable rounding mode and denorm handling.

Template Parameters
RM   Rounding mode to use.
FTZ  If true, denormal inputs may be flushed to zero.

◆ mul() [2/2]

template<RoundingMode RM, bool FTZ = true>
__device__ float sp::mul ( float  x,
float  y 
)

FP multiply with configurable rounding mode and denorm handling.

TODO: Saturation is supported by this instruction.

Template Parameters
RM   Rounding mode to use.
FTZ  If true, denormal inputs may be flushed to zero.

◆ sub()

template<RoundingMode RM, bool FTZ = true, typename T >
__device__ T sp::sub ( T  x,
T  y 
)

FP subtract with configurable rounding mode and denormal handling.

TODO: Saturation is supported by this instruction.

Template Parameters
RM   Rounding mode to use.
FTZ  If true, denormal inputs may be flushed to zero.

Variable Documentation

◆ FTZ

constexpr bool sp::FTZ = true

Allows you to write e.g. sp::approxSqrt<FTZ>(x);