Namespace with Spectral Compute Ltd things. More...
Classes | |
struct | add_addrspace |
Add an address space qualifier to a type. More... | |
struct | allocatable_type |
The type that you need to allocate for T. Always T, except for void , which yields char . More... | |
class | BlockingEvent |
An event that the host can synchronise with. More... | |
struct | copy_cvref |
T, with the cvref qualifiers of Q. More... | |
class | CudaAddressOfConstantException |
Exception corresponding to cudaErrorAddressOfConstant of cudaError . More... | |
class | CudaAlreadyAcquiredException |
Exception corresponding to cudaErrorAlreadyAcquired of cudaError . More... | |
class | CudaAlreadyMappedException |
Exception corresponding to cudaErrorAlreadyMapped of cudaError . More... | |
class | CudaArrayIsMappedException |
Exception corresponding to cudaErrorArrayIsMapped of cudaError . More... | |
class | CudaAssertException |
Exception corresponding to cudaErrorAssert of cudaError . More... | |
class | CudaCapturedEventException |
Exception corresponding to cudaErrorCapturedEvent of cudaError . More... | |
class | CudaCompatNotSupportedOnDeviceException |
Exception corresponding to cudaErrorCompatNotSupportedOnDevice of cudaError . More... | |
class | CudaContextIsDestroyedException |
Exception corresponding to cudaErrorContextIsDestroyed of cudaError . More... | |
class | CudaCooperativeLaunchTooLargeException |
Exception corresponding to cudaErrorCooperativeLaunchTooLarge of cudaError . More... | |
class | CudaDeviceAlreadyInUseException |
Exception corresponding to cudaErrorDeviceAlreadyInUse of cudaError . More... | |
class | CudaDevicesUnavailableException |
Exception corresponding to cudaErrorDevicesUnavailable of cudaError . More... | |
class | CudaDeviceUninitializedException |
Exception corresponding to cudaErrorDeviceUninitialized of cudaError . More... | |
class | CudaDuplicateSurfaceNameException |
Exception corresponding to cudaErrorDuplicateSurfaceName of cudaError . More... | |
class | CudaDuplicateTextureNameException |
Exception corresponding to cudaErrorDuplicateTextureName of cudaError . More... | |
class | CudaDuplicateVariableNameException |
Exception corresponding to cudaErrorDuplicateVariableName of cudaError . More... | |
class | CudaECCUncorrectableException |
Exception corresponding to cudaErrorECCUncorrectable of cudaError . More... | |
class | CudaErrorCallRequiresNewerDriverException |
Exception corresponding to cudaErrorCallRequiresNewerDriver of cudaError . More... | |
class | CudaErrorDeviceNotLicensedException |
Exception corresponding to cudaErrorDeviceNotLicensed of cudaError . More... | |
class | CudaErrorExternalDeviceException |
Exception corresponding to cudaErrorExternalDevice of cudaError . More... | |
class | CudaErrorJitCompilationDisabledException |
Exception corresponding to cudaErrorJitCompilationDisabled of cudaError . More... | |
class | CudaErrorMpsConnectionFailedException |
Exception corresponding to cudaErrorMpsConnectionFailed of cudaError . More... | |
class | CudaErrorMpsMaxClientsReachedException |
Exception corresponding to cudaErrorMpsMaxClientsReached of cudaError . More... | |
class | CudaErrorMpsMaxConnectionsReachedException |
Exception corresponding to cudaErrorMpsMaxConnectionsReached of cudaError . More... | |
class | CudaErrorMpsRpcFailureException |
Exception corresponding to cudaErrorMpsRpcFailure of cudaError . More... | |
class | CudaErrorMpsServerNotReadyException |
Exception corresponding to cudaErrorMpsServerNotReady of cudaError . More... | |
class | CudaErrorSoftwareValidityNotEstablishedException |
Exception corresponding to cudaErrorSoftwareValidityNotEstablished of cudaError . More... | |
class | CudaErrorStubLibraryException |
Exception corresponding to cudaErrorStubLibrary of cudaError . More... | |
class | CudaErrorUnsupportedExecAffinityException |
Exception corresponding to cudaErrorUnsupportedExecAffinity of cudaError . More... | |
class | CudaErrorUnsupportedPtxVersionException |
Exception corresponding to cudaErrorUnsupportedPtxVersion of cudaError . More... | |
class | CudaException |
Base class for exception types that wrap CUDA error codes. More... | |
class | CudaFileNotFoundException |
Exception corresponding to cudaErrorFileNotFound of cudaError . More... | |
class | CudaGraphExecUpdateFailureException |
Exception corresponding to cudaErrorGraphExecUpdateFailure of cudaError . More... | |
class | CudaHardwareStackErrorException |
Exception corresponding to cudaErrorHardwareStackError of cudaError . More... | |
class | CudaHostMemoryAlreadyRegisteredException |
Exception corresponding to cudaErrorHostMemoryAlreadyRegistered of cudaError . More... | |
class | CudaHostMemoryNotRegisteredException |
Exception corresponding to cudaErrorHostMemoryNotRegistered of cudaError . More... | |
class | CudaIllegalAddressException |
Exception corresponding to cudaErrorIllegalAddress of cudaError . More... | |
class | CudaIllegalStateException |
Exception corresponding to cudaErrorIllegalState of cudaError . More... | |
class | CudaIncompatibleDriverContextException |
Exception corresponding to cudaErrorIncompatibleDriverContext of cudaError . More... | |
class | CudaInitializationErrorException |
Exception corresponding to cudaErrorInitializationError of cudaError . More... | |
class | CudaInstructionFailureException |
Exception corresponding to cudaErrorIllegalInstruction of cudaError . More... | |
class | CudaInsufficientDriverException |
Exception corresponding to cudaErrorInsufficientDriver of cudaError . More... | |
class | CudaInvalidAddressSpaceException |
Exception corresponding to cudaErrorInvalidAddressSpace of cudaError . More... | |
class | CudaInvalidChannelDescriptorException |
Exception corresponding to cudaErrorInvalidChannelDescriptor of cudaError . More... | |
class | CudaInvalidConfigurationException |
Exception corresponding to cudaErrorInvalidConfiguration of cudaError . More... | |
class | CudaInvalidDeviceException |
Exception corresponding to cudaErrorInvalidDevice of cudaError . More... | |
class | CudaInvalidDeviceFunctionException |
Exception corresponding to cudaErrorInvalidDeviceFunction of cudaError . More... | |
class | CudaInvalidDevicePointerException |
Exception corresponding to cudaErrorInvalidDevicePointer of cudaError . More... | |
class | CudaInvalidFilterSettingException |
Exception corresponding to cudaErrorInvalidFilterSetting of cudaError . More... | |
class | CudaInvalidGraphicsContextException |
Exception corresponding to cudaErrorInvalidGraphicsContext of cudaError . More... | |
class | CudaInvalidHostPointerException |
Exception corresponding to cudaErrorInvalidHostPointer of cudaError . More... | |
class | CudaInvalidKernelImageException |
Exception corresponding to cudaErrorInvalidKernelImage of cudaError . More... | |
class | CudaInvalidMemcpyDirectionException |
Exception corresponding to cudaErrorInvalidMemcpyDirection of cudaError . More... | |
class | CudaInvalidNormSettingException |
Exception corresponding to cudaErrorInvalidNormSetting of cudaError . More... | |
class | CudaInvalidPcException |
Exception corresponding to cudaErrorInvalidPc of cudaError . More... | |
class | CudaInvalidPitchValueException |
Exception corresponding to cudaErrorInvalidPitchValue of cudaError . More... | |
class | CudaInvalidPtxException |
Exception corresponding to cudaErrorInvalidPtx of cudaError . More... | |
class | CudaInvalidResourceHandleException |
Exception corresponding to cudaErrorInvalidResourceHandle of cudaError . More... | |
class | CudaInvalidSourceException |
Exception corresponding to cudaErrorInvalidSource of cudaError . More... | |
class | CudaInvalidSurfaceException |
Exception corresponding to cudaErrorInvalidSurface of cudaError . More... | |
class | CudaInvalidSymbolException |
Exception corresponding to cudaErrorInvalidSymbol of cudaError . More... | |
class | CudaInvalidTextureBindingException |
Exception corresponding to cudaErrorInvalidTextureBinding of cudaError . More... | |
class | CudaInvalidTextureException |
Exception corresponding to cudaErrorInvalidTexture of cudaError . More... | |
class | CudaInvalidValueException |
Exception corresponding to cudaErrorInvalidValue of cudaError . More... | |
class | CudaJitCompilerNotFoundException |
Exception corresponding to cudaErrorJitCompilerNotFound of cudaError . More... | |
class | CudaKernel |
Object representing a kernel. More... | |
class | CudaLaunchFailureException |
Exception corresponding to cudaErrorLaunchFailure of cudaError . More... | |
class | CudaLaunchFileScopedSurfException |
Exception corresponding to cudaErrorLaunchFileScopedSurf of cudaError . More... | |
class | CudaLaunchFileScopedTexException |
Exception corresponding to cudaErrorLaunchFileScopedTex of cudaError . More... | |
class | CudaLaunchIncompatibleTexturingException |
Exception corresponding to cudaErrorLaunchIncompatibleTexturing of cudaError . More... | |
class | CudaLaunchMaxDepthExceededException |
Exception corresponding to cudaErrorLaunchMaxDepthExceeded of cudaError . More... | |
class | CudaLaunchOutOfResourcesException |
Exception corresponding to cudaErrorLaunchOutOfResources of cudaError . More... | |
class | CudaLaunchPendingCountExceededException |
Exception corresponding to cudaErrorLaunchPendingCountExceeded of cudaError . More... | |
class | CudaLaunchTimeoutException |
Exception corresponding to cudaErrorLaunchTimeout of cudaError . More... | |
class | CudaMapBufferObjectFailedException |
Exception corresponding to cudaErrorMapBufferObjectFailed of cudaError . More... | |
class | CudaMemoryAllocationException |
Exception corresponding to cudaErrorMemoryAllocation of cudaError . More... | |
class | CudaMemoryValueTooLargeException |
Exception corresponding to cudaErrorMemoryValueTooLarge of cudaError . More... | |
class | CudaMisalignedAddressException |
Exception corresponding to cudaErrorMisalignedAddress of cudaError . More... | |
class | CudaMissingConfigurationException |
Exception corresponding to cudaErrorMissingConfiguration of cudaError . More... | |
class | CudaMixedDeviceExecutionException |
Exception corresponding to cudaErrorMixedDeviceExecution of cudaError . More... | |
class | CudaNoDeviceException |
Exception corresponding to cudaErrorNoDevice of cudaError . More... | |
class | CudaNoKernelImageForDeviceException |
Exception corresponding to cudaErrorNoKernelImageForDevice of cudaError . More... | |
class | CudaNotMappedAsArrayException |
Exception corresponding to cudaErrorNotMappedAsArray of cudaError . More... | |
class | CudaNotMappedAsPointerException |
Exception corresponding to cudaErrorNotMappedAsPointer of cudaError . More... | |
class | CudaNotMappedException |
Exception corresponding to cudaErrorNotMapped of cudaError . More... | |
class | CudaNotPermittedException |
Exception corresponding to cudaErrorNotPermitted of cudaError . More... | |
class | CudaNotReadyException |
Exception corresponding to cudaErrorNotReady of cudaError . More... | |
class | CudaNotSupportedException |
Exception corresponding to cudaErrorNotSupported of cudaError . More... | |
class | CudaNotYetImplementedException |
Exception corresponding to cudaErrorNotYetImplemented of cudaError . More... | |
class | CudaNvlinkUncorrectableException |
Exception corresponding to cudaErrorNvlinkUncorrectable of cudaError . More... | |
class | CudaOperatingSystemException |
Exception corresponding to cudaErrorOperatingSystem of cudaError . More... | |
class | CudaPeerAccessAlreadyEnabledException |
Exception corresponding to cudaErrorPeerAccessAlreadyEnabled of cudaError . More... | |
class | CudaPeerAccessNotEnabledException |
Exception corresponding to cudaErrorPeerAccessNotEnabled of cudaError . More... | |
class | CudaPeerAccessUnsupportedException |
Exception corresponding to cudaErrorPeerAccessUnsupported of cudaError . More... | |
class | CudaPriorLaunchFailureException |
Exception corresponding to cudaErrorPriorLaunchFailure of cudaError . More... | |
class | CudaProfilerAlreadyStartedException |
Exception corresponding to cudaErrorProfilerAlreadyStarted of cudaError . More... | |
class | CudaProfilerAlreadyStoppedException |
Exception corresponding to cudaErrorProfilerAlreadyStopped of cudaError . More... | |
class | CudaProfilerDisabledException |
Exception corresponding to cudaErrorProfilerDisabled of cudaError . More... | |
class | CudaProfilerNotInitializedException |
Exception corresponding to cudaErrorProfilerNotInitialized of cudaError . More... | |
class | CudaSetOnActiveProcessException |
Exception corresponding to cudaErrorSetOnActiveProcess of cudaError . More... | |
class | CudaSharedObjectInitFailedException |
Exception corresponding to cudaErrorSharedObjectInitFailed of cudaError . More... | |
class | CudaSharedObjectSymbolNotFoundException |
Exception corresponding to cudaErrorSharedObjectSymbolNotFound of cudaError . More... | |
class | CudaStartupFailureException |
Exception corresponding to cudaErrorStartupFailure of cudaError . More... | |
class | CudaStreamCaptureImplicitException |
Exception corresponding to cudaErrorStreamCaptureImplicit of cudaError . More... | |
class | CudaStreamCaptureInvalidatedException |
Exception corresponding to cudaErrorStreamCaptureInvalidated of cudaError . More... | |
class | CudaStreamCaptureIsolationException |
Exception corresponding to cudaErrorStreamCaptureIsolation of cudaError . More... | |
class | CudaStreamCaptureMergeException |
Exception corresponding to cudaErrorStreamCaptureMerge of cudaError . More... | |
class | CudaStreamCaptureUnjoinedException |
Exception corresponding to cudaErrorStreamCaptureUnjoined of cudaError . More... | |
class | CudaStreamCaptureUnmatchedException |
Exception corresponding to cudaErrorStreamCaptureUnmatched of cudaError . More... | |
class | CudaStreamCaptureUnsupportedException |
Exception corresponding to cudaErrorStreamCaptureUnsupported of cudaError . More... | |
class | CudaStreamCaptureWrongThreadException |
Exception corresponding to cudaErrorStreamCaptureWrongThread of cudaError . More... | |
class | CudaSymbolNotFoundException |
Exception corresponding to cudaErrorSymbolNotFound of cudaError . More... | |
class | CudaSyncDepthExceededException |
Exception corresponding to cudaErrorSyncDepthExceeded of cudaError . More... | |
class | CudaSynchronizationErrorException |
Exception corresponding to cudaErrorSynchronizationError of cudaError . More... | |
class | CudaSystemDriverMismatchException |
Exception corresponding to cudaErrorSystemDriverMismatch of cudaError . More... | |
class | CudaSystemNotReadyException |
Exception corresponding to cudaErrorSystemNotReady of cudaError . More... | |
class | CudaTextureFetchFailedException |
Exception corresponding to cudaErrorTextureFetchFailed of cudaError . More... | |
class | CudaTextureNotBoundException |
Exception corresponding to cudaErrorTextureNotBound of cudaError . More... | |
class | CudaTimeoutException |
Exception corresponding to cudaErrorTimeout of cudaError . More... | |
class | CudaTooManyPeersException |
Exception corresponding to cudaErrorTooManyPeers of cudaError . More... | |
class | CudaUnknownException |
Exception corresponding to cudaErrorUnknown of cudaError . More... | |
class | CudaUnmapBufferObjectFailedException |
Exception corresponding to cudaErrorUnmapBufferObjectFailed of cudaError . More... | |
class | CudaUnsupportedLimitException |
Exception corresponding to cudaErrorUnsupportedLimit of cudaError . More... | |
class | Device |
Represents a GPU. More... | |
class | Event |
Represents an event in a compute stream. More... | |
struct | get_addrspace |
Get the address space of a type. More... | |
class | Host |
Represents the host. More... | |
class | RAIIObject |
A generic mechanism for giving RAII semantics to C-style APIs. More... | |
struct | remove_addrspace |
Remove address space qualifiers from a type. More... | |
struct | remove_cva |
Remove const, volatile, and address space from a type. More... | |
struct | remove_cvaref |
Remove reference, const, volatile, and address space from a type. More... | |
struct | remove_cvref |
This is part of C++20: http://en.cppreference.com/w/cpp/types/remove_cvref. More... | |
struct | Shfl |
Shuffler corresponding to shfl() . The offset parameter behaves like shfl() 's srcLane argument. More... | |
struct | ShflDown |
Shuffler corresponding to shfl_down() . The offset parameter behaves like shfl_down() 's delta argument. More... | |
struct | ShflUp |
Shuffler corresponding to shfl_up() . The offset parameter behaves like shfl_up() 's delta argument. More... | |
struct | ShflXor |
Shuffler corresponding to shfl_xor() . The offset parameter behaves like shfl_xor() 's laneMask argument. More... | |
class | Stream |
Represents a CUDA stream. More... | |
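
The four shuffler structs above differ only in how their offset parameter selects a source lane. The following host-side sketch models that difference for a full 32-lane warp, following the documented CUDA behaviour of shfl(), shfl_up(), shfl_down() and shfl_xor(). Everything in it (`kWarpSize`, the `Model*` structs, `modelShuffle`) is a hypothetical stand-in written for illustration; it is not the sp implementation and it ignores logical warp sizes smaller than 32.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Host-side model only: kWarpSize and every Model* name below are hypothetical
// and are not part of the sp namespace.
constexpr int kWarpSize = 32;

// Each model answers one question: which lane does `lane` read from?
// Offsets are assumed to satisfy 0 <= offset < kWarpSize.
struct ModelShfl {      // offset is an absolute source lane (srcLane)
    static constexpr int source(int /*lane*/, int offset) { return offset; }
};
struct ModelShflUp {    // offset is a delta towards lower lanes
    static constexpr int source(int lane, int offset) {
        return lane - offset >= 0 ? lane - offset : lane;  // out of range: own value
    }
};
struct ModelShflDown {  // offset is a delta towards higher lanes
    static constexpr int source(int lane, int offset) {
        return lane + offset < kWarpSize ? lane + offset : lane;  // out of range: own value
    }
};
struct ModelShflXor {   // offset is a bitmask XORed with the lane ID (laneMask)
    static constexpr int source(int lane, int offset) { return lane ^ offset; }
};

// Applies one shuffle step to a whole warp's worth of values.
template <typename Shuffler = ModelShfl>
std::array<int, kWarpSize> modelShuffle(const std::array<int, kWarpSize>& values, int offset) {
    assert(offset >= 0 && offset < kWarpSize);
    std::array<int, kWarpSize> result{};
    for (int lane = 0; lane < kWarpSize; ++lane) {
        const int src = Shuffler::source(lane, offset);
        result[static_cast<std::size_t>(lane)] = values[static_cast<std::size_t>(src)];
    }
    return result;
}
```

In this model, `modelShuffle<ModelShflDown>(values, 1)` gives every lane its right-hand neighbour's value, except lane 31, which keeps its own.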
Typedefs | |
template<typename T , AddressSpace AS> | |
using | add_addrspace_t = typename add_addrspace< T, AS >::type |
Add an address space qualifier to a type. More... | |
template<typename T > | |
using | remove_addrspace_t = typename remove_addrspace< T >::type |
Remove address space qualifiers from a type. More... | |
template<typename T > | |
using | remove_cva_t = typename remove_cva< T >::type |
Remove const, volatile, and address space from a type. More... | |
template<typename T > | |
using | remove_cvaref_t = typename remove_cvaref< T >::type |
Remove reference, const, volatile, and address space from a type. More... | |
template<cudaError_t Code> | |
using | CudaExceptionFor = typename CudaExceptionForImpl< Code >::type |
Type alias that gets you the exception type corresponding to a specific CUDA error code. More... | |
template<typename T > | |
using | remove_cvref_t = typename remove_cvref< T >::type |
template<typename T , typename Q > | |
using | copy_cvref_t = typename copy_cvref< T, Q >::type |
template<typename T > | |
using | allocatable_type_t = typename allocatable_type< T >::type |
template<typename T , typename Q = T> | |
using | UniquePtr = std::unique_ptr< T, std::function< void(Q *)> > |
Handy type alias for std::unique_ptr s that have simple custom deleters. More... | |
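
To make the `_t` aliases above concrete, here is a minimal sketch of three of the traits in standard C++, written only from their one-line descriptions. The `sketch` namespace and the implementation details are assumptions; the real sp definitions may be written differently, and the address-space traits are omitted because they depend on compiler extensions.

```cpp
#include <type_traits>

namespace sketch {  // hypothetical namespace, not sp

// remove_cvref: strip the reference, then const and volatile.
template <typename T>
struct remove_cvref { using type = std::remove_cv_t<std::remove_reference_t<T>>; };
template <typename T>
using remove_cvref_t = typename remove_cvref<T>::type;

// allocatable_type: T, except void, which yields char.
template <typename T>
struct allocatable_type { using type = T; };
template <>
struct allocatable_type<void> { using type = char; };
template <typename T>
using allocatable_type_t = typename allocatable_type<T>::type;

// copy_cvref: T, with the const/volatile/reference qualifiers of Q.
template <typename T, typename Q>
struct copy_cvref {
private:
    using BareQ = std::remove_reference_t<Q>;
    using C  = std::conditional_t<std::is_const<BareQ>::value, std::add_const_t<T>, T>;
    using CV = std::conditional_t<std::is_volatile<BareQ>::value, std::add_volatile_t<C>, C>;
    using L  = std::conditional_t<std::is_lvalue_reference<Q>::value,
                                  std::add_lvalue_reference_t<CV>, CV>;
public:
    using type = std::conditional_t<std::is_rvalue_reference<Q>::value,
                                    std::add_rvalue_reference_t<L>, L>;
};
template <typename T, typename Q>
using copy_cvref_t = typename copy_cvref<T, Q>::type;

}  // namespace sketch
```

With these definitions, `sketch::copy_cvref_t<float, const int&>` is `const float&`, and `sketch::allocatable_type_t<void>` is `char`, so a byte buffer can stand in for an untyped allocation.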
Functions | |
__device__ int | atomicAdd (__device int *addr, int val) |
Atomically add val to the value stored at AS memory location addr , returning the original value. More... | |
__device__ unsigned int | atomicAdd (__device unsigned int *addr, unsigned int val) |
Atomically add val to the value stored at AS memory location addr , returning the original value. More... | |
__device__ unsigned long long | atomicAdd (__device unsigned long long *addr, unsigned long long val) |
Atomically add val to the value stored at AS memory location addr , returning the original value. More... | |
__device__ float | atomicAdd (__device float *addr, float val) |
Atomically add val to the value stored at AS memory location addr , returning the original value. More... | |
__device__ double | atomicAdd (__device double *addr, double val) |
Atomically add val to the value stored at AS memory location addr , returning the original value. More... | |
__device__ int | atomicSub (__device int *addr, int val) |
Atomically subtract val from the value stored at AS memory location addr , returning the original value. More... | |
__device__ unsigned int | atomicSub (__device unsigned int *addr, unsigned int val) |
Atomically subtract val from the value stored at AS memory location addr , returning the original value. More... | |
__device__ unsigned long long | atomicSub (__device unsigned long long *addr, unsigned long long val) |
Atomically subtract val from the value stored at AS memory location addr , returning the original value. More... | |
__device__ float | atomicSub (__device float *addr, float val) |
Atomically subtract val from the value stored at AS memory location addr , returning the original value. More... | |
__device__ double | atomicSub (__device double *addr, double val) |
Atomically subtract val from the value stored at AS memory location addr , returning the original value. More... | |
__device__ int | atomicExch (__device int *addr, int val) |
Atomically write val to addr and return the value that was stored there before calling this function. More... | |
__device__ unsigned int | atomicExch (__device unsigned int *addr, unsigned int val) |
Atomically write val to addr and return the value that was stored there before calling this function. More... | |
__device__ unsigned long long | atomicExch (__device unsigned long long *addr, unsigned long long val) |
Atomically write val to addr and return the value that was stored there before calling this function. More... | |
__device__ float | atomicExch (__device float *addr, float val) |
Atomically write val to addr and return the value that was stored there before calling this function. More... | |
__device__ int | atomicMin (__device int *addr, int val) |
Atomically write the min of *addr and val to addr , returning the original value of *addr . More... | |
__device__ int | atomicMax (__device int *addr, int val) |
Atomically write the max of *addr and val to addr , returning the original value of *addr . More... | |
__device__ unsigned int | atomicMin (__device unsigned int *addr, unsigned int val) |
Atomically write the min of *addr and val to addr , returning the original value of *addr . More... | |
__device__ unsigned int | atomicMax (__device unsigned int *addr, unsigned int val) |
Atomically write the max of *addr and val to addr , returning the original value of *addr . More... | |
__device__ unsigned long long | atomicMin (__device unsigned long long *addr, unsigned long long val) |
Atomically write the min of *addr and val to addr , returning the original value of *addr . More... | |
__device__ unsigned long long | atomicMax (__device unsigned long long *addr, unsigned long long val) |
Atomically write the max of *addr and val to addr , returning the original value of *addr . More... | |
__device__ int | atomicCAS (__device int *addr, int cmp, int val) |
__device__ unsigned int | atomicCAS (__device unsigned int *addr, unsigned int cmp, unsigned int val) |
__device__ unsigned long long | atomicCAS (__device unsigned long long *addr, unsigned long long cmp, unsigned long long val) |
__device__ int | atomicAnd (__device int *addr, int val) |
Atomically compute *addr = *addr & val and return the original value of *addr . More... | |
__device__ int | atomicOr (__device int *addr, int val) |
Atomically compute *addr = *addr | val and return the original value of *addr . More... | |
__device__ int | atomicXor (__device int *addr, int val) |
Atomically compute *addr = *addr ^ val and return the original value of *addr . More... | |
__device__ unsigned int | atomicAnd (__device unsigned int *addr, unsigned int val) |
Atomically compute *addr = *addr & val and return the original value of *addr . More... | |
__device__ unsigned int | atomicOr (__device unsigned int *addr, unsigned int val) |
Atomically compute *addr = *addr | val and return the original value of *addr . More... | |
__device__ unsigned int | atomicXor (__device unsigned int *addr, unsigned int val) |
Atomically compute *addr = *addr ^ val and return the original value of *addr . More... | |
__device__ unsigned long long | atomicAnd (__device unsigned long long *addr, unsigned long long val) |
Atomically compute *addr = *addr & val and return the original value of *addr . More... | |
__device__ unsigned long long | atomicOr (__device unsigned long long *addr, unsigned long long val) |
Atomically compute *addr = *addr | val and return the original value of *addr . More... | |
__device__ unsigned long long | atomicXor (__device unsigned long long *addr, unsigned long long val) |
Atomically compute *addr = *addr ^ val and return the original value of *addr . More... | |
__device__ unsigned int | atomicInc (__device unsigned int *addr, unsigned int val) |
__device__ unsigned int | atomicDec (__device unsigned int *addr, unsigned int val) |
template<typename T > | |
__device__ T | tex1Dfetch (cudaTextureObject_t tex, int x) |
template<typename T > | |
__device__ T | tex2Dfetch (cudaTextureObject_t tex, int x, int y) |
Read from a 2D texture using integer coordinates. (API extension) More... | |
template<typename T > | |
__device__ T | tex3Dfetch (cudaTextureObject_t tex, int x, int y, int z) |
Read from a 3D texture using integer coordinates. (API extension) More... | |
template<typename T > | |
__device__ T | tex1DOffsetfetch (cudaTextureObject_t tex, int x, int xO) |
Read from a 1D texture at an integer offset with offset addressing. (API extension) More... | |
template<typename T > | |
__device__ T | tex2DOffsetfetch (cudaTextureObject_t tex, int x, int y, int xO, int yO) |
Read from a 2D texture using integer coordinates with offset addressing. (API extension) More... | |
template<typename T > | |
__device__ T | tex3DOffsetfetch (cudaTextureObject_t tex, int x, int y, int z, int xO, int yO, int zO) |
Read from a 3D texture using integer coordinates with offset addressing. (API extension) More... | |
template<typename T > | |
__device__ T | tex1D (cudaTextureObject_t tex, float x) |
Read from a 1D texture at a floating-point offset. More... | |
template<typename T > | |
__device__ T | tex2D (cudaTextureObject_t tex, float x, float y) |
Read from a 2D texture using floating-point coordinates. More... | |
template<typename T > | |
__device__ T | tex3D (cudaTextureObject_t tex, float x, float y, float z) |
Read from a 3D texture using floating-point coordinates. More... | |
__device__ int | getTexWidth (cudaTextureObject_t tex) |
Query the width of a texture object. More... | |
__device__ int | getTexHeight (cudaTextureObject_t tex) |
Query the height of a texture object. More... | |
__device__ int | getTexDepth (cudaTextureObject_t tex) |
Query the depth of a texture object. More... | |
template<> | |
__device__ float4 | tex1Dfetch (cudaTextureObject_t tex, int x) |
template<> | |
__device__ float4 | tex1DOffsetfetch (cudaTextureObject_t tex, int x, int xO) |
Read from a 1D texture at an integer offset with offset addressing. (API extension) More... | |
template<> | |
__device__ float4 | tex2Dfetch (cudaTextureObject_t tex, int x, int y) |
Read from a 2D texture using integer coordinates. (API extension) More... | |
template<> | |
__device__ float4 | tex2DOffsetfetch (cudaTextureObject_t tex, int x, int y, int xO, int yO) |
Read from a 2D texture using integer coordinates with offset addressing. (API extension) More... | |
template<> | |
__device__ float4 | tex3Dfetch (cudaTextureObject_t tex, int x, int y, int z) |
Read from a 3D texture using integer coordinates. (API extension) More... | |
template<> | |
__device__ float4 | tex3DOffsetfetch (cudaTextureObject_t tex, int x, int y, int z, int xO, int yO, int zO) |
Read from a 3D texture using integer coordinates with offset addressing. (API extension) More... | |
template<> | |
__device__ float4 | tex1D (cudaTextureObject_t tex, float x) |
Read from a 1D texture at a floating-point offset. More... | |
template<> | |
__device__ float4 | tex2D (cudaTextureObject_t tex, float x, float y) |
Read from a 2D texture using floating-point coordinates. More... | |
template<> | |
__device__ float4 | tex3D (cudaTextureObject_t tex, float x, float y, float z) |
Read from a 3D texture using floating-point coordinates. More... | |
__device__ lanemask_t | lanemaskLt (int laneID) |
Get a bitmask with a 1 in every position lower than this thread's lane ID. More... | |
__device__ lanemask_t | lanemaskLe (int laneID) |
Get a bitmask with a 1 in every position lower than or equal to this thread's lane ID. More... | |
__device__ lanemask_t | lanemaskEq (int laneID) |
Get a bitmask with a 1 only in the position equal to this thread's lane ID. More... | |
__device__ lanemask_t | lanemaskGe (int laneID) |
Get a bitmask with a 1 in every position greater than or equal to this thread's lane ID. More... | |
__device__ lanemask_t | lanemaskGt (int laneID) |
Get a bitmask with a 1 in every position greater than this thread's lane ID. More... | |
template<typename Shuffler = Shfl, typename T > | |
__device__ auto | shuffle (T value, int offset, int logicalWarpSize=WARP_SIZE) |
Generic shuffle. More... | |
template<typename Shuffler = Shfl, typename T > | |
__device__ std::pair< bool, T > | shufflePredicated (T value, int offset, int logicalWarpSize=WARP_SIZE, int laneID=0) |
Like shuffle() , but also yields a boolean indicating if the value that was read is valid. More... | |
__device__ void | syncthreads (int barrierID, int numWarps) |
A more powerful __syncthreads(). More... | |
__device__ void | syncthreads (int barrierID) |
Like the other sp::syncthreads() , but implicitly synchronises all non-exited warps in the block. More... | |
__device__ void | syncthreads_arrive (int barrierID, int numWarps) |
Functions exactly like sp::syncthreads() , but this warp does not block. More... | |
__device__ int | syncthreads_count (int barrierID, int numWarps, bool predicate) |
Like sp::syncthreads() , but also returns a count of how many threads passed true for predicate . More... | |
__device__ int | syncthreads_count (int barrierID, bool predicate) |
sp::syncthreads_count() , implicitly applied to all non-exited warps. More... | |
__device__ bool | syncthreads_and (int barrierID, int numWarps, bool predicate) |
Like sp::syncthreads() , but also returns true iff all participating threads passed true for predicate . More... | |
__device__ bool | syncthreads_and (int barrierID, bool predicate) |
sp::syncthreads_and() , implicitly applied to all non-exited warps. More... | |
__device__ bool | syncthreads_or (int barrierID, int numWarps, bool predicate) |
Like sp::syncthreads() , but also returns true iff any participating threads passed true for predicate . More... | |
__device__ bool | syncthreads_or (int barrierID, bool predicate) |
sp::syncthreads_or() , implicitly applied to all non-exited warps. More... | |
void | throwCudaException (cudaError_t c, std::string desc="") |
Throw a cudaError_t as an exception, unless it's cudaSuccess . More... | |
template<typename ExceptionType , typename T > | |
void | throwIfNull (T *ptr, std::string message) |
Nifty utility function for throwing exceptions based on null pointer checks. More... | |
template<typename... Args> | |
constexpr void | reallyDoAssert (const char *fileName, int line, const char *functionName, bool passed, const char *checkString, const char *message, Args &&... args) |
template<bool Enabled, typename... Args> | |
constexpr void | doAssert (Args &&... args) |
template<typename T > | |
void | nopDeleter (T *) |
A deleter that doesn't actually delete anything. More... | |
template<typename T > | |
void | pinnedMemoryDeleter (T *p) |
Deleter for pinned host memory. More... | |
template<typename T > | |
void | deviceMemoryDeleter (__device T *p) |
Deleter for device memory. More... | |
bool | isDevicePointer (const void *ptr) |
Runtime check if a flat-address-space pointer is a pointer to any GPU memory. More... | |
void | throwIfNotDevicePointer (const void *ptr, std::string name) |
void | throwIfDevicePointer (const void *ptr, std::string name) |
template<typename T , typename Q > | |
bool | operator== (const Managed_Allocator< T > &, const Managed_Allocator< Q > &) |
template<typename T , typename Q > | |
bool | operator!= (const Managed_Allocator< T > &lhs, const Managed_Allocator< Q > &rhs) |
template<typename T , typename U , typename Deleter > | |
std::unique_ptr< T, Deleter > | static_pointer_cast (std::unique_ptr< U, Deleter > &r) noexcept |
Perform a static_cast operation on an std::unique_ptr . More... | |
template<typename T , typename U , typename Deleter > | |
std::unique_ptr< T, Deleter > | dynamic_pointer_cast (std::unique_ptr< U, Deleter > &r) noexcept |
Like static_pointer_cast , but performs a dynamic_cast(). More... | |
template<typename T , typename U , typename Deleter > | |
std::unique_ptr< T, Deleter > | const_pointer_cast (std::unique_ptr< U, Deleter > &r) noexcept |
Like static_pointer_cast , but performs a const_cast() More... | |
template<typename T , typename U , typename Deleter > | |
std::unique_ptr< T, Deleter > | reinterpret_pointer_cast (std::unique_ptr< U, Deleter > &r) noexcept |
Like static_pointer_cast , but performs a reinterpret_cast() More... | |
Variables | |
template<typename T > | |
constexpr sp::AddressSpace | get_addrspace_v = get_addrspace<T>::value |
Get the address space of a type. More... | |
constexpr bool | OnDevice = false |
constexpr int | CudaVersion = 99999 |
constexpr bool | GPUIntegrated = false |
constexpr bool | EmulateLanemaskIntrinsics = IsAmd |
Feature selectors. More... | |
constexpr bool | HasExtendedSyncthreads = !IsAmd |
If true, sub-block thread synchronisation is possible using sp::syncthreads(). More... | |
constexpr bool | IsWindows = false |
constexpr bool | AssertionsEnabled = false |
constexpr const char * | UNKNOWN_FN = "(UNKOWN FUNCTION)" |
constexpr const char * | UNKNOWN_FILE = "(UNKOWN FILE)" |
Namespace with Spectral Compute Ltd things.
enum sp::DeviceAttr |
This enum does the same job as cudaDeviceAttr, but adds fields that correspond to all fixed fields in cudaDeviceProp which do not have a field in cudaDeviceAttr.
This way, the enum represents a complete list of properties that are compile-time constants. Notably, totalGlobalMem is missing from NVIDIA's enum.
Enumerator | |
---|---|
MaxThreadsPerBlock | Maximum number of threads per block. |
MaxBlockDimX | Maximum block dimension X. |
MaxBlockDimY | Maximum block dimension Y. |
MaxBlockDimZ | Maximum block dimension Z. |
MaxGridDimX | Maximum grid dimension X. |
MaxGridDimY | Maximum grid dimension Y. |
MaxGridDimZ | Maximum grid dimension Z. |
MaxSharedMemoryPerBlock | Maximum shared memory available per block in bytes. |
TotalConstantMemory | Memory available on device for constant variables in a CUDA C kernel in bytes. |
WarpSize | Warp size in threads. |
MaxPitch | Maximum pitch in bytes allowed by memory copies. |
MaxRegistersPerBlock | Maximum number of 32-bit registers available per block. |
ClockRate | Peak clock frequency in kilohertz. |
TextureAlignment | Alignment requirement for textures. |
GpuOverlap | Device can possibly copy memory and execute a kernel concurrently. |
MultiProcessorCount | Number of multiprocessors on device. |
KernelExecTimeout | Specifies whether there is a run time limit on kernels. |
Integrated | Device is integrated with host memory. |
CanMapHostMemory | Device can map host memory into CUDA address space. |
ComputeMode | Compute mode (See cudaComputeMode for details) |
MaxTexture1DWidth | Maximum 1D texture width. |
MaxTexture2DWidth | Maximum 2D texture width. |
MaxTexture2DHeight | Maximum 2D texture height. |
MaxTexture3DWidth | Maximum 3D texture width. |
MaxTexture3DHeight | Maximum 3D texture height. |
MaxTexture3DDepth | Maximum 3D texture depth. |
MaxTexture2DLayeredWidth | Maximum 2D layered texture width. |
MaxTexture2DLayeredHeight | Maximum 2D layered texture height. |
MaxTexture2DLayeredLayers | Maximum layers in a 2D layered texture. |
SurfaceAlignment | Alignment requirement for surfaces. |
ConcurrentKernels | Device can possibly execute multiple kernels concurrently. |
EccEnabled | Device has ECC support enabled. |
PciBusId | PCI bus ID of the device. |
PciDeviceId | PCI device ID of the device. |
TccDriver | Device is using TCC driver model. |
MemoryClockRate | Peak memory clock frequency in kilohertz. |
GlobalMemoryBusWidth | Global memory bus width in bits. |
L2CacheSize | Size of L2 cache in bytes. |
MaxThreadsPerMultiProcessor | Maximum resident threads per multiprocessor. |
AsyncEngineCount | Number of asynchronous engines. |
UnifiedAddressing | Device shares a unified address space with the host. |
MaxTexture1DLayeredWidth | Maximum 1D layered texture width. |
MaxTexture1DLayeredLayers | Maximum layers in a 1D layered texture. |
MaxTexture2DGatherWidth | Maximum 2D texture width if cudaArrayTextureGather is set. |
MaxTexture2DGatherHeight | Maximum 2D texture height if cudaArrayTextureGather is set. |
MaxTexture3DWidthAlt | Alternate maximum 3D texture width. |
MaxTexture3DHeightAlt | Alternate maximum 3D texture height. |
MaxTexture3DDepthAlt | Alternate maximum 3D texture depth. |
PciDomainId | PCI domain ID of the device. |
TexturePitchAlignment | Pitch alignment requirement for textures. |
MaxTextureCubemapWidth | Maximum cubemap texture width/height. |
MaxTextureCubemapLayeredWidth | Maximum cubemap layered texture width/height. |
MaxTextureCubemapLayeredLayers | Maximum layers in a cubemap layered texture. |
MaxSurface1DWidth | Maximum 1D surface width. |
MaxSurface2DWidth | Maximum 2D surface width. |
MaxSurface2DHeight | Maximum 2D surface height. |
MaxSurface3DWidth | Maximum 3D surface width. |
MaxSurface3DHeight | Maximum 3D surface height. |
MaxSurface3DDepth | Maximum 3D surface depth. |
MaxSurface1DLayeredWidth | Maximum 1D layered surface width. |
MaxSurface1DLayeredLayers | Maximum layers in a 1D layered surface. |
MaxSurface2DLayeredWidth | Maximum 2D layered surface width. |
MaxSurface2DLayeredHeight | Maximum 2D layered surface height. |
MaxSurface2DLayeredLayers | Maximum layers in a 2D layered surface. |
MaxSurfaceCubemapWidth | Maximum cubemap surface width. |
MaxSurfaceCubemapLayeredWidth | Maximum cubemap layered surface width. |
MaxSurfaceCubemapLayeredLayers | Maximum layers in a cubemap layered surface. |
MaxTexture1DLinearWidth | Maximum 1D linear texture width. |
MaxTexture2DLinearWidth | Maximum 2D linear texture width. |
MaxTexture2DLinearHeight | Maximum 2D linear texture height. |
MaxTexture2DLinearPitch | Maximum 2D linear texture pitch in bytes. |
MaxTexture2DMipmappedWidth | Maximum mipmapped 2D texture width. |
MaxTexture2DMipmappedHeight | Maximum mipmapped 2D texture height. |
ComputeCapabilityMajor | Major compute capability version number. |
ComputeCapabilityMinor | Minor compute capability version number. |
MaxTexture1DMipmappedWidth | Maximum mipmapped 1D texture width. |
StreamPrioritiesSupported | Device supports stream priorities. |
GlobalL1CacheSupported | Device supports caching globals in L1. |
LocalL1CacheSupported | Device supports caching locals in L1. |
MaxSharedMemoryPerMultiprocessor | Maximum shared memory available per multiprocessor in bytes. |
MaxRegistersPerMultiprocessor | Maximum number of 32-bit registers available per multiprocessor. |
ManagedMemory | Device can allocate managed memory on this system. |
IsMultiGpuBoard | Device is on a multi-GPU board. |
MultiGpuBoardGroupID | Unique identifier for a group of devices on the same multi-GPU board. |
HostNativeAtomicSupported | Link between the device and the host supports native atomic operations. |
SingleToDoublePrecisionPerfRatio | Ratio of single precision performance (in floating-point operations per second) to double precision performance. |
PageableMemoryAccess | Device supports coherently accessing pageable memory without calling cudaHostRegister on it. |
ConcurrentManagedAccess | Device can coherently access managed memory concurrently with the CPU. |
ComputePreemptionSupported | Device supports Compute Preemption. |
CanUseHostPointerForRegisteredMem | Device can access host registered memory at the same virtual address as the CPU. |
CooperativeLaunch | Device supports launching cooperative kernels via cudaLaunchCooperativeKernel. |
CooperativeMultiDeviceLaunch | Deprecated, cudaLaunchCooperativeKernelMultiDevice is deprecated. |
MaxSharedMemoryPerBlockOptin | The maximum opt-in shared memory per block. This value may vary by chip. See cudaFuncSetAttribute. |
CanFlushRemoteWrites | Device supports flushing of outstanding remote writes. |
HostRegisterSupported | Device supports host memory registration via cudaHostRegister. |
PageableMemoryAccessUsesHostPageTables | Device accesses pageable memory via the host's page tables. |
DirectManagedMemAccessFromHost | Host can directly access managed memory on the device without migration. |
MaxBlocksPerMultiprocessor | Maximum number of blocks per multiprocessor. |
MaxPersistingL2CacheSize | Maximum L2 persisting lines capacity setting in bytes. |
MaxAccessPolicyWindowSize | Maximum value of cudaAccessPolicyWindow::num_bytes. |
ReservedSharedMemoryPerBlock | Shared memory reserved by CUDA driver per block in bytes. |
SparseCudaArraySupported | Device supports sparse CUDA arrays and sparse CUDA mipmapped arrays. |
HostRegisterReadOnlySupported | Device supports using the cudaHostRegister flag cudaHostRegisterReadOnly to register memory that must be mapped as read-only to the GPU. |
TimelineSemaphoreInteropSupported | External timeline semaphore interop is supported on the device. |
MaxTimelineSemaphoreInteropSupported | Deprecated, External timeline semaphore interop is supported on the device. |
MemoryPoolsSupported | Device supports using the cudaMallocAsync and cudaMemPool_t family of APIs. |
GPUDirectRDMASupported | Device supports GPUDirect RDMA APIs, like nvidia_p2p_get_pages (see https://docs.nvidia.com/cuda/gpudirect-rdma for more information) |
GPUDirectRDMAFlushWritesOptions | The returned attribute shall be interpreted as a bitmask, where the individual bits are listed in the cudaFlushGPUDirectRDMAWritesOptions enum. |
GPUDirectRDMAWritesOrdering | GPUDirect RDMA writes to the device do not need to be flushed for consumers within the scope indicated by the returned attribute. See cudaGPUDirectRDMAWritesOrdering for the numerical values returned here. |
MemoryPoolSupportedHandleTypes | Handle types supported with mempool based IPC. |
TotalGlobalMem | Global memory size in bytes. |
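To illustrate why a single enum covering both sources is convenient, here is a minimal, self-contained sketch in plain C++. It is not the library's implementation: the three-entry `DeviceAttr`, the `ExampleDevice` struct, its values, and the `getAttr` helper are all hypothetical. Only the enumerator names are taken from the table above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical miniature of the enum: WarpSize and MaxThreadsPerBlock have
// counterparts in cudaDeviceAttr, TotalGlobalMem only exists in cudaDeviceProp.
enum class DeviceAttr { WarpSize, MaxThreadsPerBlock, TotalGlobalMem };

// Hypothetical compile-time description of one device (values are made up).
struct ExampleDevice {
    static constexpr std::int64_t warpSize = 32;
    static constexpr std::int64_t maxThreadsPerBlock = 1024;
    static constexpr std::int64_t totalGlobalMem = 8LL * 1024 * 1024 * 1024;
};

// Hypothetical lookup: one function covers every property, so callers do not
// need to know which of the two CUDA structures a value originally came from.
template <typename Device>
constexpr std::int64_t getAttr(DeviceAttr attr) {
    switch (attr) {
        case DeviceAttr::WarpSize:           return Device::warpSize;
        case DeviceAttr::MaxThreadsPerBlock: return Device::maxThreadsPerBlock;
        case DeviceAttr::TotalGlobalMem:     return Device::totalGlobalMem;
    }
    return -1;  // Unreachable for valid enumerators.
}

// The lookup is usable in constant expressions.
static_assert(getAttr<ExampleDevice>(DeviceAttr::WarpSize) == 32,
              "warp size is known at compile time");
```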
bool sp::isDevicePointer(const void *ptr)
Runtime check if a flat-address-space pointer is a pointer to any GPU memory.
This function is quite slow, and you should be able to avoid using it by making use of address space annotation on pointer types. It is provided mostly for ease of migration to statically-typed address spacing, and for interoperability with other CUDA libraries (which may use generic pointers).
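The advice to prefer static address space annotations can be illustrated with a plain C++ analogy. The library itself annotates pointer types with address space qualifiers (for example the `__device T *` parameter of `deviceMemoryDeleter`). The sketch below uses hypothetical wrapper structs `DevicePtr` and `HostPtr` instead, only to show how the type system can answer "where does this pointer live?" without a runtime query.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for address-space-annotated pointers.
// These wrappers are NOT part of the library; they only model the idea.
template <typename T> struct DevicePtr { T* raw; };
template <typename T> struct HostPtr   { T* raw; };

// Overload resolution selects the right function at compile time, so no
// runtime check such as isDevicePointer() is needed.
template <typename T> std::string describe(DevicePtr<T>) { return "device"; }
template <typename T> std::string describe(HostPtr<T>)   { return "host"; }
```

A runtime check only remains necessary at the boundary with code that hands over untyped `void *` pointers, which is the interoperability case mentioned above.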
template<typename ExceptionType, typename T>
void sp::throwIfNull(T *ptr, std::string message)
Nifty utility function for throwing exceptions based on null pointer checks.
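A minimal sketch of how a function with this signature might behave, assuming it throws `ExceptionType` constructed from `message` when `ptr` is null and does nothing otherwise. The body, the `sketch` namespace, and the `readValue` caller are assumptions for illustration, not the library's source.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

namespace sketch {

// Assumed behaviour: throw ExceptionType(message) if ptr is null.
template <typename ExceptionType, typename T>
void throwIfNull(T* ptr, std::string message) {
    if (ptr == nullptr) {
        throw ExceptionType(std::move(message));
    }
}

// Hypothetical caller: validate the argument before dereferencing it.
// ExceptionType must be given explicitly; T is deduced from the pointer.
inline int readValue(const int* p) {
    throwIfNull<std::invalid_argument>(p, "p must not be null");
    return *p;
}

}  // namespace sketch
```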
constexpr bool sp::EmulateLanemaskIntrinsics = IsAmd
Feature selectors.
constexpr bool sp::HasExtendedSyncthreads = !IsAmd
If true, sub-block thread synchronisation is possible using sp::syncthreads().
If false, the first argument of that function is ignored and always treated as if it were zero.
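The fallback rule can be modelled in a few lines of plain C++. This is a sketch under stated assumptions: `effectiveFirstArgument` is a hypothetical helper, and the flag is passed as a template parameter so both cases can be shown side by side. It does not call or reproduce `sp::syncthreads()`.

```cpp
#include <cassert>

// Hypothetical model of the rule described above: when extended
// synchronisation is unavailable, the first argument is treated as zero.
template <bool HasExtendedSyncthreads>
constexpr int effectiveFirstArgument(int requested) {
    return HasExtendedSyncthreads ? requested : 0;
}

// Both outcomes are decided at compile time.
static_assert(effectiveFirstArgument<true>(3) == 3, "argument is honoured");
static_assert(effectiveFirstArgument<false>(3) == 0, "argument is ignored");
```

Code that relies on a non-zero first argument should therefore check `HasExtendedSyncthreads` at compile time and provide an alternative path when it is false.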