libSCALE  0.2.0
A modern C++ CUDA API
sp Namespace Reference

Namespace containing Spectral Compute Ltd's libSCALE API. More...

Classes

struct  add_addrspace
 Add an address space qualifier to a type. More...
 
struct  allocatable_type
 The type that you need to allocate for T. Always T, except for void, which yields char. More...
 
class  BlockingEvent
 An event that the host can synchronise with. More...
 
struct  copy_cvref
 T, with the cvref qualifiers of Q. More...
 
class  CudaAddressOfConstantException
 Exception corresponding to cudaErrorAddressOfConstant of cudaError. More...
 
class  CudaAlreadyAcquiredException
 Exception corresponding to cudaErrorAlreadyAcquired of cudaError. More...
 
class  CudaAlreadyMappedException
 Exception corresponding to cudaErrorAlreadyMapped of cudaError. More...
 
class  CudaArrayIsMappedException
 Exception corresponding to cudaErrorArrayIsMapped of cudaError. More...
 
class  CudaAssertException
 Exception corresponding to cudaErrorAssert of cudaError. More...
 
class  CudaCapturedEventException
 Exception corresponding to cudaErrorCapturedEvent of cudaError. More...
 
class  CudaCompatNotSupportedOnDeviceException
 Exception corresponding to cudaErrorCompatNotSupportedOnDevice of cudaError. More...
 
class  CudaContextIsDestroyedException
 Exception corresponding to cudaErrorContextIsDestroyed of cudaError. More...
 
class  CudaCooperativeLaunchTooLargeException
 Exception corresponding to cudaErrorCooperativeLaunchTooLarge of cudaError. More...
 
class  CudaDeviceAlreadyInUseException
 Exception corresponding to cudaErrorDeviceAlreadyInUse of cudaError. More...
 
class  CudaDevicesUnavailableException
 Exception corresponding to cudaErrorDevicesUnavailable of cudaError. More...
 
class  CudaDeviceUninitializedException
 Exception corresponding to cudaErrorDeviceUninitialized of cudaError. More...
 
class  CudaDuplicateSurfaceNameException
 Exception corresponding to cudaErrorDuplicateSurfaceName of cudaError. More...
 
class  CudaDuplicateTextureNameException
 Exception corresponding to cudaErrorDuplicateTextureName of cudaError. More...
 
class  CudaDuplicateVariableNameException
 Exception corresponding to cudaErrorDuplicateVariableName of cudaError. More...
 
class  CudaECCUncorrectableException
 Exception corresponding to cudaErrorECCUncorrectable of cudaError. More...
 
class  CudaErrorCallRequiresNewerDriverException
 Exception corresponding to cudaErrorCallRequiresNewerDriver of cudaError. More...
 
class  CudaErrorDeviceNotLicensedException
 Exception corresponding to cudaErrorDeviceNotLicensed of cudaError. More...
 
class  CudaErrorExternalDeviceException
 Exception corresponding to cudaErrorExternalDevice of cudaError. More...
 
class  CudaErrorJitCompilationDisabledException
 Exception corresponding to cudaErrorJitCompilationDisabled of cudaError. More...
 
class  CudaErrorMpsConnectionFailedException
 Exception corresponding to cudaErrorMpsConnectionFailed of cudaError. More...
 
class  CudaErrorMpsMaxClientsReachedException
 Exception corresponding to cudaErrorMpsMaxClientsReached of cudaError. More...
 
class  CudaErrorMpsMaxConnectionsReachedException
 Exception corresponding to cudaErrorMpsMaxConnectionsReached of cudaError. More...
 
class  CudaErrorMpsRpcFailureException
 Exception corresponding to cudaErrorMpsRpcFailure of cudaError. More...
 
class  CudaErrorMpsServerNotReadyException
 Exception corresponding to cudaErrorMpsServerNotReady of cudaError. More...
 
class  CudaErrorSoftwareValidityNotEstablishedException
 Exception corresponding to cudaErrorSoftwareValidityNotEstablished of cudaError. More...
 
class  CudaErrorStubLibraryException
 Exception corresponding to cudaErrorStubLibrary of cudaError. More...
 
class  CudaErrorUnsupportedExecAffinityException
 Exception corresponding to cudaErrorUnsupportedExecAffinity of cudaError. More...
 
class  CudaErrorUnsupportedPtxVersionException
 Exception corresponding to cudaErrorUnsupportedPtxVersion of cudaError. More...
 
class  CudaException
 Base class for exception types that wrap CUDA error codes. More...
 
class  CudaFileNotFoundException
 Exception corresponding to cudaErrorFileNotFound of cudaError. More...
 
class  CudaGraphExecUpdateFailureException
 Exception corresponding to cudaErrorGraphExecUpdateFailure of cudaError. More...
 
class  CudaHardwareStackErrorException
 Exception corresponding to cudaErrorHardwareStackError of cudaError. More...
 
class  CudaHostMemoryAlreadyRegisteredException
 Exception corresponding to cudaErrorHostMemoryAlreadyRegistered of cudaError. More...
 
class  CudaHostMemoryNotRegisteredException
 Exception corresponding to cudaErrorHostMemoryNotRegistered of cudaError. More...
 
class  CudaIllegalAddressException
 Exception corresponding to cudaErrorIllegalAddress of cudaError. More...
 
class  CudaIllegalStateException
 Exception corresponding to cudaErrorIllegalState of cudaError. More...
 
class  CudaIncompatibleDriverContextException
 Exception corresponding to cudaErrorIncompatibleDriverContext of cudaError. More...
 
class  CudaInitializationErrorException
 Exception corresponding to cudaErrorInitializationError of cudaError. More...
 
class  CudaInstructionFailureException
 Exception corresponding to cudaErrorIllegalInstruction of cudaError. More...
 
class  CudaInsufficientDriverException
 Exception corresponding to cudaErrorInsufficientDriver of cudaError. More...
 
class  CudaInvalidAddressSpaceException
 Exception corresponding to cudaErrorInvalidAddressSpace of cudaError. More...
 
class  CudaInvalidChannelDescriptorException
 Exception corresponding to cudaErrorInvalidChannelDescriptor of cudaError. More...
 
class  CudaInvalidConfigurationException
 Exception corresponding to cudaErrorInvalidConfiguration of cudaError. More...
 
class  CudaInvalidDeviceException
 Exception corresponding to cudaErrorInvalidDevice of cudaError. More...
 
class  CudaInvalidDeviceFunctionException
 Exception corresponding to cudaErrorInvalidDeviceFunction of cudaError. More...
 
class  CudaInvalidDevicePointerException
 Exception corresponding to cudaErrorInvalidDevicePointer of cudaError. More...
 
class  CudaInvalidFilterSettingException
 Exception corresponding to cudaErrorInvalidFilterSetting of cudaError. More...
 
class  CudaInvalidGraphicsContextException
 Exception corresponding to cudaErrorInvalidGraphicsContext of cudaError. More...
 
class  CudaInvalidHostPointerException
 Exception corresponding to cudaErrorInvalidHostPointer of cudaError. More...
 
class  CudaInvalidKernelImageException
 Exception corresponding to cudaErrorInvalidKernelImage of cudaError. More...
 
class  CudaInvalidMemcpyDirectionException
 Exception corresponding to cudaErrorInvalidMemcpyDirection of cudaError. More...
 
class  CudaInvalidNormSettingException
 Exception corresponding to cudaErrorInvalidNormSetting of cudaError. More...
 
class  CudaInvalidPcException
 Exception corresponding to cudaErrorInvalidPc of cudaError. More...
 
class  CudaInvalidPitchValueException
 Exception corresponding to cudaErrorInvalidPitchValue of cudaError. More...
 
class  CudaInvalidPtxException
 Exception corresponding to cudaErrorInvalidPtx of cudaError. More...
 
class  CudaInvalidResourceHandleException
 Exception corresponding to cudaErrorInvalidResourceHandle of cudaError. More...
 
class  CudaInvalidSourceException
 Exception corresponding to cudaErrorInvalidSource of cudaError. More...
 
class  CudaInvalidSurfaceException
 Exception corresponding to cudaErrorInvalidSurface of cudaError. More...
 
class  CudaInvalidSymbolException
 Exception corresponding to cudaErrorInvalidSymbol of cudaError. More...
 
class  CudaInvalidTextureBindingException
 Exception corresponding to cudaErrorInvalidTextureBinding of cudaError. More...
 
class  CudaInvalidTextureException
 Exception corresponding to cudaErrorInvalidTexture of cudaError. More...
 
class  CudaInvalidValueException
 Exception corresponding to cudaErrorInvalidValue of cudaError. More...
 
class  CudaJitCompilerNotFoundException
 Exception corresponding to cudaErrorJitCompilerNotFound of cudaError. More...
 
class  CudaKernel
 Object representing a kernel. More...
 
class  CudaLaunchFailureException
 Exception corresponding to cudaErrorLaunchFailure of cudaError. More...
 
class  CudaLaunchFileScopedSurfException
 Exception corresponding to cudaErrorLaunchFileScopedSurf of cudaError. More...
 
class  CudaLaunchFileScopedTexException
 Exception corresponding to cudaErrorLaunchFileScopedTex of cudaError. More...
 
class  CudaLaunchIncompatibleTexturingException
 Exception corresponding to cudaErrorLaunchIncompatibleTexturing of cudaError. More...
 
class  CudaLaunchMaxDepthExceededException
 Exception corresponding to cudaErrorLaunchMaxDepthExceeded of cudaError. More...
 
class  CudaLaunchOutOfResourcesException
 Exception corresponding to cudaErrorLaunchOutOfResources of cudaError. More...
 
class  CudaLaunchPendingCountExceededException
 Exception corresponding to cudaErrorLaunchPendingCountExceeded of cudaError. More...
 
class  CudaLaunchTimeoutException
 Exception corresponding to cudaErrorLaunchTimeout of cudaError. More...
 
class  CudaMapBufferObjectFailedException
 Exception corresponding to cudaErrorMapBufferObjectFailed of cudaError. More...
 
class  CudaMemoryAllocationException
 Exception corresponding to cudaErrorMemoryAllocation of cudaError. More...
 
class  CudaMemoryValueTooLargeException
 Exception corresponding to cudaErrorMemoryValueTooLarge of cudaError. More...
 
class  CudaMisalignedAddressException
 Exception corresponding to cudaErrorMisalignedAddress of cudaError. More...
 
class  CudaMissingConfigurationException
 Exception corresponding to cudaErrorMissingConfiguration of cudaError. More...
 
class  CudaMixedDeviceExecutionException
 Exception corresponding to cudaErrorMixedDeviceExecution of cudaError. More...
 
class  CudaNoDeviceException
 Exception corresponding to cudaErrorNoDevice of cudaError. More...
 
class  CudaNoKernelImageForDeviceException
 Exception corresponding to cudaErrorNoKernelImageForDevice of cudaError. More...
 
class  CudaNotMappedAsArrayException
 Exception corresponding to cudaErrorNotMappedAsArray of cudaError. More...
 
class  CudaNotMappedAsPointerException
 Exception corresponding to cudaErrorNotMappedAsPointer of cudaError. More...
 
class  CudaNotMappedException
 Exception corresponding to cudaErrorNotMapped of cudaError. More...
 
class  CudaNotPermittedException
 Exception corresponding to cudaErrorNotPermitted of cudaError. More...
 
class  CudaNotReadyException
 Exception corresponding to cudaErrorNotReady of cudaError. More...
 
class  CudaNotSupportedException
 Exception corresponding to cudaErrorNotSupported of cudaError. More...
 
class  CudaNotYetImplementedException
 Exception corresponding to cudaErrorNotYetImplemented of cudaError. More...
 
class  CudaNvlinkUncorrectableException
 Exception corresponding to cudaErrorNvlinkUncorrectable of cudaError. More...
 
class  CudaOperatingSystemException
 Exception corresponding to cudaErrorOperatingSystem of cudaError. More...
 
class  CudaPeerAccessAlreadyEnabledException
 Exception corresponding to cudaErrorPeerAccessAlreadyEnabled of cudaError. More...
 
class  CudaPeerAccessNotEnabledException
 Exception corresponding to cudaErrorPeerAccessNotEnabled of cudaError. More...
 
class  CudaPeerAccessUnsupportedException
 Exception corresponding to cudaErrorPeerAccessUnsupported of cudaError. More...
 
class  CudaPriorLaunchFailureException
 Exception corresponding to cudaErrorPriorLaunchFailure of cudaError. More...
 
class  CudaProfilerAlreadyStartedException
 Exception corresponding to cudaErrorProfilerAlreadyStarted of cudaError. More...
 
class  CudaProfilerAlreadyStoppedException
 Exception corresponding to cudaErrorProfilerAlreadyStopped of cudaError. More...
 
class  CudaProfilerDisabledException
 Exception corresponding to cudaErrorProfilerDisabled of cudaError. More...
 
class  CudaProfilerNotInitializedException
 Exception corresponding to cudaErrorProfilerNotInitialized of cudaError. More...
 
class  CudaSetOnActiveProcessException
 Exception corresponding to cudaErrorSetOnActiveProcess of cudaError. More...
 
class  CudaSharedObjectInitFailedException
 Exception corresponding to cudaErrorSharedObjectInitFailed of cudaError. More...
 
class  CudaSharedObjectSymbolNotFoundException
 Exception corresponding to cudaErrorSharedObjectSymbolNotFound of cudaError. More...
 
class  CudaStartupFailureException
 Exception corresponding to cudaErrorStartupFailure of cudaError. More...
 
class  CudaStreamCaptureImplicitException
 Exception corresponding to cudaErrorStreamCaptureImplicit of cudaError. More...
 
class  CudaStreamCaptureInvalidatedException
 Exception corresponding to cudaErrorStreamCaptureInvalidated of cudaError. More...
 
class  CudaStreamCaptureIsolationException
 Exception corresponding to cudaErrorStreamCaptureIsolation of cudaError. More...
 
class  CudaStreamCaptureMergeException
 Exception corresponding to cudaErrorStreamCaptureMerge of cudaError. More...
 
class  CudaStreamCaptureUnjoinedException
 Exception corresponding to cudaErrorStreamCaptureUnjoined of cudaError. More...
 
class  CudaStreamCaptureUnmatchedException
 Exception corresponding to cudaErrorStreamCaptureUnmatched of cudaError. More...
 
class  CudaStreamCaptureUnsupportedException
 Exception corresponding to cudaErrorStreamCaptureUnsupported of cudaError. More...
 
class  CudaStreamCaptureWrongThreadException
 Exception corresponding to cudaErrorStreamCaptureWrongThread of cudaError. More...
 
class  CudaSymbolNotFoundException
 Exception corresponding to cudaErrorSymbolNotFound of cudaError. More...
 
class  CudaSyncDepthExceededException
 Exception corresponding to cudaErrorSyncDepthExceeded of cudaError. More...
 
class  CudaSynchronizationErrorException
 Exception corresponding to cudaErrorSynchronizationError of cudaError. More...
 
class  CudaSystemDriverMismatchException
 Exception corresponding to cudaErrorSystemDriverMismatch of cudaError. More...
 
class  CudaSystemNotReadyException
 Exception corresponding to cudaErrorSystemNotReady of cudaError. More...
 
class  CudaTextureFetchFailedException
 Exception corresponding to cudaErrorTextureFetchFailed of cudaError. More...
 
class  CudaTextureNotBoundException
 Exception corresponding to cudaErrorTextureNotBound of cudaError. More...
 
class  CudaTimeoutException
 Exception corresponding to cudaErrorTimeout of cudaError. More...
 
class  CudaTooManyPeersException
 Exception corresponding to cudaErrorTooManyPeers of cudaError. More...
 
class  CudaUnknownException
 Exception corresponding to cudaErrorUnknown of cudaError. More...
 
class  CudaUnmapBufferObjectFailedException
 Exception corresponding to cudaErrorUnmapBufferObjectFailed of cudaError. More...
 
class  CudaUnsupportedLimitException
 Exception corresponding to cudaErrorUnsupportedLimit of cudaError. More...
 
class  Device
 Represents a GPU. More...
 
class  Event
 Represents an event in a compute stream. More...
 
struct  get_addrspace
 Get the address space of a type. More...
 
class  Host
 Represents the host. More...
 
class  RAIIObject
 A generic mechanism for giving RAII semantics to C-style APIs. More...
 
struct  remove_addrspace
 Remove address space qualifiers from a type. More...
 
struct  remove_cva
 Remove const, volatile, and address space from a type. More...
 
struct  remove_cvaref
 Remove reference, const, volatile, and address space qualifiers from a type. More...
 
struct  remove_cvref
 Remove reference, const, and volatile qualifiers from a type. This is part of C++20 as std::remove_cvref: http://en.cppreference.com/w/cpp/types/remove_cvref. More...
 
struct  Shfl
 Shuffler corresponding to shfl(). The offset parameter behaves like shfl()'s srcLane argument. More...
 
struct  ShflDown
 Shuffler corresponding to shfl_down(). The offset parameter behaves like shfl_down()'s delta argument. More...
 
struct  ShflUp
 Shuffler corresponding to shfl_up(). The offset parameter behaves like shfl_up()'s delta argument. More...
 
struct  ShflXor
 Shuffler corresponding to shfl_xor(). The offset parameter behaves like shfl_xor()'s laneMask argument. More...
 
class  Stream
 Represents a CUDA stream. More...
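
Most of the classes above are per-error exception types that sit under CudaException. The following is a minimal host-only sketch of that pattern, not the library's implementation. Every name in namespace `sketch` is a hypothetical stand-in, and `sketch::Error` replaces cudaError_t so that the example builds without the CUDA headers (the numeric values shown match cudaErrorInvalidValue, cudaErrorMemoryAllocation and cudaErrorUnknown).

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

namespace sketch {

// Stand-in for cudaError_t so the example builds without the CUDA headers.
enum class Error { Success = 0, InvalidValue = 1, MemoryAllocation = 2, Unknown = 999 };

// Hypothetical base class: wraps an error code plus a description.
class BaseException : public std::runtime_error {
public:
    BaseException(Error code, const std::string& desc)
        : std::runtime_error(desc), code_(code) {}
    Error code() const noexcept { return code_; }
private:
    Error code_;
};

// One derived type per error code, so callers can catch exactly what they handle.
class InvalidValueException : public BaseException {
public:
    explicit InvalidValueException(const std::string& desc)
        : BaseException(Error::InvalidValue, desc) {}
};

class MemoryAllocationException : public BaseException {
public:
    explicit MemoryAllocationException(const std::string& desc)
        : BaseException(Error::MemoryAllocation, desc) {}
};

// Throw the matching exception type; a success code is a no-op.
inline void throwOnError(Error code, const std::string& desc = "") {
    switch (code) {
        case Error::Success:          return;
        case Error::InvalidValue:     throw InvalidValueException(desc);
        case Error::MemoryAllocation: throw MemoryAllocationException(desc);
        default:                      throw BaseException(code, desc);
    }
}

// Reports which handler ran: 0 = nothing thrown, 1 = specific type, 2 = base type.
inline int classify(Error code) {
    try {
        throwOnError(code, "example");
    } catch (const MemoryAllocationException&) {
        return 1;
    } catch (const BaseException&) {
        return 2;
    }
    return 0;
}

}  // namespace sketch
```

The point of one type per code is that a caller can handle a recoverable condition (here, an allocation failure) in a dedicated catch clause and let everything else fall through to a handler for the base type.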
 

Typedefs

template<typename T , AddressSpace AS>
using add_addrspace_t = typename add_addrspace< T, AS >::type
 Add an address space qualifier to a type. More...
 
template<typename T >
using remove_addrspace_t = typename remove_addrspace< T >::type
 Remove address space qualifiers from a type. More...
 
template<typename T >
using remove_cva_t = typename remove_cva< T >::type
 Remove const, volatile, and address space from a type. More...
 
template<typename T >
using remove_cvaref_t = typename remove_cvaref< T >::type
 Remove reference, const, volatile, and address space qualifiers from a type. More...
 
template<cudaError_t Code>
using CudaExceptionFor = typename CudaExceptionForImpl< Code >::type
 Type alias that gets you the exception type corresponding to a specific CUDA error code. More...
 
template<typename T >
using remove_cvref_t = typename remove_cvref< T >::type
 
template<typename T , typename Q >
using copy_cvref_t = typename copy_cvref< T, Q >::type
 
template<typename T >
using allocatable_type_t = typename allocatable_type< T >::type
 
template<typename T , typename Q = T>
using UniquePtr = std::unique_ptr< T, std::function< void(Q *)> >
 Handy type alias for std::unique_ptrs that have simple custom deleters. More...
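
As a rough illustration of how an alias of this shape can be used, the sketch below reproduces the UniquePtr definition locally and pairs it with three kinds of deleter. It is standard C++ only. The local `nopDeleter` and the `demoDeleters` helper are stand-ins written for this example, not the library's own functions.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <memory>

namespace sketch {

// Same shape as the alias documented above, reproduced locally.
template <typename T, typename Q = T>
using UniquePtr = std::unique_ptr<T, std::function<void(Q*)>>;

// Local stand-in for a deleter that leaves the pointee alone.
template <typename T>
void nopDeleter(T*) {}

// Returns how many times the counting deleter ran.
inline int demoDeleters() {
    int deletions = 0;
    {
        // Owning pointer with a lambda deleter that captures state.
        UniquePtr<int> owned(new int(42), [&deletions](int* p) {
            delete p;
            ++deletions;
        });

        // Non-owning view of a stack object: the deleter does nothing.
        int onStack = 7;
        UniquePtr<int> borrowed(&onStack, &nopDeleter<int>);

        // Q = void lets a deleter that takes void* manage a typed pointer.
        UniquePtr<int, void> raw(static_cast<int*>(std::malloc(sizeof(int))),
                                 [](void* p) { std::free(p); });
        if (raw) {
            *raw = *owned + *borrowed;
        }
    }  // all three deleters run here, in reverse order of declaration
    return deletions;
}

}  // namespace sketch
```

Because the deleter is type-erased in a std::function, all three pointers that share a Q have the same static type regardless of how they free their memory. The cost is a larger pointer object and an indirect call on destruction.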
 

Enumerations

enum  AddressSpace {
  FLAT = __scale_address_space_flat , GENERIC = __scale_address_space_generic , DEVICE = __scale_address_space_device , SHARED = __scale_address_space_shared ,
  CONSTANT = __scale_address_space_constant , LOCAL = __scale_address_space_local
}
 Enum providing the address-space numbers for GPUs. More...
 
enum class  DeviceMemoryType { DeviceMemoryType::NORMAL , DeviceMemoryType::MANAGED }
 Type of GPU memory allocation. More...
 
enum  DeviceAttr {
  MaxThreadsPerBlock = 1 , MaxBlockDimX = 2 , MaxBlockDimY = 3 , MaxBlockDimZ = 4 ,
  MaxGridDimX = 5 , MaxGridDimY = 6 , MaxGridDimZ = 7 , MaxSharedMemoryPerBlock = 8 ,
  TotalConstantMemory = 9 , WarpSize = 10 , MaxPitch = 11 , MaxRegistersPerBlock = 12 ,
  ClockRate = 13 , TextureAlignment = 14 , GpuOverlap = 15 , MultiProcessorCount = 16 ,
  KernelExecTimeout = 17 , Integrated = 18 , CanMapHostMemory = 19 , ComputeMode = 20 ,
  MaxTexture1DWidth = 21 , MaxTexture2DWidth = 22 , MaxTexture2DHeight = 23 , MaxTexture3DWidth = 24 ,
  MaxTexture3DHeight = 25 , MaxTexture3DDepth = 26 , MaxTexture2DLayeredWidth = 27 , MaxTexture2DLayeredHeight = 28 ,
  MaxTexture2DLayeredLayers = 29 , SurfaceAlignment = 30 , ConcurrentKernels = 31 , EccEnabled = 32 ,
  PciBusId = 33 , PciDeviceId = 34 , TccDriver = 35 , MemoryClockRate = 36 ,
  GlobalMemoryBusWidth = 37 , L2CacheSize = 38 , MaxThreadsPerMultiProcessor = 39 , AsyncEngineCount = 40 ,
  UnifiedAddressing = 41 , MaxTexture1DLayeredWidth = 42 , MaxTexture1DLayeredLayers = 43 , MaxTexture2DGatherWidth = 45 ,
  MaxTexture2DGatherHeight = 46 , MaxTexture3DWidthAlt = 47 , MaxTexture3DHeightAlt = 48 , MaxTexture3DDepthAlt = 49 ,
  PciDomainId = 50 , TexturePitchAlignment = 51 , MaxTextureCubemapWidth = 52 , MaxTextureCubemapLayeredWidth = 53 ,
  MaxTextureCubemapLayeredLayers = 54 , MaxSurface1DWidth = 55 , MaxSurface2DWidth = 56 , MaxSurface2DHeight = 57 ,
  MaxSurface3DWidth = 58 , MaxSurface3DHeight = 59 , MaxSurface3DDepth = 60 , MaxSurface1DLayeredWidth = 61 ,
  MaxSurface1DLayeredLayers = 62 , MaxSurface2DLayeredWidth = 63 , MaxSurface2DLayeredHeight = 64 , MaxSurface2DLayeredLayers = 65 ,
  MaxSurfaceCubemapWidth = 66 , MaxSurfaceCubemapLayeredWidth = 67 , MaxSurfaceCubemapLayeredLayers = 68 , MaxTexture1DLinearWidth = 69 ,
  MaxTexture2DLinearWidth = 70 , MaxTexture2DLinearHeight = 71 , MaxTexture2DLinearPitch = 72 , MaxTexture2DMipmappedWidth = 73 ,
  MaxTexture2DMipmappedHeight = 74 , ComputeCapabilityMajor = 75 , ComputeCapabilityMinor = 76 , MaxTexture1DMipmappedWidth = 77 ,
  StreamPrioritiesSupported = 78 , GlobalL1CacheSupported = 79 , LocalL1CacheSupported = 80 , MaxSharedMemoryPerMultiprocessor = 81 ,
  MaxRegistersPerMultiprocessor = 82 , ManagedMemory = 83 , IsMultiGpuBoard = 84 , MultiGpuBoardGroupID = 85 ,
  HostNativeAtomicSupported = 86 , SingleToDoublePrecisionPerfRatio = 87 , PageableMemoryAccess = 88 , ConcurrentManagedAccess = 89 ,
  ComputePreemptionSupported = 90 , CanUseHostPointerForRegisteredMem = 91 , Reserved92 = 92 , Reserved93 = 93 ,
  Reserved94 = 94 , CooperativeLaunch = 95 , CooperativeMultiDeviceLaunch = 96 , MaxSharedMemoryPerBlockOptin = 97 ,
  CanFlushRemoteWrites = 98 , HostRegisterSupported = 99 , PageableMemoryAccessUsesHostPageTables = 100 , DirectManagedMemAccessFromHost = 101 ,
  MaxBlocksPerMultiprocessor = 106 , MaxPersistingL2CacheSize = 108 , MaxAccessPolicyWindowSize = 109 , ReservedSharedMemoryPerBlock = 111 ,
  SparseCudaArraySupported = 112 , HostRegisterReadOnlySupported = 113 , TimelineSemaphoreInteropSupported = 114 , MaxTimelineSemaphoreInteropSupported = 114 ,
  MemoryPoolsSupported = 115 , GPUDirectRDMASupported = 116 , GPUDirectRDMAFlushWritesOptions = 117 , GPUDirectRDMAWritesOrdering = 118 ,
  MemoryPoolSupportedHandleTypes = 119 , Max , TotalGlobalMem = 1024
}
 This enum does the same job as cudaDeviceAttr, but adds an enumerator for every fixed field of cudaDeviceProp that has no counterpart in cudaDeviceAttr. More...
 
enum class  HostMemoryType { HostMemoryType::NORMAL , HostMemoryType::STAGING , HostMemoryType::PINNED , HostMemoryType::MAPPED }
 Type of host memory allocation. More...
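
The DeviceAttr listing above places its extra enumerator (TotalGlobalMem = 1024) after the Max sentinel. The sketch below copies a handful of those enumerators to show how the two ranges could be told apart. The helper `mirrorsCudaDeviceAttr` is hypothetical, and treating Max as the boundary is an assumption drawn from the numbering, not something the reference states.

```cpp
#include <cassert>

namespace sketch {

// A few enumerators copied from the listing above; the rest are omitted.
enum DeviceAttr {
    MaxThreadsPerBlock = 1,
    WarpSize = 10,
    MemoryPoolSupportedHandleTypes = 119,
    Max,                    // implicitly 120: one past the last mirrored value
    TotalGlobalMem = 1024   // extension value placed well above the sentinel
};

// Hypothetical helper: true when the value lies in the range that mirrors
// cudaDeviceAttr, false for the sentinel and for the extension values.
constexpr bool mirrorsCudaDeviceAttr(DeviceAttr attr) {
    return attr >= MaxThreadsPerBlock && attr < Max;
}

static_assert(Max == 120, "Max follows 119 with no explicit value");

}  // namespace sketch
```

Under this reading, a query for a mirrored value could be forwarded to an attribute lookup, while an extension value would have to be served from the device properties structure instead.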
 

Functions

__device__ int atomicAdd (__device int *addr, int val)
 Atomically add val to the value stored at memory location addr, returning the original value. More...
 
__device__ unsigned int atomicAdd (__device unsigned int *addr, unsigned int val)
 Atomically add val to the value stored at memory location addr, returning the original value. More...
 
__device__ unsigned long long atomicAdd (__device unsigned long long *addr, unsigned long long val)
 Atomically add val to the value stored at memory location addr, returning the original value. More...
 
__device__ float atomicAdd (__device float *addr, float val)
 Atomically add val to the value stored at memory location addr, returning the original value. More...
 
__device__ double atomicAdd (__device double *addr, double val)
 Atomically add val to the value stored at memory location addr, returning the original value. More...
 
__device__ int atomicSub (__device int *addr, int val)
 Atomically subtract val from the value stored at memory location addr, returning the original value. More...
 
__device__ unsigned int atomicSub (__device unsigned int *addr, unsigned int val)
 Atomically subtract val from the value stored at memory location addr, returning the original value. More...
 
__device__ unsigned long long atomicSub (__device unsigned long long *addr, unsigned long long val)
 Atomically subtract val from the value stored at memory location addr, returning the original value. More...
 
__device__ float atomicSub (__device float *addr, float val)
 Atomically subtract val from the value stored at memory location addr, returning the original value. More...
 
__device__ double atomicSub (__device double *addr, double val)
 Atomically subtract val from the value stored at memory location addr, returning the original value. More...
 
__device__ int atomicExch (__device int *addr, int val)
 Atomically write val to addr and return the value that was stored there before calling this function. More...
 
__device__ unsigned int atomicExch (__device unsigned int *addr, unsigned int val)
 Atomically write val to addr and return the value that was stored there before calling this function. More...
 
__device__ unsigned long long atomicExch (__device unsigned long long *addr, unsigned long long val)
 Atomically write val to addr and return the value that was stored there before calling this function. More...
 
__device__ float atomicExch (__device float *addr, float val)
 Atomically write val to addr and return the value that was stored there before calling this function. More...
 
__device__ int atomicMin (__device int *addr, int val)
 Atomically write the min of *addr and val to addr, returning the original value of *addr. More...
 
__device__ int atomicMax (__device int *addr, int val)
 Atomically write the max of *addr and val to addr, returning the original value of *addr. More...
 
__device__ unsigned int atomicMin (__device unsigned int *addr, unsigned int val)
 Atomically write the min of *addr and val to addr, returning the original value of *addr. More...
 
__device__ unsigned int atomicMax (__device unsigned int *addr, unsigned int val)
 Atomically write the max of *addr and val to addr, returning the original value of *addr. More...
 
__device__ unsigned long long atomicMin (__device unsigned long long *addr, unsigned long long val)
 Atomically write the min of *addr and val to addr, returning the original value of *addr. More...
 
__device__ unsigned long long atomicMax (__device unsigned long long *addr, unsigned long long val)
 Atomically write the max of *addr and val to addr, returning the original value of *addr. More...
 
__device__ int atomicCAS (__device int *addr, int cmp, int val)
 
__device__ unsigned int atomicCAS (__device unsigned int *addr, unsigned int cmp, unsigned int val)
 
__device__ unsigned long long atomicCAS (__device unsigned long long *addr, unsigned long long cmp, unsigned long long val)
 
__device__ int atomicAnd (__device int *addr, int val)
 Atomically compute *addr = *addr & val and return the original value of *addr. More...
 
__device__ int atomicOr (__device int *addr, int val)
 Atomically compute *addr = *addr | val and return the original value of *addr. More...
 
__device__ int atomicXor (__device int *addr, int val)
 Atomically compute *addr = *addr ^ val and return the original value of *addr. More...
 
__device__ unsigned int atomicAnd (__device unsigned int *addr, unsigned int val)
 Atomically compute *addr = *addr & val and return the original value of *addr. More...
 
__device__ unsigned int atomicOr (__device unsigned int *addr, unsigned int val)
 Atomically compute *addr = *addr | val and return the original value of *addr. More...
 
__device__ unsigned int atomicXor (__device unsigned int *addr, unsigned int val)
 Atomically compute *addr = *addr ^ val and return the original value of *addr. More...
 
__device__ unsigned long long atomicAnd (__device unsigned long long *addr, unsigned long long val)
 Atomically compute *addr = *addr & val and return the original value of *addr. More...
 
__device__ unsigned long long atomicOr (__device unsigned long long *addr, unsigned long long val)
 Atomically compute *addr = *addr | val and return the original value of *addr. More...
 
__device__ unsigned long long atomicXor (__device unsigned long long *addr, unsigned long long val)
 Atomically compute *addr = *addr ^ val and return the original value of *addr. More...
 
__device__ unsigned int atomicInc (__device unsigned int *addr, unsigned int val)
 
__device__ unsigned int atomicDec (__device unsigned int *addr, unsigned int val)
 
template<typename T >
__device__ T tex1Dfetch (cudaTextureObject_t tex, int x)
 
template<typename T >
__device__ T tex2Dfetch (cudaTextureObject_t tex, int x, int y)
 Read from a 2D texture using integer coordinates. (API extension) More...
 
template<typename T >
__device__ T tex3Dfetch (cudaTextureObject_t tex, int x, int y, int z)
 Read from a 3D texture using integer coordinates. (API extension) More...
 
template<typename T >
__device__ T tex1DOffsetfetch (cudaTextureObject_t tex, int x, int xO)
 Read from a 1D texture at an integer offset with offset addressing. (API extension) More...
 
template<typename T >
__device__ T tex2DOffsetfetch (cudaTextureObject_t tex, int x, int y, int xO, int yO)
 Read from a 2D texture using integer coordinates with offset addressing. (API extension) More...
 
template<typename T >
__device__ T tex3DOffsetfetch (cudaTextureObject_t tex, int x, int y, int z, int xO, int yO, int zO)
 Read from a 3D texture using integer coordinates with offset addressing. (API extension) More...
 
template<typename T >
__device__ T tex1D (cudaTextureObject_t tex, float x)
 Read from a 1D texture at a floating-point offset. More...
 
template<typename T >
__device__ T tex2D (cudaTextureObject_t tex, float x, float y)
 Read from a 2D texture using floating-point coordinates. More...
 
template<typename T >
__device__ T tex3D (cudaTextureObject_t tex, float x, float y, float z)
 Read from a 3D texture using floating-point coordinates. More...
 
__device__ int getTexWidth (cudaTextureObject_t tex)
 Query the width of a texture object. More...
 
__device__ int getTexHeight (cudaTextureObject_t tex)
 Query the height of a texture object. More...
 
__device__ int getTexDepth (cudaTextureObject_t tex)
 Query the depth of a texture object. More...
 
template<>
__device__ float4 tex1Dfetch (cudaTextureObject_t tex, int x)
 
template<>
__device__ float4 tex1DOffsetfetch (cudaTextureObject_t tex, int x, int xO)
 Read from a 1D texture at an integer offset with offset addressing. (API extension) More...
 
template<>
__device__ float4 tex2Dfetch (cudaTextureObject_t tex, int x, int y)
 Read from a 2D texture using integer coordinates. (API extension) More...
 
template<>
__device__ float4 tex2DOffsetfetch (cudaTextureObject_t tex, int x, int y, int xO, int yO)
 Read from a 2D texture using integer coordinates with offset addressing. (API extension) More...
 
template<>
__device__ float4 tex3Dfetch (cudaTextureObject_t tex, int x, int y, int z)
 Read from a 3D texture using integer coordinates. (API extension) More...
 
template<>
__device__ float4 tex3DOffsetfetch (cudaTextureObject_t tex, int x, int y, int z, int xO, int yO, int zO)
 Read from a 3D texture using integer coordinates with offset addressing. (API extension) More...
 
template<>
__device__ float4 tex1D (cudaTextureObject_t tex, float x)
 Read from a 1D texture at a floating-point offset. More...
 
template<>
__device__ float4 tex2D (cudaTextureObject_t tex, float x, float y)
 Read from a 2D texture using floating-point coordinates. More...
 
template<>
__device__ float4 tex3D (cudaTextureObject_t tex, float x, float y, float z)
 Read from a 3D texture using floating-point coordinates. More...
 
__device__ lanemask_t lanemaskLt (int laneID)
 Get a bitmask with a 1 in every position lower than this thread's lane ID. More...
 
__device__ lanemask_t lanemaskLe (int laneID)
 Get a bitmask with a 1 in every position lower than or equal to this thread's lane ID. More...
 
__device__ lanemask_t lanemaskEq (int laneID)
 Get a bitmask with a 1 only in the position equal to this thread's lane ID. More...
 
__device__ lanemask_t lanemaskGe (int laneID)
 Get a bitmask with a 1 in every position greater than or equal to this thread's lane ID. More...
 
__device__ lanemask_t lanemaskGt (int laneID)
 Get a bitmask with a 1 in every position greater than this thread's lane ID. More...
 
template<typename Shuffler = Shfl, typename T >
__device__ auto shuffle (T value, int offset, int logicalWarpSize=WARP_SIZE)
 Generic shuffle. More...
 
template<typename Shuffler = Shfl, typename T >
__device__ std::pair< bool, T > shufflePredicated (T value, int offset, int logicalWarpSize=WARP_SIZE, int laneID=0)
 Like shuffle(), but also yields a boolean indicating if the value that was read is valid. More...
 
__device__ void syncthreads (int barrierID, int numWarps)
 A more powerful __syncthreads(). More...
 
__device__ void syncthreads (int barrierID)
 Like the other sp::syncthreads(), but implicitly synchronises all non-exited warps in the block. More...
 
__device__ void syncthreads_arrive (int barrierID, int numWarps)
 Functions exactly like sp::syncthreads(), but this warp does not block. More...
 
__device__ int syncthreads_count (int barrierID, int numWarps, bool predicate)
 Like sp::syncthreads(), but also returns a count of how many threads passed true for predicate. More...
 
__device__ int syncthreads_count (int barrierID, bool predicate)
 sp::syncthreads_count(), implicitly applied to all non-exited warps. More...
 
__device__ bool syncthreads_and (int barrierID, int numWarps, bool predicate)
 Like sp::syncthreads(), but also returns true iff all participating threads passed true for predicate. More...
 
__device__ bool syncthreads_and (int barrierID, bool predicate)
 sp::syncthreads_and(), implicitly applied to all non-exited warps. More...
 
__device__ bool syncthreads_or (int barrierID, int numWarps, bool predicate)
 Like sp::syncthreads(), but also returns true iff any participating threads passed true for predicate. More...
 
__device__ bool syncthreads_or (int barrierID, bool predicate)
 sp::syncthreads_or(), implicitly applied to all non-exited warps. More...
 
void throwCudaException (cudaError_t c, std::string desc="")
 Throw a cudaError_t as an exception, unless it's cudaSuccess. More...
 
template<typename ExceptionType , typename T >
void throwIfNull (T *ptr, std::string message)
 Nifty utility function for throwing exceptions based on null pointer checks. More...
 
template<typename... Args>
constexpr void reallyDoAssert (const char *fileName, int line, const char *functionName, bool passed, const char *checkString, const char *message, Args &&... args)
 
template<bool Enabled, typename... Args>
constexpr void doAssert (Args &&... args)
 
template<typename T >
void nopDeleter (T *)
 A deleter that doesn't actually delete anything. More...
 
template<typename T >
void pinnedMemoryDeleter (T *p)
 Deleter for pinned host memory. More...
 
template<typename T >
void deviceMemoryDeleter (__device T *p)
 Deleter for device memory. More...
 
bool isDevicePointer (const void *ptr)
 Runtime check if a flat-address-space pointer is a pointer to any GPU memory. More...
 
void throwIfNotDevicePointer (const void *ptr, std::string name)
 
void throwIfDevicePointer (const void *ptr, std::string name)
 
template<typename T , typename Q >
bool operator== (const Managed_Allocator< T > &, const Managed_Allocator< Q > &)
 
template<typename T , typename Q >
bool operator!= (const Managed_Allocator< T > &lhs, const Managed_Allocator< Q > &rhs)
 
template<typename T , typename U , typename Deleter >
std::unique_ptr< T, Deleter > static_pointer_cast (std::unique_ptr< U, Deleter > &r) noexcept
 Perform a static_cast operation on an std::unique_ptr. More...
 
template<typename T , typename U , typename Deleter >
std::unique_ptr< T, Deleter > dynamic_pointer_cast (std::unique_ptr< U, Deleter > &r) noexcept
 Like static_pointer_cast, but performs a dynamic_cast() More...
 
template<typename T , typename U , typename Deleter >
std::unique_ptr< T, Deleter > const_pointer_cast (std::unique_ptr< U, Deleter > &r) noexcept
 Like static_pointer_cast, but performs a const_cast() More...
 
template<typename T , typename U , typename Deleter >
std::unique_ptr< T, Deleter > reinterpret_pointer_cast (std::unique_ptr< U, Deleter > &r) noexcept
 Like static_pointer_cast, but performs a reinterpret_cast() More...
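
The unique_ptr cast helpers listed above take the source pointer by lvalue reference and return a new owning pointer. The following is a minimal sketch of how such a helper could behave, assuming that ownership moves out of `r` and that the deleter is copied across unchanged; the source does not confirm either point. The `sketch` namespace, `AnyDeleter`, `Base`, `Derived` and `demo` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <memory>
#include <utility>

namespace sketch {  // hypothetical namespace, not part of libSCALE

// Assumption: ownership moves from `r` to the returned pointer, and the
// deleter is copied across unchanged.
template <typename T, typename U, typename Deleter>
std::unique_ptr<T, Deleter> static_pointer_cast(std::unique_ptr<U, Deleter>& r) noexcept {
    Deleter deleter = r.get_deleter();      // copy the deleter before releasing
    T* raw = static_cast<T*>(r.release());  // `r` is empty from here on
    return std::unique_ptr<T, Deleter>(raw, std::move(deleter));
}

struct Base {
    virtual ~Base() = default;
    int id = 1;
};

struct Derived : Base {
    int extra = 2;
};

// Hypothetical deleter that can destroy any pointee type.
struct AnyDeleter {
    template <typename X>
    void operator()(X* p) const { delete p; }
};

// Usage: downcast an owning Base pointer that is known to hold a Derived.
inline int demo() {
    std::unique_ptr<Base, AnyDeleter> base(new Derived());
    auto derived = sketch::static_pointer_cast<Derived>(base);
    assert(base == nullptr);  // ownership has moved to `derived`
    return derived->extra;
}

}  // namespace sketch
```

Under these assumptions the other three casts would differ only in the cast expression used on the released raw pointer.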
 

Variables

template<typename T >
constexpr sp::AddressSpace get_addrspace_v = get_addrspace<T>::value
 Get the address space of a type. More...
 
constexpr bool OnDevice = false
 
constexpr int CudaVersion = 99999
 
constexpr bool GPUIntegrated = false
 
constexpr bool EmulateLanemaskIntrinsics = IsAmd
 Feature selectors. More...
 
constexpr bool HasExtendedSyncthreads = !IsAmd
 If true, sub-block thread synchronisation is possible using sp::syncthreads(). More...
 
constexpr bool IsWindows = false
 
constexpr bool AssertionsEnabled = false
 
constexpr const char * UNKNOWN_FN = "(UNKOWN FUNCTION)"
 
constexpr const char * UNKNOWN_FILE = "(UNKOWN FILE)"
 

Detailed Description

Namespace with Spectral Compute Ltd things.

Enumeration Type Documentation

◆ DeviceAttr

This enum does the same job as cudaDeviceAttr, but adds fields that correspond to all fixed fields in cudaDeviceProp which do not have a field in cudaDeviceAttr.

This way, the enum represents a complete list of properties that are compile-time constants. Notably, totalGlobalMem is missing from NVIDIA's enum.

Enumerator
MaxThreadsPerBlock 

Maximum number of threads per block.

MaxBlockDimX 

Maximum block dimension X.

MaxBlockDimY 

Maximum block dimension Y.

MaxBlockDimZ 

Maximum block dimension Z.

MaxGridDimX 

Maximum grid dimension X.

MaxGridDimY 

Maximum grid dimension Y.

MaxGridDimZ 

Maximum grid dimension Z.

MaxSharedMemoryPerBlock 

Maximum shared memory available per block in bytes.

TotalConstantMemory 

Memory available on device for constant variables in a CUDA C kernel in bytes.

WarpSize 

Warp size in threads.

MaxPitch 

Maximum pitch in bytes allowed by memory copies.

MaxRegistersPerBlock 

Maximum number of 32-bit registers available per block.

ClockRate 

Peak clock frequency in kilohertz.

TextureAlignment 

Alignment requirement for textures.

GpuOverlap 

Device can possibly copy memory and execute a kernel concurrently.

MultiProcessorCount 

Number of multiprocessors on device.

KernelExecTimeout 

Specifies whether there is a run time limit on kernels.

Integrated 

Device is integrated with host memory.

CanMapHostMemory 

Device can map host memory into CUDA address space.

ComputeMode 

Compute mode (see cudaComputeMode for details).

MaxTexture1DWidth 

Maximum 1D texture width.

MaxTexture2DWidth 

Maximum 2D texture width.

MaxTexture2DHeight 

Maximum 2D texture height.

MaxTexture3DWidth 

Maximum 3D texture width.

MaxTexture3DHeight 

Maximum 3D texture height.

MaxTexture3DDepth 

Maximum 3D texture depth.

MaxTexture2DLayeredWidth 

Maximum 2D layered texture width.

MaxTexture2DLayeredHeight 

Maximum 2D layered texture height.

MaxTexture2DLayeredLayers 

Maximum layers in a 2D layered texture.

SurfaceAlignment 

Alignment requirement for surfaces.

ConcurrentKernels 

Device can possibly execute multiple kernels concurrently.

EccEnabled 

Device has ECC support enabled.

PciBusId 

PCI bus ID of the device.

PciDeviceId 

PCI device ID of the device.

TccDriver 

Device is using TCC driver model.

MemoryClockRate 

Peak memory clock frequency in kilohertz.

GlobalMemoryBusWidth 

Global memory bus width in bits.

L2CacheSize 

Size of L2 cache in bytes.

MaxThreadsPerMultiProcessor 

Maximum resident threads per multiprocessor.

AsyncEngineCount 

Number of asynchronous engines.

UnifiedAddressing 

Device shares a unified address space with the host.

MaxTexture1DLayeredWidth 

Maximum 1D layered texture width.

MaxTexture1DLayeredLayers 

Maximum layers in a 1D layered texture.

MaxTexture2DGatherWidth 

Maximum 2D texture width if cudaArrayTextureGather is set.

MaxTexture2DGatherHeight 

Maximum 2D texture height if cudaArrayTextureGather is set.

MaxTexture3DWidthAlt 

Alternate maximum 3D texture width.

MaxTexture3DHeightAlt 

Alternate maximum 3D texture height.

MaxTexture3DDepthAlt 

Alternate maximum 3D texture depth.

PciDomainId 

PCI domain ID of the device.

TexturePitchAlignment 

Pitch alignment requirement for textures.

MaxTextureCubemapWidth 

Maximum cubemap texture width/height.

MaxTextureCubemapLayeredWidth 

Maximum cubemap layered texture width/height.

MaxTextureCubemapLayeredLayers 

Maximum layers in a cubemap layered texture.

MaxSurface1DWidth 

Maximum 1D surface width.

MaxSurface2DWidth 

Maximum 2D surface width.

MaxSurface2DHeight 

Maximum 2D surface height.

MaxSurface3DWidth 

Maximum 3D surface width.

MaxSurface3DHeight 

Maximum 3D surface height.

MaxSurface3DDepth 

Maximum 3D surface depth.

MaxSurface1DLayeredWidth 

Maximum 1D layered surface width.

MaxSurface1DLayeredLayers 

Maximum layers in a 1D layered surface.

MaxSurface2DLayeredWidth 

Maximum 2D layered surface width.

MaxSurface2DLayeredHeight 

Maximum 2D layered surface height.

MaxSurface2DLayeredLayers 

Maximum layers in a 2D layered surface.

MaxSurfaceCubemapWidth 

Maximum cubemap surface width.

MaxSurfaceCubemapLayeredWidth 

Maximum cubemap layered surface width.

MaxSurfaceCubemapLayeredLayers 

Maximum layers in a cubemap layered surface.

MaxTexture1DLinearWidth 

Maximum 1D linear texture width.

MaxTexture2DLinearWidth 

Maximum 2D linear texture width.

MaxTexture2DLinearHeight 

Maximum 2D linear texture height.

MaxTexture2DLinearPitch 

Maximum 2D linear texture pitch in bytes.

MaxTexture2DMipmappedWidth 

Maximum mipmapped 2D texture width.

MaxTexture2DMipmappedHeight 

Maximum mipmapped 2D texture height.

ComputeCapabilityMajor 

Major compute capability version number.

ComputeCapabilityMinor 

Minor compute capability version number.

MaxTexture1DMipmappedWidth 

Maximum mipmapped 1D texture width.

StreamPrioritiesSupported 

Device supports stream priorities.

GlobalL1CacheSupported 

Device supports caching globals in L1.

LocalL1CacheSupported 

Device supports caching locals in L1.

MaxSharedMemoryPerMultiprocessor 

Maximum shared memory available per multiprocessor in bytes.

MaxRegistersPerMultiprocessor 

Maximum number of 32-bit registers available per multiprocessor.

ManagedMemory 

Device can allocate managed memory on this system.

IsMultiGpuBoard 

Device is on a multi-GPU board.

MultiGpuBoardGroupID 

Unique identifier for a group of devices on the same multi-GPU board.

HostNativeAtomicSupported 

Link between the device and the host supports native atomic operations.

SingleToDoublePrecisionPerfRatio 

Ratio of single precision performance (in floating-point operations per second) to double precision performance.

PageableMemoryAccess 

Device supports coherently accessing pageable memory without calling cudaHostRegister on it.

ConcurrentManagedAccess 

Device can coherently access managed memory concurrently with the CPU.

ComputePreemptionSupported 

Device supports Compute Preemption.

CanUseHostPointerForRegisteredMem 

Device can access host registered memory at the same virtual address as the CPU.

CooperativeLaunch 

Device supports launching cooperative kernels via cudaLaunchCooperativeKernel.

CooperativeMultiDeviceLaunch 

Deprecated, because cudaLaunchCooperativeKernelMultiDevice is itself deprecated.

MaxSharedMemoryPerBlockOptin 

The maximum opt-in shared memory per block.

This value may vary by chip. See cudaFuncSetAttribute.

CanFlushRemoteWrites 

Device supports flushing of outstanding remote writes.

HostRegisterSupported 

Device supports host memory registration via cudaHostRegister.

PageableMemoryAccessUsesHostPageTables 

Device accesses pageable memory via the host's page tables.

DirectManagedMemAccessFromHost 

Host can directly access managed memory on the device without migration.

MaxBlocksPerMultiprocessor 

Maximum number of blocks per multiprocessor.

MaxPersistingL2CacheSize 

Maximum L2 persisting lines capacity setting in bytes.

MaxAccessPolicyWindowSize 

Maximum value of cudaAccessPolicyWindow::num_bytes.

ReservedSharedMemoryPerBlock 

Shared memory reserved by CUDA driver per block in bytes.

SparseCudaArraySupported 

Device supports sparse CUDA arrays and sparse CUDA mipmapped arrays.

HostRegisterReadOnlySupported 

Device supports using the cudaHostRegister flag cudaHostRegisterReadOnly to register memory that must be mapped as read-only to the GPU.

TimelineSemaphoreInteropSupported 

External timeline semaphore interop is supported on the device.

MaxTimelineSemaphoreInteropSupported 

Deprecated. External timeline semaphore interop is supported on the device.

MemoryPoolsSupported 

Device supports using the cudaMallocAsync and cudaMemPool_t family of APIs.

GPUDirectRDMASupported 

Device supports GPUDirect RDMA APIs, like nvidia_p2p_get_pages (see https://docs.nvidia.com/cuda/gpudirect-rdma for more information)

GPUDirectRDMAFlushWritesOptions 

The returned attribute shall be interpreted as a bitmask, where the individual bits are listed in the cudaFlushGPUDirectRDMAWritesOptions enum.

GPUDirectRDMAWritesOrdering 

GPUDirect RDMA writes to the device do not need to be flushed for consumers within the scope indicated by the returned attribute.

See cudaGPUDirectRDMAWritesOrdering for the numerical values returned here.

MemoryPoolSupportedHandleTypes 

Handle types supported with mempool based IPC.

TotalGlobalMem 

Global memory size in bytes.

Function Documentation

◆ isDevicePointer()

bool sp::isDevicePointer ( const void *  ptr)

Runtime check if a flat-address-space pointer is a pointer to any GPU memory.

This function is quite slow, and you should be able to avoid using it by making use of address space annotation on pointer types. It is provided mostly for ease of migration to statically-typed address spacing, and for interoperability with other CUDA libraries (which may use generic pointers).

◆ throwIfNull()

template<typename ExceptionType , typename T >
void sp::throwIfNull ( T *  ptr,
std::string  message 
)

Nifty utility function for throwing exceptions based on null pointer checks.
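
A minimal sketch of what a helper with this signature could look like, assuming that `ExceptionType` is constructible from a `std::string`; the source gives only the signature, so the body below is an illustration rather than the library's implementation. The `sketch` namespace and `demo` are hypothetical.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

namespace sketch {  // hypothetical namespace, not part of libSCALE

// Assumption: ExceptionType can be constructed from a std::string message.
template <typename ExceptionType, typename T>
void throwIfNull(T* ptr, std::string message) {
    if (ptr == nullptr) {
        throw ExceptionType(std::move(message));
    }
}

// Usage: the exception type is chosen explicitly, T is deduced from the pointer.
inline bool demo() {
    int value = 0;
    sketch::throwIfNull<std::runtime_error>(&value, "value is null");  // does not throw

    bool caught = false;
    try {
        int* missing = nullptr;
        sketch::throwIfNull<std::invalid_argument>(missing, "missing is null");
    } catch (const std::invalid_argument& e) {
        caught = (std::string(e.what()) == "missing is null");
    }
    assert(caught);  // the null pointer triggered the chosen exception type
    return caught;
}

}  // namespace sketch
```

Putting `ExceptionType` first in the template parameter list is what lets the caller name only the exception and leave `T` to deduction.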

Variable Documentation

◆ EmulateLanemaskIntrinsics

constexpr bool sp::EmulateLanemaskIntrinsics = IsAmd

Feature selectors.
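
The source does not show how the lanemask functions are emulated when this flag is set. As an illustration of the values they are documented to produce, here is a host-side sketch that assumes a 32-lane warp and a 32-bit mask type; the `sketch` namespace and the `lanemask_t` alias below are stand-ins, not the library's definitions.

```cpp
#include <cstdint>

namespace sketch {  // hypothetical namespace, not part of libSCALE

// Assumption: one bit per lane, 32 lanes per warp, laneID in [0, 31].
using lanemask_t = std::uint32_t;

// Only the bit at position laneID.
constexpr lanemask_t lanemaskEq(int laneID) { return lanemask_t{1} << laneID; }

// Every bit strictly below laneID.
constexpr lanemask_t lanemaskLt(int laneID) { return lanemaskEq(laneID) - 1u; }

// Every bit at or below laneID.
constexpr lanemask_t lanemaskLe(int laneID) { return lanemaskLt(laneID) | lanemaskEq(laneID); }

// Every bit strictly above laneID.
constexpr lanemask_t lanemaskGt(int laneID) { return ~lanemaskLe(laneID); }

// Every bit at or above laneID.
constexpr lanemask_t lanemaskGe(int laneID) { return ~lanemaskLt(laneID); }

// Worked values for lane 3.
static_assert(lanemaskEq(3) == 0x00000008u, "only bit 3");
static_assert(lanemaskLt(3) == 0x00000007u, "bits 0..2");
static_assert(lanemaskLe(3) == 0x0000000Fu, "bits 0..3");
static_assert(lanemaskGt(3) == 0xFFFFFFF0u, "bits 4..31");
static_assert(lanemaskGe(3) == 0xFFFFFFF8u, "bits 3..31");

}  // namespace sketch
```

A target with a different warp width would need a wider mask type and different constants.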

◆ HasExtendedSyncthreads

constexpr bool sp::HasExtendedSyncthreads = !IsAmd

If true, sub-block thread synchronisation is possible using sp::syncthreads().

If false, the first argument of that function is ignored and always treated as if it were zero.
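
The fallback described above can be modelled on the host as follows. This is a sketch of the documented behaviour only: `effectiveBarrierID` is a hypothetical helper, and the flag is passed as a template parameter so that both cases can be shown side by side.

```cpp
namespace sketch {  // hypothetical namespace, not part of libSCALE

// Models which barrier a call would use: the requested ID when sub-block
// synchronisation is available, otherwise always barrier zero.
template <bool HasExtendedSyncthreads>
constexpr int effectiveBarrierID(int barrierID) {
    return HasExtendedSyncthreads ? barrierID : 0;
}

static_assert(effectiveBarrierID<true>(3) == 3, "requested barrier is honoured");
static_assert(effectiveBarrierID<false>(3) == 0, "requested barrier collapses to zero");

}  // namespace sketch
```

Code that relies on several distinct barrier IDs being independent may therefore want to check `sp::HasExtendedSyncthreads` at compile time and choose a different synchronisation scheme when it is false.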