libSCALE  0.2.0
A modern C++ CUDA API
Host API

Host-side CUDA API (use instead of libcuda). More...

Modules

 Exceptions
 Exception objects for errors from the GPU runtime API.
 

Classes

class  sp::CudaKernel
 Object representing a kernel. More...
 
class  sp::Device
 Represents a GPU. More...
 
class  sp::Event
 Represents an event in a compute stream. More...
 
class  sp::BlockingEvent
 An event that the host can synchronise with. More...
 
class  sp::Host
 Represents the host. More...
 
class  sp::Stream
 Represents a CUDA stream. More...
 

Enumerations

enum class  sp::DeviceMemoryType { sp::DeviceMemoryType::NORMAL , sp::DeviceMemoryType::MANAGED }
 Type of GPU memory allocation. More...
 
enum class  sp::HostMemoryType { sp::HostMemoryType::NORMAL , sp::HostMemoryType::STAGING , sp::HostMemoryType::PINNED , sp::HostMemoryType::MAPPED }
 Type of host memory allocation. More...
 

Detailed Description

Host-side CUDA API (use instead of libcuda).

Queue a kernel launch on this Stream

Effectively wraps cudaLaunchKernel, providing both a more convenient API and full exception handling. No more "Unspecified Launch Failure". This results in a number of ways to run a kernel: the legacy style and the Spectral style.

There are two ways to pass the arguments for the kernel itself along to this wrapper. We recommend the first option, which takes a simple parameter pack, as seen here:

Example

sp::Stream stream = gpu.createStream();
// Use a string as a host-side buffer/destination
std::string buffer = "aaaaaaaaaa";
// Allocate a device-side buffer for the kernel to fill
auto devicePointer = gpu.allocateMemory<char>(buffer.size());
// Queue our stream operations
stream.launchKernel(fill, (dim3)1, (dim3)256, 0, devicePointer.get(), 'b', buffer.size()); // "bbbbbbbbbb"
// Queue a copy from the GPU to our host buffer
stream.copyMemory(buffer.data(), devicePointer.get(), buffer.size());
stream.synchronize();
verify(buffer, "bbbbbbbbbb", 10);

However, you may also pass a (void**) argument array. Usually you will construct it as seen here:

Example

// Create a pack to use the (void**) overload
auto ptr = devicePointer.get();
char toWrite = 'c';
char toExpect = 'b';
uint64_t count = buffer.size();
void* args[4] = {&ptr, &toWrite, &toExpect, &count};
stream.launchKernel(conditionalFill, (dim3)1, (dim3)256, 0, (void**)args); // "cccccccccc"
stream.copyMemory(buffer.data(), devicePointer.get(), buffer.size());
stream.synchronize();
verify(buffer, "cccccccccc", 10);
Note
A void* pointer to the kernel function will implicitly convert to an sp::CudaKernel and be accepted here, for compatibility with NVIDIA® APIs.
sp::Vec<int, X> for X in 1-3 will implicitly convert to dim3 and be accepted by all methods. This is handy when you're using sp::Vec to compute sizes.
Parameters
kernelFunction   Pointer to any function which returns void. This should cover all kernel functions.
gridDim          Number of blocks
blockDim         Number of threads in each block
args             Pointer to a std::array<void*> containing references to the arguments needed by the kernel
dynamicSMem      Requested amount of dynamic shared memory per block, in bytes

TODO: Use C++ non-type template params to shift the block/thread/smem into the template, mirroring the <<<>>> syntax in a way

Enumeration Type Documentation

◆ DeviceMemoryType

enum class sp::DeviceMemoryType

Type of GPU memory allocation.

Enumerator
NORMAL 

Standard allocation.

Only accessible on the device, and via memory copy operations.

MANAGED 

Managed memory.

Accessible on both the host and device.

This may seem convenient, but there are serious performance implications to consider because memory accesses can require PCIe transactions - potentially many of them.

◆ HostMemoryType

enum class sp::HostMemoryType

Type of host memory allocation.

Enumerator
NORMAL 

Ordinary host memory.

You could just use new instead, but this can be useful if you're metaprogramming and want to select a memory type based on some constexpr function.

STAGING 

Write-combining page-locked host memory.

Such memory is optimised for use as a staging area for copies to GPU.

This memory can be copied to the GPU more quickly than any other type of memory, but it should be considered write-only from the host. Host reads of this memory will be extremely slow.

This is a good choice if you want a buffer that is only written to by the host and then sent to the GPU. If you want memory that is optimised for copies to GPU and may also be read by the host, use PINNED instead.

A common configuration is to use STAGING memory for input to the GPU and PINNED memory for receiving output.

PINNED 

Page-locked host memory.

This memory can be copied to/from the GPU more efficiently than memory allocated with the usual system allocation functions.

Allocating a very large amount of page-locked memory can cause OS performance issues.

MAPPED 

Page-locked and GPU-mapped host memory.

This sort of memory can be accessed from the GPU without ever copying it there. Each access generates its own PCIe transaction to do so. This is very slow, but it can occasionally be useful if you have a huge, rarely accessed buffer.

Note that there is an overhead associated with using this sort of memory. If you aren't using the mappedness, use PINNED instead.

Using this kind of allocation changes the behaviour of most APIs that implicitly copy buffers to merely do an address transformation to produce the device-side pointer instead.