libSCALE 0.2.0
A modern C++ CUDA API
sp::Device Class Reference

Represents a GPU. More...

#include <Device.hpp>

Public Member Functions

void ensureActive () const
 Ensure this device is the active one in libcuda. More...
 
 Device (const Device &)=delete
 
Device & operator= (const Device &)=delete
 
int getId () const
 Get the device ID of this device. More...
 
 operator int () const
 Implicitly convert to the device ID. More...
 
void setName (const std::string &name) const
 Assign a name to the device. More...
 
const cudaDeviceProp & getProperties () const
 Get the device properties. More...
 
cudaFuncCache getCacheConfig () const
 Query device cache configuration. More...
 
void setCacheConfig (cudaFuncCache conf)
 Set device cache configuration. More...
 
size_t getLimit (cudaLimit limit) const
 Query device limits. More...
 
void setLimit (cudaLimit limit, size_t value)
 Set device limits. More...
 
cudaSharedMemConfig getSharedMemConfig () const
 Get the shared memory mode for this device. More...
 
void setSharedMemConfig (cudaSharedMemConfig conf)
 Set the shared memory mode for this device. More...
 
unsigned int getFlags () const
 Query the device flags. More...
 
void setFlags (unsigned int flags)
 Set the flags for this device. More...
 
std::string getPCIBusId () const
 Get the PCI bus ID string for this device. More...
 
void reset () const
 Reset this device. More...
 
void synchronize () const
 Wait for all work on this device to finish. More...
 
Stream createStreamWithPriority (int priority, const std::string &name) const
 Make a prioritised stream on this device. More...
 
Stream createStreamWithPriority (int priority) const
 Make a prioritised stream on this device. More...
 
Stream createStream (const std::string &name) const
 Make a stream on this device. More...
 
Stream createStream () const
 Make a stream on this device. More...
 
Stream & getIncomingPrefetchStream ()
 Get the stream to be used for incoming memory prefetch operations on this device. More...
 
std::pair< int, int > getStreamPriorityRange () const
 Get the least and greatest stream priorities supported by the device. More...
 
template<typename T >
sp::UniquePtr< __device T > allocateMemory (size_t n, DeviceMemoryType memType=DeviceMemoryType::NORMAL)
 Allocate device memory. More...
 
template<typename T >
void queueMemoryPrefetch (const T *src, size_t count) const
 Enqueue a unified memory prefetch operation on incomingPrefetchStream. More...
 
template<typename T >
__device T * getMappedDevicePtr (T *ptr) const
 
bool hasAddress (__device const void *ptr) const
 Return true iff the given pointer points to memory on this device. More...
 

Static Public Member Functions

template<DeviceAttr Attr, int Device = 0>
constexpr static auto getAttribute ()
 Query a property of the device. More...
 
template<cudaDeviceAttr Attr, int Device = 0>
constexpr static auto getAttribute ()
 Overload to allow use of cudaDeviceAttr. More...
 
static const Device & getInvalid ()
 Get an invalid device object. More...
 
static Device & get (int i)
 Get the device object representing the i'th GPU, or throw if it doesn't exist. More...
 
static Device & getByPCIBusId (std::string_view pciBusId)
 Get the device at a given PCI bus ID, or throw if it doesn't exist. More...
 
static Device & choose (const cudaDeviceProp &prop)
 Select the device that most closely matches the given properties. More...
 
static Device & getActive ()
 Get the "active" device according to libcuda's global state. More...
 
static int getCount ()
 Return the number of CUDA-capable devices present. More...
 

Friends

class Stream
 

Detailed Description

Represents a GPU.

Logically, a single global object exists to represent each GPU on the system. A reference to the n'th GPU (according to libcuda's numbering scheme) can be obtained via the static getter function sp::Device::get(int).

If you don't really care which GPU you're using, you can just construct a single global instance using sp::Device::getActive() and carry on as normal. If you're working with multiple GPUs, using an object to represent each one can be extremely helpful: it allows a single host thread to coordinate work across many GPUs, avoiding the host-side overhead of spawning and coordinating one host thread per GPU.
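
For example, a single host thread can walk every GPU in the system. A minimal sketch, using only the members documented on this page:

    #include <Device.hpp>

    // Wait for all outstanding work on every GPU, from one host thread.
    void synchronizeAllDevices() {
        for (int i = 0; i < sp::Device::getCount(); i++) {
            sp::Device &dev = sp::Device::get(i); // Throws if device i does not exist.
            dev.synchronize();
        }
    }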

Member Function Documentation

◆ allocateMemory()

template<typename T >
sp::UniquePtr< __device T > sp::Device::allocateMemory ( size_t  n,
DeviceMemoryType  memType = DeviceMemoryType::NORMAL 
)

Allocate device memory.

Allocate memory for n elements of type T. The return value is a smart pointer to memory in the __device address space; the memory is deallocated when the pointer goes out of scope.

As usual for C++ unique pointers, you may want to promote the returned std::unique_ptr to a std::shared_ptr, or std::move it to some final location where it can represent the lifetime of the buffer.
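
A minimal sketch (the element type and count are arbitrary):

    #include <Device.hpp>

    void example() {
        sp::Device &dev = sp::Device::getActive();

        // Allocate 1024 floats on the device. DeviceMemoryType::NORMAL is the default.
        sp::UniquePtr<__device float> buf = dev.allocateMemory<float>(1024);

        // ... enqueue work that uses the buffer ...
    } // The allocation is released here, when buf goes out of scope.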

◆ choose()

static Device & sp::Device::choose ( const cudaDeviceProp &  prop)
static

Select the device that most closely matches the given properties.

See also
cudaChooseDevice()

◆ createStream() [1/2]

Stream sp::Device::createStream ( ) const

Make a stream on this device.

See also
cudaStreamCreateWithPriority

◆ createStream() [2/2]

Stream sp::Device::createStream ( const std::string &  name) const

Make a stream on this device.

Parameters
name: An optional name for the stream. This name may appear in profilers, debuggers, or other tools.
See also
cudaStreamCreateWithPriority

◆ createStreamWithPriority() [1/2]

Stream sp::Device::createStreamWithPriority ( int  priority) const

Make a prioritised stream on this device.

Parameters
priority: The scheduling priority of the stream. Lower numbers represent higher priorities. The GPU will schedule work from higher-priority streams first.
See also
cudaStreamCreateWithPriority

◆ createStreamWithPriority() [2/2]

Stream sp::Device::createStreamWithPriority ( int  priority,
const std::string &  name 
) const

Make a prioritised stream on this device.

Parameters
priority: The scheduling priority of the stream. Lower numbers represent higher priorities. The GPU will schedule work from higher-priority streams first.
name: An optional name for the stream. This name may appear in profilers, debuggers, or other tools.
See also
cudaStreamCreateWithPriority
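
A minimal sketch that combines this with getStreamPriorityRange(); it assumes the returned pair is ordered (least, greatest), matching cudaDeviceGetStreamPriorityRange(), and the stream name is arbitrary:

    #include <Device.hpp>

    sp::Stream makeUrgentStream(sp::Device &dev) {
        // .first is assumed to be the least and .second the greatest supported priority
        // (lower numbers mean higher priority).
        auto range = dev.getStreamPriorityRange();

        // A named stream at the highest priority this device supports.
        return dev.createStreamWithPriority(range.second, "urgent-work");
    }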

◆ ensureActive()

void sp::Device::ensureActive ( ) const

Ensure this device is the active one in libcuda.

Use of this API is not recommended: you're better off using Device/Stream objects directly to do things on a particular device. This function is helpful when dealing with old code that is still using libcuda directly: you can ensure the right device is the "active" one when you call into the evil C code, so it does what you want.

This is mostly useful, therefore, as a migration aid.
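
For example, when wrapping legacy code that still relies on libcuda's global active-device state (legacyCudaRoutine() is a hypothetical stand-in for such code):

    #include <Device.hpp>

    void legacyCudaRoutine(); // Hypothetical: old code that calls libcuda directly.

    void callLegacyCode(sp::Device &dev) {
        // Make dev the "active" device before dropping into the legacy code,
        // so its raw libcuda calls operate on the device we intend.
        dev.ensureActive();
        legacyCudaRoutine();
    }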

◆ get()

static Device & sp::Device::get ( int  i)
static

Get the device object representing the i'th GPU, or throw if it doesn't exist.

◆ getActive()

static Device & sp::Device::getActive ( )
static

Get the "active" device according to libcuda's global state.

Prefer to use Device objects directly instead of relying on CUDA's global state.

See also
cudaGetDevice()

◆ getAttribute() [1/2]

template<DeviceAttr Attr, int Device = 0>
constexpr static auto sp::Device::getAttribute ( )
static constexpr

Query a property of the device.

If the requested property is one that is fixed by the DeviceAssumptionCache, this function is constexpr and may be used in compile-time code.

For other properties (such as cudaDevAttrEccEnabled), this still outperforms cudaDeviceGetAttribute() because the result is cached (and software typically does not cope gracefully with such properties changing during execution).

The first call to a non-assumed property will be quite expensive.

See also
cudaDeviceGetAttribute
cudaGetDeviceProperties
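
A minimal sketch using the cudaDeviceAttr overload below; whether a given attribute is usable in constexpr code depends on your build's DeviceAssumptionCache:

    #include <Device.hpp>
    #include <cstdio>

    void queryAttributes() {
        // Warp size of device 0 (the default Device template argument).
        auto warp = sp::Device::getAttribute<cudaDevAttrWarpSize>();

        // A non-assumed property: the first query is expensive; later calls hit the cache.
        auto ecc = sp::Device::getAttribute<cudaDevAttrEccEnabled>();

        std::printf("warp size %d, ECC %s\n", (int)warp, ecc ? "on" : "off");
    }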

◆ getAttribute() [2/2]

template<cudaDeviceAttr Attr, int Device = 0>
constexpr static auto sp::Device::getAttribute ( )
static constexpr

Overload to allow use of cudaDeviceAttr.

◆ getByPCIBusId()

static Device & sp::Device::getByPCIBusId ( std::string_view  pciBusId)
static

Get the device at a given PCI bus ID, or throw if it doesn't exist.

See also
cudaDeviceGetByPCIBusId()

◆ getCacheConfig()

cudaFuncCache sp::Device::getCacheConfig ( ) const

Query device cache configuration.

See also
cudaDeviceGetCacheConfig

◆ getCount()

static int sp::Device::getCount ( )
static

Return the number of CUDA-capable devices present.

See also
cudaGetDeviceCount()

◆ getFlags()

unsigned int sp::Device::getFlags ( ) const

Query the device flags.

See also
cudaGetDeviceFlags

◆ getId()

int sp::Device::getId ( ) const

Get the device ID of this device.

◆ getIncomingPrefetchStream()

Stream & sp::Device::getIncomingPrefetchStream ( )

Get the stream to be used for incoming memory prefetch operations on this device.

◆ getInvalid()

static const Device & sp::Device::getInvalid ( )
static

Get an invalid device object.

◆ getLimit()

size_t sp::Device::getLimit ( cudaLimit  limit) const

Query device limits.

See also
cudaDeviceGetLimit

◆ getPCIBusId()

std::string sp::Device::getPCIBusId ( ) const

Get the PCI bus ID string for this device.

See also
cudaDeviceGetPCIBusId

◆ getProperties()

const cudaDeviceProp & sp::Device::getProperties ( ) const

Get the device properties.

Typically, this performs better than cudaGetDeviceProperties() because the result is globally cached. The first call to either this or getAttribute() is expensive, but after that it's almost free.

Note also that xcmake provides a build-system feature for making the output of getAttribute() a compile-time constant for properties that are fixed hardware properties, allowing you to bake in the values for a specific GPU and use them in constexpr code.

See also
cudaDeviceGetAttribute
cudaGetDeviceProperties
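
A minimal sketch (the fields shown are standard cudaDeviceProp members):

    #include <Device.hpp>
    #include <cstdio>

    void printDeviceSummary() {
        const cudaDeviceProp &props = sp::Device::getActive().getProperties();
        std::printf("%s: %d SMs, %zu bytes of global memory\n",
                    props.name, props.multiProcessorCount, props.totalGlobalMem);
    }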

◆ getSharedMemConfig()

cudaSharedMemConfig sp::Device::getSharedMemConfig ( ) const

Get the shared memory mode for this device.

See also
cudaDeviceGetSharedMemConfig

◆ getStreamPriorityRange()

std::pair< int, int > sp::Device::getStreamPriorityRange ( ) const

Get the least and greatest stream priorities supported by the device.

See also
cudaDeviceGetStreamPriorityRange()

◆ hasAddress()

bool sp::Device::hasAddress ( __device const void *  ptr) const

Return true iff the given pointer points to memory on this device.

◆ operator int()

sp::Device::operator int ( ) const

Implicitly convert to the device ID.

◆ queueMemoryPrefetch()

template<typename T >
void sp::Device::queueMemoryPrefetch ( const T *  src,
size_t  count 
) const

Enqueue a unified memory prefetch operation on incomingPrefetchStream.
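
A minimal sketch; data is assumed to point to unified (managed) memory of at least n elements, and how that buffer was allocated is outside the scope of this page:

    #include <Device.hpp>
    #include <cstddef>

    void prefetchTo(const sp::Device &dev, const float *data, std::size_t n) {
        // Prefetch n elements toward dev on its incoming prefetch stream.
        dev.queueMemoryPrefetch(data, n);
    }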

◆ reset()

void sp::Device::reset ( ) const

Reset this device.

See also
cudaDeviceReset

◆ setCacheConfig()

void sp::Device::setCacheConfig ( cudaFuncCache  conf)

Set device cache configuration.

See also
cudaDeviceSetCacheConfig

◆ setFlags()

void sp::Device::setFlags ( unsigned int  flags)

Set the flags for this device.

This is expensive, and probably calls reset().

libcuda is funny about setting flags, and doesn't let you do it once the device has been initialised. Device initialisation happens behind the scenes inside libcuda at an essentially random time, so the only way to reliably set flags is to try it, check whether it took effect, and, if not, aggressively reset the device and try again. This function takes care of that, but since the operation is potentially very expensive you should probably only do it at startup.

See also
cudaSetDeviceFlags
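
A minimal sketch; cudaDeviceScheduleBlockingSync is just one example flag:

    #include <Device.hpp>

    int main() {
        // Set flags once, at startup, before the device has accumulated any state
        // worth keeping: applying them may reset the device.
        sp::Device::get(0).setFlags(cudaDeviceScheduleBlockingSync);

        // ... the rest of the program ...
    }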

◆ setLimit()

void sp::Device::setLimit ( cudaLimit  limit,
size_t  value 
)

Set device limits.

See also
cudaDeviceSetLimit
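
A minimal sketch; cudaLimitMallocHeapSize is just one example limit:

    #include <Device.hpp>

    size_t configureHeap(sp::Device &dev) {
        // Allow up to 64 MiB of in-kernel malloc() heap on this device.
        dev.setLimit(cudaLimitMallocHeapSize, 64 * 1024 * 1024);

        // Read the limit back; the runtime may round the value.
        return dev.getLimit(cudaLimitMallocHeapSize);
    }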

◆ setName()

void sp::Device::setName ( const std::string &  name) const

Assign a name to the device.

This name appears in profilers, debuggers, and other tools, where supported.
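
A minimal sketch that names every device (the naming scheme is arbitrary):

    #include <Device.hpp>
    #include <string>

    // Give each GPU a short, human-readable name that tools can display.
    void nameAllDevices() {
        for (int i = 0; i < sp::Device::getCount(); i++) {
            sp::Device::get(i).setName("gpu-" + std::to_string(i));
        }
    }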

◆ setSharedMemConfig()

void sp::Device::setSharedMemConfig ( cudaSharedMemConfig  conf)

Set the shared memory mode for this device.

See also
cudaDeviceSetSharedMemConfig

◆ synchronize()

void sp::Device::synchronize ( ) const

Wait for all work on this device to finish.

See also
cudaDeviceSynchronize