Represents a GPU.
#include <Device.hpp>
Public Member Functions
void ensureActive () const
Ensure this device is the active one in libcuda.
Device (const Device &)=delete
Device & operator= (const Device &)=delete
int getId () const
Get the device ID of this device.
operator int () const
Implicitly convert to the device ID.
void setName (const std::string &name) const
Assign a name to the device.
const cudaDeviceProp & getProperties () const
Get the device properties.
cudaFuncCache getCacheConfig () const
Query device cache configuration.
void setCacheConfig (cudaFuncCache conf)
Set device cache configuration.
size_t getLimit (cudaLimit limit) const
Query device limits.
void setLimit (cudaLimit limit, size_t value)
Set device limits.
cudaSharedMemConfig getSharedMemConfig () const
Get the shared memory mode for this device.
void setSharedMemConfig (cudaSharedMemConfig conf)
Set the shared memory mode for this device.
unsigned int getFlags () const
Query the device flags.
void setFlags (unsigned int flags)
Set the flags for this device.
std::string getPCIBusId () const
Get the PCI bus ID string for this device.
void reset () const
Reset this device.
void synchronize () const
Wait for all work on this device to finish.
Stream createStreamWithPriority (int priority, const std::string &name) const
Make a prioritised stream on this device.
Stream createStreamWithPriority (int priority) const
Make a prioritised stream on this device.
Stream createStream (const std::string &name) const
Make a stream on this device.
Stream createStream () const
Make a stream on this device.
Stream & getIncomingPrefetchStream ()
Get the stream to be used for incoming memory prefetch operations on this device.
std::pair< int, int > getStreamPriorityRange () const
Get the least and greatest stream priorities supported by the device.
template<typename T >
sp::UniquePtr< __device T > allocateMemory (size_t n, DeviceMemoryType memType=DeviceMemoryType::NORMAL)
Allocate device memory.
template<typename T >
void queueMemoryPrefetch (const T *src, size_t count) const
Enqueue a unified memory prefetch operation on incomingPrefetchStream.
template<typename T >
__device T * getMappedDevicePtr (T *ptr) const
bool hasAddress (__device const void *ptr) const
Return true iff the given pointer points to memory on this device.
Static Public Member Functions
template<DeviceAttr Attr, int Device = 0>
constexpr static auto getAttribute ()
Query a property of the device.
template<cudaDeviceAttr Attr, int Device = 0>
constexpr static auto getAttribute ()
Overload to allow use of cudaDeviceAttr.
static const Device & getInvalid ()
Get an invalid device object.
static Device & get (int i)
Get the device object representing the i'th GPU, or throw if it doesn't exist.
static Device & getByPCIBusId (std::string_view pciBusId)
Get the device at a given PCI bus ID, or throw if it doesn't exist.
static Device & choose (const cudaDeviceProp &prop)
Select the device that most closely matches the given properties.
static Device & getActive ()
Get the "active" device according to libcuda's global state.
static int getCount ()
Return the number of CUDA-capable devices present.
Friends
class Stream
Represents a GPU.
Logically, a single global object exists to represent each GPU on the system. A reference to the n'th GPU (according to libcuda's numbering scheme) can be obtained via the static getter function sp::Device::get(int).
If you don't really care which GPU you're using, you can simply keep a single global reference obtained from sp::Device::getActive() and carry on as normal. If you're working with multiple GPUs, having an object to represent each one lets a single host thread coordinate work across all of them. This avoids the host-side overhead of running many host threads just to coordinate your GPUs.
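To illustrate the one-object-per-GPU design, here is a self-contained C++ model rather than the real sp::Device: MockDevice, its private registry, and the fixed count of two devices are hypothetical stand-ins so the sketch compiles without CUDA. Copying is deleted and get() hands out references, so every caller sees the same object for a given ID, and a single host thread can walk all devices.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for sp::Device: exactly one long-lived object per GPU.
class MockDevice {
public:
    MockDevice(const MockDevice&) = delete;             // identity matters, so no copies
    MockDevice& operator=(const MockDevice&) = delete;

    int getId() const { return id_; }

    // Hypothetical fixed count; a real implementation would ask the driver.
    static int getCount() { return kDeviceCount; }

    // Return the single object for device i, or throw if it doesn't exist.
    static MockDevice& get(int i) {
        if (i < 0 || i >= kDeviceCount) {
            throw std::out_of_range("no such device");
        }
        return *registry()[static_cast<std::size_t>(i)];
    }

private:
    static constexpr int kDeviceCount = 2;

    explicit MockDevice(int id) : id_(id) {}

    // Built once on first use (thread-safe since C++11); the objects never move afterwards.
    static std::vector<std::unique_ptr<MockDevice>>& registry() {
        static std::vector<std::unique_ptr<MockDevice>> devices = [] {
            std::vector<std::unique_ptr<MockDevice>> d;
            for (int i = 0; i < kDeviceCount; ++i) {
                d.push_back(std::unique_ptr<MockDevice>(new MockDevice(i)));
            }
            return d;
        }();
        return devices;
    }

    int id_;
};

// Usage: a single host thread visits every GPU through its object.
inline void visitAllDevices() {
    for (int i = 0; i < MockDevice::getCount(); ++i) {
        MockDevice& dev = MockDevice::get(i);
        assert(dev.getId() == i);  // IDs follow the driver's numbering
        // ... enqueue work for `dev` here ...
    }
}
```

Handing out references to long-lived objects, instead of values, is what lets per-device state (streams, cached properties) be shared by every caller.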
template<typename T >
sp::UniquePtr< __device T > sp::Device::allocateMemory (size_t n, DeviceMemoryType memType = DeviceMemoryType::NORMAL)
Allocate device memory.
Allocate memory for n elements of type T. The memory is returned as a smart pointer in the __device address space, which deallocates the memory when it goes out of scope.
As usual for C++ unique pointers, you may want to promote the returned pointer to a std::shared_ptr, or std::move it to some final location where it can represent the lifetime of the buffer.
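The two ownership patterns described above can be sketched with standard smart pointers. This is a host-only model, not sp::UniquePtr: FreeDeleter, BufferPtr, allocateBuffer and Frame are hypothetical names, and host std::malloc/std::free stand in for device allocation so the code runs anywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <limits>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>

// Hypothetical deleter: host std::free stands in for releasing device memory.
struct FreeDeleter {
    void operator()(void* p) const noexcept { std::free(p); }
};

// Hypothetical analogue of the returned smart pointer.
template <typename T>
using BufferPtr = std::unique_ptr<T, FreeDeleter>;

// Allocate n elements of T; the buffer is freed when its owner goes out of scope.
template <typename T>
BufferPtr<T> allocateBuffer(std::size_t n) {
    static_assert(std::is_trivially_copyable_v<T>, "raw buffers hold trivial types");
    if (n > std::numeric_limits<std::size_t>::max() / sizeof(T)) {
        throw std::bad_alloc();  // n * sizeof(T) would overflow
    }
    void* raw = std::malloc(n * sizeof(T));
    if (raw == nullptr && n != 0) {
        throw std::bad_alloc();
    }
    return BufferPtr<T>(static_cast<T*>(raw));
}

// A long-lived owner that represents the buffer's lifetime.
struct Frame {
    BufferPtr<float> pixels;
    std::size_t count;
};

// Move the unique pointer into its final location.
inline Frame makeFrame(std::size_t n) {
    BufferPtr<float> buf = allocateBuffer<float>(n);
    assert(n == 0 || buf != nullptr);
    return Frame{std::move(buf), n};
}
```

Promoting to shared ownership is a one-liner, `std::shared_ptr<float> shared = allocateBuffer<float>(8);`, and the shared pointer keeps the custom deleter, so the last owner still releases the buffer correctly.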
static Device & sp::Device::choose (const cudaDeviceProp & prop)
Select the device that most closely matches the given properties.
Stream sp::Device::createStream () const
Make a stream on this device.
Stream sp::Device::createStream (const std::string & name) const
Make a stream on this device.
Parameters:
name: An optional name for the stream. This name may appear in profilers, debuggers, or other tools.
Stream sp::Device::createStreamWithPriority (int priority) const
Make a prioritised stream on this device.
Parameters:
priority: The scheduling priority of the stream. Lower numbers represent higher priorities. The GPU will schedule work from higher-priority streams first.
Stream sp::Device::createStreamWithPriority (int priority, const std::string & name) const
Make a prioritised stream on this device.
Parameters:
priority: The scheduling priority of the stream. Lower numbers represent higher priorities. The GPU will schedule work from higher-priority streams first.
name: An optional name for the stream. This name may appear in profilers, debuggers, or other tools.
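Because lower numbers mean higher priority, the pair returned by getStreamPriorityRange() is numerically "backwards". Below is a minimal sketch of clamping a requested priority into range, assuming (as with CUDA's cudaDeviceGetStreamPriorityRange) that the first element is the least priority and the second the greatest. clampStreamPriority and isHigherPriority are hypothetical helpers, not part of this API.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

// Clamp a requested priority into the device's supported range.
// ASSUMPTION: range.first is the least priority and range.second the greatest,
// matching the "least and greatest" order of getStreamPriorityRange().
inline int clampStreamPriority(int requested, std::pair<int, int> range) {
    const int least = range.first;      // numerically largest, scheduled last
    const int greatest = range.second;  // numerically smallest, scheduled first
    assert(greatest <= least);          // lower number == higher priority
    return std::clamp(requested, greatest, least);
}

// True if a stream with priority a is favoured over one with priority b.
inline bool isHigherPriority(int a, int b) { return a < b; }
```

A caller might pass `clampStreamPriority(p, device.getStreamPriorityRange())` to createStreamWithPriority(); whether the library itself clamps out-of-range values is not documented here.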
void sp::Device::ensureActive () const
Ensure this device is the active one in libcuda.
Use of this API is not recommended: you're better off using Device/Stream objects directly to do things on a particular device. This function is helpful when dealing with old code that is still using libcuda directly: you can ensure the right device is the "active" one when you call into the evil C code, so it does what you want.
This is mostly useful, therefore, as a migration aid.
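During such a migration, a common pattern is to scope the device switch around each legacy call. The sketch below models that with hypothetical names (g_activeDevice, legacyCurrentDevice, ScopedActiveDevice) in place of libcuda's state, which the CUDA runtime tracks per host thread. Unlike ensureActive(), which is only documented to make its device active, this guard also restores the previously active device on exit; that is a design choice added here.

```cpp
#include <cassert>

// Hypothetical model of the runtime's "current device", tracked per host thread.
inline thread_local int g_activeDevice = 0;

// Hypothetical legacy C-style routine: it silently targets whichever device is active.
inline int legacyCurrentDevice() { return g_activeDevice; }

// RAII guard: make `id` active for the guard's lifetime, then restore the previous device.
class ScopedActiveDevice {
public:
    explicit ScopedActiveDevice(int id) : previous_(g_activeDevice) {
        assert(id >= 0);
        g_activeDevice = id;
    }
    ~ScopedActiveDevice() { g_activeDevice = previous_; }

    ScopedActiveDevice(const ScopedActiveDevice&) = delete;
    ScopedActiveDevice& operator=(const ScopedActiveDevice&) = delete;

private:
    int previous_;
};
```

Restoring on scope exit keeps the legacy code's side effects local, so later code that still relies on the global state is not surprised.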
static Device & sp::Device::get (int i)
Get the device object representing the i'th GPU, or throw if it doesn't exist.
static Device & sp::Device::getActive ()
Get the "active" device according to libcuda's global state.
Prefer to use Device objects directly instead of relying on CUDA's global state.
template<DeviceAttr Attr, int Device = 0>
static constexpr auto sp::Device::getAttribute ()
Query a property of the device.
If the requested property is one that is fixed by the DeviceAssumptionCache, this function is constexpr and may be used in compile-time code.
For other properties (such as cudaDevAttrEccEnabled), it still outperforms cudaDeviceGetAttribute() because the result is cached (software typically does not cope gracefully with such properties changing during execution anyway). The first query of a non-assumed property is quite expensive; later queries are served from the cache.
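The DeviceAssumptionCache itself is not shown on this page, so the sketch below models the behaviour described above with hypothetical names (Attr, Assumed, slowQuery, cachedQuery, getAttribute). Assumed attributes return a compile-time constant; all others are queried once and then served from a function-local static. The warp-size value and the fake query result are illustrative only.

```cpp
#include <array>
#include <atomic>
#include <cassert>

// Hypothetical attribute IDs; the real API takes DeviceAttr or cudaDeviceAttr.
enum class Attr { WarpSize, EccEnabled };

// Hypothetical assumption table: only specialised attributes have build-time values.
template <Attr A>
struct Assumed {
    static constexpr bool value = false;
};
template <>
struct Assumed<Attr::WarpSize> {
    static constexpr bool value = true;
    static constexpr int constant = 32;  // illustrative baked-in value
};

inline std::atomic<int> g_slowQueries{0};  // counts trips to the expensive path

// Stand-in for an expensive driver query; the returned value is fake.
inline int slowQuery(Attr a) {
    assert(a != Attr::WarpSize);  // assumed attributes never reach this path
    ++g_slowQueries;
    return 1;
}

// Non-assumed attributes: query once, then serve from a function-local static.
template <Attr A>
int cachedQuery() {
    static const int value = slowQuery(A);  // initialisation is thread-safe since C++11
    return value;
}

template <Attr A>
constexpr int getAttribute() {
    if constexpr (Assumed<A>::value) {
        return Assumed<A>::constant;  // usable in constant expressions
    } else {
        return cachedQuery<A>();      // runtime only
    }
}

// Usage: an assumed attribute can size an array at compile time.
using WarpScratch = std::array<float, getAttribute<Attr::WarpSize>()>;
```

Keeping the static in a separate non-constexpr helper avoids declaring a static variable inside the constexpr function itself, which C++17 does not allow.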
template<cudaDeviceAttr Attr, int Device = 0>
static constexpr auto sp::Device::getAttribute ()
Overload to allow use of cudaDeviceAttr.
static Device & sp::Device::getByPCIBusId (std::string_view pciBusId)
Get the device at a given PCI bus ID, or throw if it doesn't exist.
cudaFuncCache sp::Device::getCacheConfig () const
Query device cache configuration.
static int sp::Device::getCount ()
Return the number of CUDA-capable devices present.
unsigned int sp::Device::getFlags () const
Query the device flags.
int sp::Device::getId () const
Get the device ID of this device.
Stream & sp::Device::getIncomingPrefetchStream ()
Get the stream to be used for incoming memory prefetch operations on this device.
static const Device & sp::Device::getInvalid ()
Get an invalid device object.
size_t sp::Device::getLimit (cudaLimit limit) const
Query device limits.
std::string sp::Device::getPCIBusId () const
Get the PCI bus ID string for this device.
const cudaDeviceProp & sp::Device::getProperties () const
Get the device properties.
Typically, this performs better than cudaGetDeviceProperties() because the result is globally cached: the first call to either this function or getAttribute() is expensive, but subsequent calls are almost free.
Note also that xcmake provides a build-system feature that makes the output of getAttribute a compile-time constant for properties that are fixed hardware properties. This lets you bake in the values for a specific GPU and use them in constexpr code.
cudaSharedMemConfig sp::Device::getSharedMemConfig () const
Get the shared memory mode for this device.
std::pair< int, int > sp::Device::getStreamPriorityRange () const
Get the least and greatest stream priorities supported by the device.
bool sp::Device::hasAddress (__device const void * ptr) const
Return true iff the given pointer points to memory on this device.
sp::Device::operator int () const
Implicitly convert to the device ID.
template<typename T >
void sp::Device::queueMemoryPrefetch (const T * src, size_t count) const
Enqueue a unified memory prefetch operation on incomingPrefetchStream.
void sp::Device::reset () const
Reset this device.
void sp::Device::setCacheConfig (cudaFuncCache conf)
Set device cache configuration.
void sp::Device::setFlags (unsigned int flags)
Set the flags for this device.
This is expensive, and may call reset().
libcuda is fussy about setting flags: it refuses to change them once the device has been initialised. Device initialisation happens behind the scenes inside libcuda at an essentially unpredictable time, so the only reliable way to set flags is to try, check whether the change took effect, and if it did not, reset the device and try again. This function takes care of that, but because the operation can be very expensive, you should probably only call it at startup.
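The try/check/reset sequence described above can be sketched against a mock. MockFlagDevice and setFlagsWithReset are hypothetical; the mock simply ignores flag changes once it is initialised, which simplifies libcuda's real behaviour, and the real setFlags() may differ in detail.

```cpp
#include <cassert>

// Hypothetical mock: once the device has been initialised,
// attempts to change its flags are silently ignored.
struct MockFlagDevice {
    bool initialised = true;  // assume the driver already initialised it behind our back
    unsigned int flags = 0;
    int resets = 0;

    void trySetFlags(unsigned int f) {
        if (!initialised) {
            flags = f;
        }
    }
    unsigned int getFlags() const { return flags; }
    void reset() {  // expensive in reality: destroys all state on the device
        initialised = false;
        ++resets;
    }
};

// Try, check whether it took effect, and if not, reset and try once more.
// ASSUMPTION: a single reset is enough; the result reports whether the flags stuck.
inline bool setFlagsWithReset(MockFlagDevice& dev, unsigned int wanted) {
    dev.trySetFlags(wanted);
    if (dev.getFlags() == wanted) {
        return true;  // cheap path: no reset needed
    }
    dev.reset();
    assert(!dev.initialised);  // a reset must leave the device accepting flags again
    dev.trySetFlags(wanted);
    return dev.getFlags() == wanted;
}
```

The cheap path costs a single check, so a reset, and the loss of all device state it implies, only happens when the first attempt was ignored.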
void sp::Device::setLimit (cudaLimit limit, size_t value)
Set device limits.
void sp::Device::setName (const std::string & name) const
Assign a name to the device.
This name appears in profilers, debuggers, and other tools, where supported.
void sp::Device::setSharedMemConfig (cudaSharedMemConfig conf)
Set the shared memory mode for this device.
void sp::Device::synchronize () const
Wait for all work on this device to finish.