libSCALE 0.2.0
A modern C++ CUDA API
sp::Device Class Reference

Represents a GPU. More...

#include <Device.hpp>

Public Member Functions

void ensureActive () const
 Ensure this device is the active one in libcuda. More...
 
 Device (const Device &)=delete
 
Device & operator= (const Device &)=delete
 
int getId () const
 Get the device ID of this device. More...
 
 operator int () const
 Implicitly convert to the device ID. More...
 
void setName (const std::string &name) const
 Assign a name to the device. More...
 
const cudaDeviceProp & getProperties () const
 Get the device properties. More...
 
cudaFuncCache getCacheConfig () const
 Query device cache configuration. More...
 
void setCacheConfig (cudaFuncCache conf)
 Set device cache configuration. More...
 
size_t getLimit (cudaLimit limit) const
 Query device limits. More...
 
void setLimit (cudaLimit limit, size_t value)
 Set device limits. More...
 
cudaSharedMemConfig getSharedMemConfig () const
 Get the shared memory mode for this device. More...
 
void setSharedMemConfig (cudaSharedMemConfig conf)
 Set the shared memory mode for this device. More...
 
unsigned int getFlags () const
 Query the device flags. More...
 
void setFlags (unsigned int flags)
 Set the flags for this device. More...
 
std::string getPCIBusId () const
 Get the PCI bus ID string for this device. More...
 
void reset () const
 Reset this device. More...
 
void synchronize () const
 Wait for all work on this device to finish. More...
 
Stream createStreamWithPriority (int priority, const std::string &name) const
 Make a prioritised stream on this device. More...
 
Stream createStreamWithPriority (int priority) const
 Make a prioritised stream on this device. More...
 
Stream createStream (const std::string &name) const
 Make a stream on this device. More...
 
Stream createStream () const
 Make a stream on this device. More...
 
Stream & getIncomingPrefetchStream ()
 Get the stream to be used for incoming memory prefetch operations on this device. More...
 
std::pair< int, int > getStreamPriorityRange () const
 Get the least and greatest stream priorities supported by the device. More...
 
template<typename T >
sp::UniquePtr< __device T > allocateMemory (size_t n, DeviceMemoryType memType=DeviceMemoryType::NORMAL)
 Allocate device memory. More...
 
template<typename T >
void queueMemoryPrefetch (const T *src, size_t count) const
 Enqueue a unified memory prefetch operation on incomingPrefetchStream. More...
 
template<typename T >
__device T * getMappedDevicePtr (T *ptr) const
 
bool hasAddress (__device const void *ptr) const
 Return true iff the given pointer points to memory on this device. More...
 

Static Public Member Functions

template<DeviceAttr Attr, int Device = 0>
constexpr static auto getAttribute ()
 Query a property of the device. More...
 
template<cudaDeviceAttr Attr, int Device = 0>
constexpr static auto getAttribute ()
 Overload to allow use of cudaDeviceAttr. More...
 
static const Device & getInvalid ()
 Get an invalid device object. More...
 
static Device & get (int i)
 Get the device object representing the i'th GPU, or throw if it doesn't exist. More...
 
static Device & getByPCIBusId (std::string_view pciBusId)
 Get the device at a given PCI bus ID, or throw if it doesn't exist. More...
 
static Device & choose (const cudaDeviceProp &prop)
 Select the device that most closely matches the given properties. More...
 
static Device & getActive ()
 Get the "active" device according to libcuda's global state. More...
 
static int getCount ()
 Return the number of CUDA-capable devices present. More...
 

Friends

class Stream
 

Detailed Description

Represents a GPU.

Logically, a single global object exists to represent each GPU on the system. A reference to the n'th GPU (according to libcuda's numbering scheme) can be obtained via the static getter function sp::Device::get(int).

If you don't really care which GPU you're using, you can just construct a single global instance using sp::Device::getActive() and carry on as normal. If you're working with multiple GPUs, using an object to represent each one can be extremely helpful: it allows a single host thread to coordinate work across many GPUs, avoiding the host-side overhead of spawning and coordinating one host thread per GPU.
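
For example, a single host thread can walk every GPU in the system. A minimal sketch, using only the members documented on this page:

    #include <Device.hpp>

    // Wait for all outstanding work on every GPU, from one host thread.
    void synchronizeAllDevices() {
        for (int i = 0; i < sp::Device::getCount(); i++) {
            sp::Device &dev = sp::Device::get(i); // Throws if device i does not exist.
            dev.synchronize();
        }
    }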

Member Function Documentation

◆ allocateMemory()

template<typename T >
sp::UniquePtr< __device T > sp::Device::allocateMemory ( size_t  n,
DeviceMemoryType  memType = DeviceMemoryType::NORMAL 
)

Allocate device memory.

Allocate memory for n elements of type T. The return value is a smart pointer to memory in the __device address space; the memory is deallocated when the pointer goes out of scope.

As usual for C++ unique pointers, you may want to promote the returned std::unique_ptr to a std::shared_ptr, or std::move it to some final location where it can represent the lifetime of the buffer.
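
A minimal sketch (the element type and count are arbitrary):

    #include <Device.hpp>

    void example() {
        sp::Device &dev = sp::Device::getActive();

        // Allocate 1024 floats on the device. DeviceMemoryType::NORMAL is the default.
        sp::UniquePtr<__device float> buf = dev.allocateMemory<float>(1024);

        // ... enqueue work that uses the buffer ...
    } // The allocation is released here, when buf goes out of scope.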

◆ choose()

static Device & sp::Device::choose ( const cudaDeviceProp &  prop)
static

Select the device that most closely matches the given properties.

See also
cudaChooseDevice()

◆ createStream() [1/2]

Stream sp::Device::createStream ( ) const

Make a stream on this device.

See also
cudaStreamCreateWithPriority

◆ createStream() [2/2]

Stream sp::Device::createStream ( const std::string &  name) const

Make a stream on this device.

Parameters
name: An optional name for the stream. This name may appear in profilers, debuggers, or other tools.
See also
cudaStreamCreateWithPriority

◆ createStreamWithPriority() [1/2]

Stream sp::Device::createStreamWithPriority ( int  priority) const

Make a prioritised stream on this device.

Parameters
priority: The scheduling priority of the stream. Lower numbers represent higher priorities. The GPU will schedule work from higher-priority streams first.
See also
cudaStreamCreateWithPriority

◆ createStreamWithPriority() [2/2]

Stream sp::Device::createStreamWithPriority ( int  priority,
const std::string &  name 
) const

Make a prioritised stream on this device.

Parameters
priority: The scheduling priority of the stream. Lower numbers represent higher priorities. The GPU will schedule work from higher-priority streams first.
name: An optional name for the stream. This name may appear in profilers, debuggers, or other tools.
See also
cudaStreamCreateWithPriority
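
A minimal sketch that combines this with getStreamPriorityRange(); it assumes the returned pair is ordered (least, greatest), matching cudaDeviceGetStreamPriorityRange(), and the stream name is arbitrary:

    #include <Device.hpp>

    sp::Stream makeUrgentStream(sp::Device &dev) {
        // .first is assumed to be the least and .second the greatest supported priority
        // (lower numbers mean higher priority).
        auto range = dev.getStreamPriorityRange();

        // A named stream at the highest priority this device supports.
        return dev.createStreamWithPriority(range.second, "urgent-work");
    }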

◆ ensureActive()

void sp::Device::ensureActive ( ) const

Ensure this device is the active one in libcuda.

Use of this API is not recommended: you're better off using Device/Stream objects directly to do things on a particular device. This function is helpful when dealing with old code that is still using libcuda directly: you can ensure the right device is the "active" one when you call into the evil C code, so it does what you want.

This is mostly useful, therefore, as a migration aid.
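
For example, when wrapping legacy code that still relies on libcuda's global active-device state (legacyCudaRoutine() is a hypothetical stand-in for such code):

    #include <Device.hpp>

    void legacyCudaRoutine(); // Hypothetical: old code that calls libcuda directly.

    void callLegacyCode(sp::Device &dev) {
        // Make dev the "active" device before dropping into the legacy code,
        // so its raw libcuda calls operate on the device we intend.
        dev.ensureActive();
        legacyCudaRoutine();
    }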

◆ get()

static Device & sp::Device::get ( int  i)
static

Get the device object representing the i'th GPU, or throw if it doesn't exist.

◆ getActive()

static Device & sp::Device::getActive ( )
static

Get the "active" device according to libcuda's global state.

Prefer to use Device objects directly instead of relying on CUDA's global state.

See also
cudaGetDevice()

◆ getAttribute() [1/2]

template<DeviceAttr Attr, int Device = 0>
constexpr static auto sp::Device::getAttribute ( )
static constexpr

Query a property of the device.

If the requested property is one that is fixed by the DeviceAssumptionCache, this function is constexpr and may be used in compile-time code.

For other properties (such as cudaDevAttrEccEnabled), this still outperforms cudaDeviceGetAttribute() because the result is cached (and software typically does not cope gracefully with such properties changing during execution).

The first call to a non-assumed property will be quite expensive.

See also
cudaDeviceGetAttribute
cudaGetDeviceProperties
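
A minimal sketch using the cudaDeviceAttr overload below; whether a given attribute is usable in constexpr code depends on your build's DeviceAssumptionCache:

    #include <Device.hpp>
    #include <cstdio>

    void queryAttributes() {
        // Warp size of device 0 (the default Device template argument).
        auto warp = sp::Device::getAttribute<cudaDevAttrWarpSize>();

        // A non-assumed property: the first query is expensive; later calls hit the cache.
        auto ecc = sp::Device::getAttribute<cudaDevAttrEccEnabled>();

        std::printf("warp size %d, ECC %s\n", (int)warp, ecc ? "on" : "off");
    }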

◆ getAttribute() [2/2]

template<cudaDeviceAttr Attr, int Device = 0>
constexpr static auto sp::Device::getAttribute ( )
static constexpr

Overload to allow use of cudaDeviceAttr.

◆ getByPCIBusId()

static Device & sp::Device::getByPCIBusId ( std::string_view  pciBusId)
static

Get the device at a given PCI bus ID, or throw if it doesn't exist.

See also
cudaDeviceGetByPCIBusId()

◆ getCacheConfig()

cudaFuncCache sp::Device::getCacheConfig ( ) const

Query device cache configuration.

See also
cudaDeviceGetCacheConfig

◆ getCount()

static int sp::Device::getCount ( )
static

Return the number of CUDA-capable devices present.

See also
cudaGetDeviceCount()

◆ getFlags()

unsigned int sp::Device::getFlags ( ) const

Query the device flags.

See also
cudaGetDeviceFlags

◆ getId()

int sp::Device::getId ( ) const

Get the device ID of this device.

◆ getIncomingPrefetchStream()

Stream & sp::Device::getIncomingPrefetchStream ( )

Get the stream to be used for incoming memory prefetch operations on this device.

◆ getInvalid()

static const Device & sp::Device::getInvalid ( )
static

Get an invalid device object.

◆ getLimit()

size_t sp::Device::getLimit ( cudaLimit  limit) const

Query device limits.

See also
cudaDeviceGetLimit

◆ getPCIBusId()

std::string sp::Device::getPCIBusId ( ) const

Get the PCI bus ID string for this device.

See also
cudaDeviceGetPCIBusId

◆ getProperties()

const cudaDeviceProp & sp::Device::getProperties ( ) const

Get the device properties.

Typically, this performs better than cudaGetDeviceProperties() because the result is globally cached. The first call to either this or getAttribute() is expensive, but after that it's almost free.

Note also that xcmake provides a build-system feature for making the output of getAttribute() a compile-time constant for properties that are fixed hardware properties, allowing you to bake in the values for a specific GPU and use them in constexpr code.

See also
cudaDeviceGetAttribute
cudaGetDeviceProperties
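
A minimal sketch (the fields shown are standard cudaDeviceProp members):

    #include <Device.hpp>
    #include <cstdio>

    void printDeviceSummary() {
        const cudaDeviceProp &props = sp::Device::getActive().getProperties();
        std::printf("%s: %d SMs, %zu bytes of global memory\n",
                    props.name, props.multiProcessorCount, props.totalGlobalMem);
    }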

◆ getSharedMemConfig()

cudaSharedMemConfig sp::Device::getSharedMemConfig ( ) const

Get the shared memory mode for this device.

See also
cudaDeviceGetSharedMemConfig

◆ getStreamPriorityRange()

std::pair< int, int > sp::Device::getStreamPriorityRange ( ) const

Get the least and greatest stream priorities supported by the device.

See also
cudaDeviceGetStreamPriorityRange()

◆ hasAddress()

bool sp::Device::hasAddress ( __device const void *  ptr) const

Return true iff the given pointer points to memory on this device.

◆ operator int()

sp::Device::operator int ( ) const

Implicitly convert to the device ID.

◆ queueMemoryPrefetch()

template<typename T >
void sp::Device::queueMemoryPrefetch ( const T *  src,
size_t  count 
) const

Enqueue a unified memory prefetch operation on incomingPrefetchStream.
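
A minimal sketch; data is assumed to point to unified (managed) memory of at least n elements, and how that buffer was allocated is outside the scope of this page:

    #include <Device.hpp>
    #include <cstddef>

    void prefetchTo(const sp::Device &dev, const float *data, std::size_t n) {
        // Prefetch n elements toward dev on its incoming prefetch stream.
        dev.queueMemoryPrefetch(data, n);
    }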

◆ reset()

void sp::Device::reset ( ) const

Reset this device.

See also
cudaDeviceReset

◆ setCacheConfig()

void sp::Device::setCacheConfig ( cudaFuncCache  conf)

Set device cache configuration.

See also
cudaDeviceSetCacheConfig

◆ setFlags()

void sp::Device::setFlags ( unsigned int  flags)

Set the flags for this device.

This is expensive, and probably calls reset().

libcuda is funny about setting flags, and doesn't let you do it once the device has been initialised. Device initialisation happens behind the scenes inside libcuda at an essentially random time, so the only way to reliably set flags is to try it, check whether it took effect, and, if not, aggressively reset the device and try again. This function takes care of that, but since the operation is potentially very expensive you should probably only do it at startup.

See also
cudaSetDeviceFlags
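
A minimal sketch; cudaDeviceScheduleBlockingSync is just one example flag:

    #include <Device.hpp>

    int main() {
        // Set flags once, at startup, before the device has accumulated any state
        // worth keeping: applying them may reset the device.
        sp::Device::get(0).setFlags(cudaDeviceScheduleBlockingSync);

        // ... the rest of the program ...
    }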

◆ setLimit()

void sp::Device::setLimit ( cudaLimit  limit,
size_t  value 
)

Set device limits.

See also
cudaDeviceSetLimit
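
A minimal sketch; cudaLimitMallocHeapSize is just one example limit:

    #include <Device.hpp>

    size_t configureHeap(sp::Device &dev) {
        // Allow up to 64 MiB of in-kernel malloc() heap on this device.
        dev.setLimit(cudaLimitMallocHeapSize, 64 * 1024 * 1024);

        // Read the limit back; the runtime may round the value.
        return dev.getLimit(cudaLimitMallocHeapSize);
    }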

◆ setName()

void sp::Device::setName ( const std::string &  name) const

Assign a name to the device.

This name appears in profilers, debuggers, and other tools, where supported.
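
A minimal sketch that names every device (the naming scheme is arbitrary):

    #include <Device.hpp>
    #include <string>

    // Give each GPU a short, human-readable name that tools can display.
    void nameAllDevices() {
        for (int i = 0; i < sp::Device::getCount(); i++) {
            sp::Device::get(i).setName("gpu-" + std::to_string(i));
        }
    }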

◆ setSharedMemConfig()

void sp::Device::setSharedMemConfig ( cudaSharedMemConfig  conf)

Set the shared memory mode for this device.

See also
cudaDeviceSetSharedMemConfig

◆ synchronize()

void sp::Device::synchronize ( ) const

Wait for all work on this device to finish.

See also
cudaDeviceSynchronize