Speclib  0.1.2
The library for writing better CUDA libraries
sp::NomadicBuffer< T, SizeQuantisation > Class Template Reference

A buffer that can be easily moved between devices, and which keeps track of where it was last modified.

#include <NomadicBuffer.hpp>

Public Member Functions

 NomadicBuffer (const NomadicBuffer< T > &)=delete
 
NomadicBuffer< T > & operator= (const NomadicBuffer< T > &)=delete
 
 NomadicBuffer (int allocationSize, HostMemoryType hostFlags=HostMemoryType::PINNED)
 Create a new NomadicBuffer of the given size in Ts.
 
 NomadicBuffer (sp::UniquePtr< T > &&buffer, int allocationSize)
 Construct a NomadicBuffer using an already-allocated host buffer.
 
int getSize () const
 
void prune (const sp::Device &keepDevice=sp::Device::getInvalid(), bool keepHost=false)
 Deallocate copies of the buffer stored on various devices (or the host).
 
void reset ()
 "Forget" the contents of the buffer so it can be reused for something else.
 
void resize (int newSize)
 Change the logical size of the buffer.
 
const __device T * devicePtr (sp::Stream &s, bool move=false)
 Synchronise the buffer to the device associated with the given stream and return a pointer to the buffer on that device.
 
__device T * mutableDevicePtr (sp::Stream &s, bool move=false)
 Synchronise the buffer to the device associated with the given stream and return a pointer to the buffer on that device.
 
const T * hostPtr (sp::Stream &s, bool move=false)
 Synchronise the buffer to the host, if necessary, and return a pointer to the host buffer.
 
T * mutableHostPtr (sp::Stream &s, bool move=false)
 Synchronise the buffer to the host, if necessary, and return a pointer to the host buffer.
 
T * mutableHostPtr ()
 Access the host buffer for the purposes of initialisation.
 

Detailed Description

template<typename T, int SizeQuantisation = 1>
class sp::NomadicBuffer< T, SizeQuantisation >

A buffer that can be easily moved between devices, and which keeps track of where it was last modified.

This can be used to asynchronously queue up elaborate sequences of copies between devices, CUDA kernels, and host functions. Because every step is enqueued rather than waited on, you often don't need one host thread per stream, which can help reduce overheads in programs with many streams.

Example
// `s` is an existing sp::Stream.
// Initialise a (lazily allocated) host/device synchronisable buffer of `SIZE` `uint32_t`s.
sp::NomadicBuffer<uint32_t> buffer(SIZE);
// Put some stuff in it (on the host), synchronously.
fill(buffer.mutableHostPtr(s), SIZE);
// Launch a kernel to do something to it (async)
int numBlocks = sp::divRoundUp(SIZE, BLOCK_SIZE);
some_kernel<<<numBlocks, BLOCK_SIZE, 0, s>>>(buffer.mutableDevicePtr(s), SIZE);
// Get the result back, and pass it to some host function `verify()` that uses the result (async)
// hostPtr() will enqueue a copy onto the stream (since we asked for a mutable device view above),
// but will immediately return the pointer to where that copy is going to land. We can then use
// that pointer immediately to queue up more work after the copy that uses the result. No need to
// wait for anything!
auto hostPtr = buffer.hostPtr(s);
// Capture by value: this scope may be left before the host function actually runs.
s.launchHostFunc([=]() {
    verify(hostPtr, SIZE);
});
// Control can get here before the kernel has even started...
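The bookkeeping that the example's comments rely on (a mutable view on one location makes a later view elsewhere trigger a copy) can be modelled without any GPU at all. The following is only a minimal sketch of that idea in plain C++: `ToyNomadicState`, `acquireConst`, `acquireMutable`, and the integer location ids are hypothetical names invented for illustration, and nothing here is taken from the Speclib implementation.

```cpp
#include <cassert>
#include <map>

// Hypothetical location id: -1 stands for the host, 0..N for devices.
using Location = int;
constexpr Location HOST = -1;

// Toy model of the bookkeeping only: no real memory or streams are involved.
struct ToyNomadicState {
    std::map<Location, bool> upToDate;  // which copies hold the latest data
    Location lastModified = HOST;
    int copiesEnqueued = 0;             // copies a real buffer would have enqueued

    ToyNomadicState() { upToDate[HOST] = true; }

    // Read-only view: a copy is needed only if this location is missing or stale.
    void acquireConst(Location where) {
        assert(where >= HOST);
        auto it = upToDate.find(where);
        if (it == upToDate.end() || !it->second) {
            ++copiesEnqueued;
            upToDate[where] = true;
        }
    }

    // Mutable view: synchronise first, then mark every other copy as stale.
    void acquireMutable(Location where) {
        acquireConst(where);
        for (auto& entry : upToDate) entry.second = (entry.first == where);
        lastModified = where;
    }
};
```

In this model, filling on the host, taking a mutable view on device 0, and then asking for a host view costs two copies in total: one to the device and one back. A second host view after that costs nothing.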
Template Parameters
T: The element type of the buffer.
SizeQuantisation: The amount to round allocations up to, in Ts. The reported size is unaffected.
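To illustrate the difference between the reported size and the allocated size, here is a small sketch. It assumes that "round up to" means rounding to the next multiple of SizeQuantisation, which the text above does not confirm. `ToySizes`, `quantisedAllocation`, and `allocatedElements` are hypothetical names, not part of the library.

```cpp
#include <cassert>

// Assumed rounding rule: next multiple of `quantisation` (not confirmed by the source).
constexpr int quantisedAllocation(int requestedSize, int quantisation) {
    return ((requestedSize + quantisation - 1) / quantisation) * quantisation;
}

// Toy stand-in that keeps the reported size separate from the allocated size.
template <typename T, int SizeQuantisation = 1>
struct ToySizes {
    int logicalSize;

    explicit ToySizes(int n) : logicalSize(n) { assert(n >= 0); }

    // The size the caller asked for, unaffected by quantisation.
    int getSize() const { return logicalSize; }

    // Hypothetical accessor: how many Ts would actually be allocated.
    int allocatedElements() const {
        return quantisedAllocation(logicalSize, SizeQuantisation);
    }
};
```

Under that assumption, a `ToySizes<float, 256>` of 1000 elements reports a size of 1000 but would allocate room for 1024.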

Constructor & Destructor Documentation

◆ NomadicBuffer() [1/2]

template<typename T , int SizeQuantisation = 1>
sp::NomadicBuffer< T, SizeQuantisation >::NomadicBuffer ( int  allocationSize,
HostMemoryType  hostFlags = HostMemoryType::PINNED 
)
explicit

Create a new NomadicBuffer of the given size in Ts.

By default, this constructor does no allocation. Memory is allocated the first time you ask for the buffer to be synchronised to a location where no allocation has yet been done.

Lazy allocation can cause a latency spike the first time the buffer is synchronised to each new location, which may be undesirable. If you would prefer to eagerly allocate memory for this buffer on all devices, set eagerlyAllocate to true.
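The allocate-on-first-use behaviour described above can be sketched in plain C++ as follows. This is an illustrative model only: `ToyLazyBuffer`, `ptrAt`, `allocationCount`, and `warmUp` are hypothetical names, and the integer location ids stand in for real devices. The `warmUp` helper shows one way to pay the allocation cost up front using nothing but ordinary accesses.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Toy model: one std::vector per location, created on first access.
template <typename T>
class ToyLazyBuffer {
public:
    // Construction records the size but allocates nothing.
    explicit ToyLazyBuffer(int size) : size_(size) { assert(size >= 0); }

    // The first access for a location pays the allocation cost; later ones do not.
    T* ptrAt(int location) {
        auto it = storage_.find(location);
        if (it == storage_.end()) {
            std::vector<T> fresh(static_cast<std::size_t>(size_));
            it = storage_.emplace(location, std::move(fresh)).first;
            ++allocations_;
        }
        return it->second.data();
    }

    int allocationCount() const { return allocations_; }

private:
    int size_;
    int allocations_ = 0;
    std::map<int, std::vector<T>> storage_;
};

// Hypothetical warm-up helper: touch every location once at start-up.
template <typename T>
void warmUp(ToyLazyBuffer<T>& buf, const std::vector<int>& locations) {
    for (int loc : locations) buf.ptrAt(loc);
}
```

With a real NomadicBuffer, the analogous warm-up would presumably be to request a pointer on each device once during start-up, so that later requests find the allocation already in place.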

◆ NomadicBuffer() [2/2]

template<typename T , int SizeQuantisation = 1>
sp::NomadicBuffer< T, SizeQuantisation >::NomadicBuffer ( sp::UniquePtr< T > &&  buffer,
int  allocationSize 
)
explicit

Construct a NomadicBuffer using an already-allocated host buffer.

The newly-constructed NomadicBuffer takes ownership of the given host buffer. No copy is performed. Performance will be poor if the given buffer is not pinned memory.

Parameters
buffer: The already-existing host buffer.
allocationSize: Size of the buffer.

Member Function Documentation

◆ devicePtr()

template<typename T , int SizeQuantisation = 1>
const __device T * sp::NomadicBuffer< T, SizeQuantisation >::devicePtr ( sp::Stream s,
bool  move = false 
)

Synchronise the buffer to the device associated with the given stream and return a pointer to the buffer on that device.

If a copy needs to be done, it will be enqueued on stream s.

Parameters
s: The stream to use for any copy, and which identifies the device to synchronise to.
move: If true, deallocate every copy of this buffer except the one on the target device. The host copy is deallocated too.

◆ hostPtr()

template<typename T , int SizeQuantisation = 1>
const T * sp::NomadicBuffer< T, SizeQuantisation >::hostPtr ( sp::Stream s,
bool  move = false 
)

Synchronise the buffer to the host, if necessary, and return a pointer to the host buffer.

Parameters
s: Stream to use for any copy operation.
move: If true, deallocate all other copies of this buffer, keeping only the target (here, the host copy).

◆ mutableDevicePtr()

template<typename T , int SizeQuantisation = 1>
__device T * sp::NomadicBuffer< T, SizeQuantisation >::mutableDevicePtr ( sp::Stream s,
bool  move = false 
)

Synchronise the buffer to the device associated with the given stream and return a pointer to the buffer on that device.

If a copy needs to be done, it will be enqueued on stream s.

Parameters
s: The stream to use for any copy, and which identifies the device to synchronise to.
move: If true, deallocate every copy of this buffer except the one on the target device. The host copy is deallocated too.

◆ mutableHostPtr() [1/2]

template<typename T , int SizeQuantisation = 1>
T * sp::NomadicBuffer< T, SizeQuantisation >::mutableHostPtr ( )

Access the host buffer for the purposes of initialisation.

This function is equivalent to the other mutableHostPtr() function, except it's only legal when the buffer is already synchronised to the host, or uninitialised. In such cases, you can use this to access the host buffer without providing a stream.

Calling this function when the buffer is not available on the host is undefined behaviour (and will assert in debug builds).

Usually, this is used for initialising the contents of a buffer shortly after constructing it.

Returns
A mutable pointer to the host buffer.
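The precondition on the stream-less overload can be pictured with a small model. This is a sketch under assumptions, not the library's code: `ToyHostAccess`, its `State` enum, `hostAccessIsLegal`, and `markModifiedOnDevice` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of when stream-less host access is legal.
struct ToyHostAccess {
    enum class State { Uninitialised, HostCurrent, DeviceCurrent };

    State state = State::Uninitialised;
    std::vector<int> host;

    explicit ToyHostAccess(int n) : host(static_cast<std::size_t>(n)) {}

    // Legal when the buffer is uninitialised or already synchronised to the host.
    bool hostAccessIsLegal() const { return state != State::DeviceCurrent; }

    // Stream-less access: asserts in debug builds instead of enqueueing a copy.
    int* mutableHostPtr() {
        assert(hostAccessIsLegal() && "buffer is not available on the host");
        state = State::HostCurrent;
        return host.data();
    }

    // Stand-in for "a kernel wrote to the device copy".
    void markModifiedOnDevice() { state = State::DeviceCurrent; }
};
```

The typical pattern is to construct the buffer, fill it through the stream-less pointer, and only then hand it to device code. After device code has modified it, the overload that takes a stream is the one to use.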

◆ mutableHostPtr() [2/2]

template<typename T , int SizeQuantisation = 1>
T * sp::NomadicBuffer< T, SizeQuantisation >::mutableHostPtr ( sp::Stream s,
bool  move = false 
)

Synchronise the buffer to the host, if necessary, and return a pointer to the host buffer.

Parameters
s: Stream to use for any copy operation.
move: If true, deallocate all other copies of this buffer, keeping only the target (here, the host copy).

◆ prune()

template<typename T , int SizeQuantisation = 1>
void sp::NomadicBuffer< T, SizeQuantisation >::prune ( const sp::Device keepDevice = sp::Device::getInvalid(),
bool  keepHost = false 
)

Deallocate copies of the buffer stored on various devices (or the host).

By default, all buffers are deallocated.

No buffer synchronisation is performed. If the buffer has been dirtied by a device other than the one you're choosing to keep, those changes will be destroyed.

Usually you'd want to use the move argument of the various *Ptr() functions instead of calling this function directly.

Parameters
keepDevice: Don't deallocate the buffer stored on the identified device (if any).
keepHost: If true, don't deallocate the host buffer.
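How the two parameters combine can be seen in a toy model that tracks only which locations hold an allocation. `ToyAllocations` and `NO_DEVICE` are hypothetical names. `NO_DEVICE` stands in for sp::Device::getInvalid(), and integer ids stand in for real devices.

```cpp
#include <cassert>
#include <set>

constexpr int NO_DEVICE = -1;  // hypothetical stand-in for sp::Device::getInvalid()

// Toy model: records where allocations exist, not what they contain.
struct ToyAllocations {
    std::set<int> devices;      // device ids that currently hold an allocation
    bool hostAllocated = false;

    // Drop every allocation except the ones the caller asks to keep.
    void prune(int keepDevice = NO_DEVICE, bool keepHost = false) {
        assert(keepDevice >= NO_DEVICE);
        const bool hadKept =
            keepDevice != NO_DEVICE && devices.count(keepDevice) > 0;
        devices.clear();
        if (hadKept) devices.insert(keepDevice);
        if (!keepHost) hostAllocated = false;
    }
};
```

Calling `prune(1, true)` in this model leaves only device 1 and the host allocated, while the default `prune()` releases everything. The model does not track contents, which mirrors the warning above: pruning performs no synchronisation, so whatever lived only in a dropped copy is gone.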

◆ reset()

template<typename T , int SizeQuantisation = 1>
void sp::NomadicBuffer< T, SizeQuantisation >::reset ( )

"Forget" the contents of the buffer so it can be reused for something else.

No buffers are deallocated.

This operation is extremely cheap: it simply resets the metadata to regard all allocated buffers as up-to-date. Effectively this discards all the data in the buffers, since the state machine used to ensure consistency has been reset.

The intention is that this function may be used when you want to clear and reuse a buffer without the expense of reallocating it (or doing a pointless copy of now-junk data just to keep the state machine happy).
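The effect of a metadata-only reset can be sketched with a toy freshness table. `ToyResetState`, `markModified`, and `sync` are hypothetical names used purely for illustration, and the integer keys stand in for the host and devices.

```cpp
#include <cassert>
#include <map>

// Toy model: tracks freshness per allocated location and counts copies.
struct ToyResetState {
    std::map<int, bool> upToDate;  // allocated location -> holds the latest data
    int copiesEnqueued = 0;

    // A write at `where` makes every other allocated copy stale.
    void markModified(int where) {
        assert(where >= -1);
        for (auto& entry : upToDate) entry.second = false;
        upToDate[where] = true;
    }

    // A stale (or missing) copy must be refreshed before use.
    void sync(int where) {
        if (!upToDate[where]) {
            ++copiesEnqueued;
            upToDate[where] = true;
        }
    }

    // Metadata-only reset: every allocated copy is declared up to date.
    void reset() {
        for (auto& entry : upToDate) entry.second = true;
    }
};
```

After `reset()`, synchronising to a previously stale location enqueues no copy in this model, which is the "pointless copy" the function is meant to avoid.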

◆ resize()

template<typename T , int SizeQuantisation = 1>
void sp::NomadicBuffer< T, SizeQuantisation >::resize ( int  newSize)

Change the logical size of the buffer.

This has no effect on the synchronisation state machine, but simply means all future synchronisations will only consider the first newSize-many elements of the buffer to be live.
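A toy illustration of "only the first newSize-many elements are live" follows. `ToyResizable` and `syncHostToDevice` are hypothetical names. The sketch also assumes that newSize may not exceed the original allocation, which the text above does not state.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model: two fixed allocations plus a logical size that bounds each copy.
struct ToyResizable {
    std::vector<int> host;
    std::vector<int> device;
    int logicalSize;

    explicit ToyResizable(int n)
        : host(static_cast<std::size_t>(n)),
          device(static_cast<std::size_t>(n)),
          logicalSize(n) {}

    int getSize() const { return logicalSize; }

    // Assumption for this sketch: the new size must fit the existing allocation.
    void resize(int newSize) {
        assert(newSize >= 0 && newSize <= static_cast<int>(host.size()));
        logicalSize = newSize;  // no allocation, no copy, no state change
    }

    // Synchronisation only moves the live prefix.
    void syncHostToDevice() {
        std::copy_n(host.begin(), logicalSize, device.begin());
    }
};
```

Shrinking the logical size before a synchronisation means less data is copied, while the underlying allocations stay untouched.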