Represents a TensorLike that can be synchronised between the CPU and various GPUs.
#include <NomadicTensor.hpp>
Public Types
using | ValueType = typename TensorType::ValueType
The type of each element in the underlying TensorLike.
using | DeviceTensor = typename TensorType::template InValueType< __device const ValueType >
Type describing a device-resident immutable view of the underlying TensorLike.
using | MutableDeviceTensor = typename TensorType::template InValueType< __device ValueType >
Type describing a device-resident mutable view of the underlying TensorLike.
using | HostTensor = typename TensorType::template InValueType< const ValueType >
Type describing a host-resident immutable view of the underlying TensorLike.
using | MutableHostTensor = typename TensorType::template InValueType< ValueType >
Type describing a host-resident mutable view of the underlying TensorLike.
Public Member Functions
NomadicTensor ()=default
Create an uninitialised NomadicTensor.
NomadicTensor (const TensorType, Args... args)
Construct a NomadicTensor.
void | setMemoryType (HostMemoryType hostFlags)
Reinitialise the buffer with new flags.
void | reconfigure (Args... newArgs)
Change the logical size of the underlying buffer.
void | reset ()
Forget the contents of the buffer so it may be reused for something else.
DeviceTensor | deviceTensor (sp::Stream &s, bool move=false)
Get an immutable copy of the TensorLike on the device associated with stream s.
MutableDeviceTensor | mutableDeviceTensor (sp::Stream &s, bool move=false)
Get a mutable copy of the TensorLike on the device associated with stream s.
HostTensor | hostTensor (sp::Stream &s, bool move=false)
Get an immutable copy of the TensorLike on the host.
MutableHostTensor | mutableHostTensor (sp::Stream &s, bool move=false)
Get a mutable copy of the TensorLike on the host.
MutableHostTensor | mutableHostTensor ()
A special version of mutableHostTensor() for doing host-side initialisation of the buffer.
Static Public Attributes
constexpr static int | Rank = TensorType::Rank
Protected Attributes
std::shared_ptr< sp::NomadicBuffer< ValueType > > | data
std::tuple< Args... > | args
Represents a TensorLike that can be synchronised between the CPU and various GPUs.
This is conceptually very similar to a NomadicBuffer, except instead of a bare buffer you've got a specific TensorLike.
using sp::NomadicTensor< TensorType, Args >::DeviceTensor = typename TensorType::template InValueType<__device const ValueType>
Type describing a device-resident immutable view of the underlying TensorLike.
using sp::NomadicTensor< TensorType, Args >::HostTensor = typename TensorType::template InValueType<const ValueType>
Type describing a host-resident immutable view of the underlying TensorLike.
using sp::NomadicTensor< TensorType, Args >::MutableDeviceTensor = typename TensorType::template InValueType<__device ValueType>
Type describing a device-resident mutable view of the underlying TensorLike.
using sp::NomadicTensor< TensorType, Args >::MutableHostTensor = typename TensorType::template InValueType<ValueType>
Type describing a host-resident mutable view of the underlying TensorLike.
using sp::NomadicTensor< TensorType, Args >::ValueType = typename TensorType::ValueType
The type of each element in the underlying TensorLike.
sp::NomadicTensor< TensorType, Args >::NomadicTensor ( ) = default
Create an uninitialised NomadicTensor.
sp::NomadicTensor< TensorType, Args >::NomadicTensor (const TensorType, Args... args)
Construct a NomadicTensor.
This operation does not allocate any buffers for the underlying TensorLike. It merely initialises the buffer state machine and computes the size. Actual allocation occurs the first time you ask for a concrete view on some device (or the host).
The slightly odd signature of this constructor is due to the nonexistence of partial class template argument deduction in C++17. Typical usage passes a default-constructed TensorLike of the desired type as the first argument, followed by the constructor arguments that should be used when constructing actual instances.
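The deduction issue can be illustrated with a minimal, library-free sketch. `ToyVector` and `ToyNomadic` are hypothetical stand-ins, not part of the library, and the assumption that the stored arguments are a single length is made purely for illustration. In C++17, class template arguments are either all deduced or all written out, so something like `ToyNomadic<ToyVector<float>>(128)` cannot deduce `Args`; passing a default-constructed tensor as the first argument lets both be deduced together.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>

// Hypothetical stand-in for a TensorLike: a 1-D view over a raw pointer.
template <typename T>
struct ToyVector {
    using ValueType = T;
    T* ptr = nullptr;
    std::size_t length = 0;

    ToyVector() = default;
    ToyVector(T* p, std::size_t n) : ptr(p), length(n) {}
};

// Hypothetical stand-in for NomadicTensor. Only the constructor shape matters.
template <typename TensorType, typename... Args>
class ToyNomadic {
public:
    // The first parameter exists only so TensorType can be deduced;
    // its value is ignored. Nothing is allocated here.
    ToyNomadic(const TensorType, Args... ctorArgs) : args(ctorArgs...) {}

    // Arguments kept for constructing views later.
    std::tuple<Args...> args;
};

// Both TensorType and Args are deduced from the constructor call.
inline std::size_t storedLength() {
    ToyNomadic nomadic(ToyVector<float>{}, std::size_t{128});
    static_assert(std::is_same_v<decltype(nomadic),
                                 ToyNomadic<ToyVector<float>, std::size_t>>);
    return std::get<0>(nomadic.args);
}
```

The tag argument costs one default construction of a small view object, which is the price paid for not having to spell out `Args` by hand.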
DeviceTensor sp::NomadicTensor< TensorType, Args >::deviceTensor (sp::Stream & s, bool move = false)
Get an immutable copy of the TensorLike on the device associated with stream s.
If the target device does not already have an up-to-date copy of the buffer, a copy is enqueued on stream s to create one. This function always returns immediately, yielding a TensorLike backed by the buffer on the target device. It will only be safe to read from the returned object after the enqueued copy (if any) has completed.
Typically, you'll be passing the returned object to a kernel which you can simply queue up on the same stream, guaranteeing that the copy completes before any read.
s | The stream to use for any copy, and which identifies the device to synchronise to. |
move | If true, deallocate all copies of this buffer on devices except the target (including the host) |
HostTensor sp::NomadicTensor< TensorType, Args >::hostTensor (sp::Stream & s, bool move = false)
Get an immutable copy of the TensorLike on the host.
If the host does not already have an up-to-date copy of the buffer, a copy is enqueued on stream s to create one. This function always returns immediately, yielding a TensorLike backed by the host buffer. It will only be safe to read from the returned object after the enqueued copy (if any) has completed.
sp::Device::launchHostFunc() provides a convenient and efficient way to enqueue host-side work using the buffer that waits until after the copy. You can queue up further device-side work behind the host function in advance, so the stream does not stall after the host function completes while you launch something else.
s | The stream to use for any copy. |
move | If true, deallocate all other copies of the object. |
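The ordering guarantee described above can be sketched without the library. `ToyStream` below is a hypothetical in-order work queue, not `sp::Stream`, and the lambdas merely play the roles of the enqueued copy and of host work launched on the stream; no real signatures are implied.

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical in-order work queue standing in for a stream: items run
// strictly in the order they were enqueued, and only when drained.
class ToyStream {
public:
    void enqueue(std::function<void()> work) { pending.push(std::move(work)); }

    void drain() {
        while (!pending.empty()) {
            std::function<void()> work = std::move(pending.front());
            pending.pop();
            work();
        }
    }

private:
    std::queue<std::function<void()>> pending;
};

// Returns the sum computed by the "host function", which runs after the "copy".
inline int sumOnHostAfterCopy() {
    const std::vector<int> deviceData{1, 2, 3};
    std::vector<int> hostData(deviceData.size(), 0);
    int sum = 0;

    ToyStream stream;
    // Stands in for the copy that a host view request may enqueue.
    stream.enqueue([&] { hostData = deviceData; });
    // hostData already exists here, but it is not yet safe to read.

    // Stands in for host work launched on the stream: ordered after the copy.
    stream.enqueue([&] {
        assert(hostData.size() == deviceData.size());
        sum = std::accumulate(hostData.begin(), hostData.end(), 0);
    });
    // Further work enqueued here would follow the host function with no gap.

    stream.drain();
    return sum;
}
```

Because the host work sits on the same queue as the copy, it never observes the zero-initialised buffer, which is the property the real stream ordering is relied on to provide.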
MutableDeviceTensor sp::NomadicTensor< TensorType, Args >::mutableDeviceTensor (sp::Stream & s, bool move = false)
Get a mutable copy of the TensorLike on the device associated with stream s.
As a result of this call, the target device will be considered to have modified the object. Any future requests for access to the object on devices other than this one (or the host) will cause a copy from this device.
If the target device does not already have an up-to-date copy of the buffer, a copy is enqueued on stream s to create one. This function always returns immediately, yielding a TensorLike backed by the buffer on the target device. It will only be safe to read from the returned object after the enqueued copy (if any) has completed.
Typically, you'll be passing the returned object to a kernel which you can simply queue up on the same stream, guaranteeing that the copy completes before any read.
s | The stream to use for any copy, and which identifies the device to synchronise to. |
move | If true, deallocate all copies of this buffer on devices except the target (including the host) |
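The bookkeeping implied by the immutable and mutable accessors can be modelled with a small toy. `ToyCoherence` is hypothetical and is not the library's state machine: it assumes index 0 is the host, that the first ever access needs no copy because there is nothing to copy from, and that `move` simply drops every other location.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of per-location bookkeeping. Index 0 stands for the host,
// indices 1 and up for devices.
class ToyCoherence {
public:
    explicit ToyCoherence(std::size_t locations)
        : allocated(locations, false), valid(locations, false) {}

    // Immutable view: returns true if a copy to `target` would be enqueued.
    bool read(std::size_t target, bool move = false) {
        const bool copyEnqueued = acquire(target);
        if (move) releaseOthers(target);
        return copyEnqueued;
    }

    // Mutable view: as read(), but every other location becomes stale.
    bool write(std::size_t target, bool move = false) {
        const bool copyEnqueued = acquire(target);
        for (std::size_t i = 0; i < valid.size(); ++i) {
            if (i != target) valid[i] = false;
        }
        if (move) releaseOthers(target);
        return copyEnqueued;
    }

    bool isAllocated(std::size_t location) const { return allocated[location]; }

private:
    // Allocate lazily, then report whether data must be copied in.
    bool acquire(std::size_t target) {
        assert(target < valid.size());
        allocated[target] = true;
        if (valid[target]) return false;
        bool sourceExists = false;
        for (bool v : valid) sourceExists = sourceExists || v;
        valid[target] = true;
        return sourceExists;
    }

    // Model of move=true: drop every copy except the target's.
    void releaseOthers(std::size_t target) {
        for (std::size_t i = 0; i < allocated.size(); ++i) {
            if (i != target) {
                allocated[i] = false;
                valid[i] = false;
            }
        }
    }

    std::vector<bool> allocated;
    std::vector<bool> valid;
};
```

In this model a second `read` on the same location is free, whereas a `write` anywhere else makes the next `read` pay for a copy again. That is the cost pattern to keep in mind when choosing between the immutable and mutable accessors.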
MutableHostTensor sp::NomadicTensor< TensorType, Args >::mutableHostTensor ( )
A special version of mutableHostTensor() for doing host-side initialisation of the buffer.
This allows you to access the host side of the buffer without using an sp::Stream. This is primarily useful for populating a NomadicTensor from the host before doing anything with the GPU.
See also NomadicBuffer::mutableHostTensor()
MutableHostTensor sp::NomadicTensor< TensorType, Args >::mutableHostTensor (sp::Stream & s, bool move = false)
Get a mutable copy of the TensorLike on the host.
As a result of this call, the host device will be considered to have modified the object. Any future requests for access to the object outside the host will cause a host->device copy.
If the host does not already have an up-to-date copy of the buffer, a copy is enqueued on stream s to create one. This function always returns immediately, yielding a TensorLike backed by the host buffer. It will only be safe to read from the returned object after the enqueued copy (if any) has completed.
sp::Device::launchHostFunc() provides a convenient and efficient way to enqueue host-side work using the buffer that waits until after the copy. You can queue up further device-side work after the host function to avoid stalling the stream after the host function completes while you launch something else.
s | The stream to use for any copy. |
move | If true, deallocate all other copies of the object. |
void sp::NomadicTensor< TensorType, Args >::reconfigure (Args... newArgs)
Change the logical size of the underlying buffer.
The allocations are not changed, but the TensorLike returned by future view calls will reflect construction with the given new set of arguments. Making the result not go out of bounds is left as an exercise for the reader.
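Since bounds are the caller's responsibility, a caller-side guard may be worth keeping next to any resize. The sketch below is a library-free toy: `ToyView` and `ToyResizable` are hypothetical, and the single length argument is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical 1-D view: a pointer plus a logical length.
struct ToyView {
    float* ptr;
    std::size_t length;
};

// Hypothetical owner whose allocation is fixed at construction.
class ToyResizable {
public:
    explicit ToyResizable(std::size_t n)
        : capacity(n), length(n), storage(std::make_unique<float[]>(n)) {}

    // Changes only the logical length; the allocation is untouched and
    // nothing is checked here.
    void reconfigure(std::size_t newLength) { length = newLength; }

    // Views are built from the current arguments each time they are requested.
    ToyView view() { return ToyView{storage.get(), length}; }

    // Caller-side guard: does the current logical size fit the allocation?
    bool fitsAllocation() const { return length <= capacity; }

private:
    std::size_t capacity;
    std::size_t length;
    std::unique_ptr<float[]> storage;
};
```

Shrinking is always safe in this model. Growing past the original capacity yields a view that claims more elements than were allocated, which is exactly the situation the description above leaves to the caller to avoid.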
void sp::NomadicTensor< TensorType, Args >::reset ( )
Forget the contents of the buffer so it may be reused for something else.
See also NomadicBuffer::reset()
void sp::NomadicTensor< TensorType, Args >::setMemoryType (HostMemoryType hostFlags)
Reinitialise the buffer with new flags.
This will destroy the contents of the buffer.
It's anticipated that this will only be used during initialisation, where speed is less of a concern.