Represents an event in a compute stream.
#include <Event.hpp>
Public Member Functions
bool isDone() const
    Returns true iff the work captured by this event has been completed.

Public Member Functions inherited from sp::RAIIObject< EventAllocator, bool >
RAIIObject(const CTorArgs &... args)
    Allocate a new object and take ownership of it.
RAIIObject(const APIType &obj, bool own=true)
    Wrap an existing object.
const APIType get() const
    Get the underlying C API object (e.g. cudaStream_t).
APIType get()
    Get the underlying C API object (e.g. cudaStream_t).
APIType operator*()
const APIType operator*() const
operator APIType() const
    Implicitly convert to the C API type, so you can just pass this object to the C library whence it came.
operator APIType()

Friends
class Stream

Additional Inherited Members
Protected Types inherited from sp::RAIIObject< EventAllocator, bool >
using APIType = typename AllocType::APIType
    The C API type. Something like cudaStream_t.
using UnderlyingType = std::remove_pointer_t< APIType >

Represents an event in a compute stream.
Events allow streams (or the host, if using BlockingEvent) to wait until other streams have reached a certain point in their work queue.
If you want to synchronise the host with an event, use the more expensive BlockingEvent. This limitation also exists in the CUDA API: trying to synchronise a cudaEvent_t that wasn't constructed with the right flag would return immediately with no error.
You can obtain an Event representing the outstanding work on a stream by calling sp::Stream::recordEvent().
cudaEventElapsedTime()
The NVIDIA® documentation of this function explains fairly well why this isn't useful to have around:
If either event was last recorded in a non-NULL stream, the resulting time may be greater than expected (even if both used the same stream handle). This happens because the cudaEventRecord() operation takes place asynchronously and there is no guarantee that the measured latency is actually just between the two events. Any number of other different stream operations could execute in between the two measured events, thus altering the timing in a significant way.
So: this isn't really a timing function. It's a low-entropy PRNG. The NVIDIA® CUDA® implementation enables timing by default, harming performance.
If you want to time things, you can do so more accurately by using:
All of these options will yield better timing results than NVIDIA's stream-timing APIs, and all (except the profiler) have much lower overhead.
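As a generic illustration of host-side timing (not necessarily one of the options referred to above), the sketch below measures mean wall-clock time per run with std::chrono::steady_clock. The names meanMicroseconds and sumTo are hypothetical and not part of this library. The sketch assumes the callable only returns once its work has actually finished; for asynchronous GPU work that means the callable must itself wait for completion, otherwise only the launch cost is measured.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of this library): runs `work` `repeats` times
// and returns the mean wall-clock duration of one run, in microseconds.
// Assumption: `work` returns only after its work has completed.
template <typename Work>
double meanMicroseconds(Work&& work, int repeats) {
    assert(repeats > 0);
    using Clock = std::chrono::steady_clock;  // monotonic, unaffected by clock adjustments
    const auto start = Clock::now();
    for (int i = 0; i < repeats; ++i) {
        work();
    }
    const auto stop = Clock::now();
    const std::chrono::duration<double, std::micro> total = stop - start;
    return total.count() / repeats;
}

// Stand-in CPU workload so the sketch runs without a GPU: sums 1..n.
inline std::uint64_t sumTo(std::uint64_t n) {
    std::vector<std::uint64_t> values(n);
    std::iota(values.begin(), values.end(), std::uint64_t{1});
    return std::accumulate(values.begin(), values.end(), std::uint64_t{0});
}
```

Averaging over several runs, as meanMicroseconds does, reduces the influence of one-off costs such as cold caches on the reported figure.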
bool sp::Event::isDone() const
Returns true iff the work captured by this event has been completed.
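A minimal sketch of how a query such as isDone() might be polled from the host with a timeout. It is written as a template over any type exposing `bool isDone() const`, so it compiles without this library. pollUntilDone and FakeEvent are hypothetical names introduced only for illustration, and the sketch assumes isDone() is a cheap, non-blocking query that may be called repeatedly. Polling is not the same as blocking synchronisation; whether it suits a plain Event in your setting is also an assumption.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper: polls `event.isDone()` until it reports completion or
// `timeout` elapses. Returns true if the event completed, false on timeout.
template <typename EventLike>
bool pollUntilDone(const EventLike& event,
                   std::chrono::milliseconds timeout,
                   std::chrono::milliseconds interval = std::chrono::milliseconds(1)) {
    assert(interval.count() > 0);
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!event.isDone()) {
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;  // gave up before the work completed
        }
        std::this_thread::sleep_for(interval);  // yield the CPU between queries
    }
    return true;
}

// Hypothetical stand-in for an event, so the sketch runs without a GPU:
// reports "not done" for a fixed number of queries, then "done".
class FakeEvent {
public:
    explicit FakeEvent(int queriesUntilDone) : remaining_(queriesUntilDone) {}

    bool isDone() const {
        if (remaining_ > 0) {
            --remaining_;
            return false;
        }
        return true;
    }

private:
    mutable int remaining_;  // mutable so the const query can count down
};
```

Sleeping between queries trades a little latency for not spinning a CPU core; a tighter loop would notice completion sooner at the cost of host CPU time.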