libSCALE  0.2.0
A modern C++ CUDA API
sp::Event Class Reference

Represents an event in a compute stream. More...

#include <Event.hpp>

Inheritance diagram for sp::Event:

Public Member Functions

bool isDone () const
 Returns true iff the work captured by this event has been completed.
 
- Public Member Functions inherited from sp::RAIIObject< EventAllocator, bool >
 RAIIObject (const CTorArgs &... args)
 Allocate a new object and take ownership of it.
 
 RAIIObject (const APIType &obj, bool own=true)
 Wrap an existing object.
 
const APIType get () const
 Get the underlying C API object (e.g. cudaStream_t).
 
APIType get ()
 Get the underlying C API object (e.g. cudaStream_t).
 
APIType operator* ()
 
const APIType operator* () const
 
 operator APIType () const
 Implicitly convert to the C API type, so you can just pass this object to the C library whence it came.
 
 operator APIType ()
 

Friends

class Stream
 

Additional Inherited Members

- Protected Types inherited from sp::RAIIObject< EventAllocator, bool >
using APIType = typename AllocType::APIType
 The C API type. Something like cudaStream_t.
 
using UnderlyingType = std::remove_pointer_t< APIType >
 

Detailed Description

Represents an event in a compute stream.

Events allow streams (or the host, if using BlockingEvent) to wait until other streams have reached a certain point in their work queue.

A plain Event cannot be used to block the host; if you want to synchronise the host with an event, use the more expensive BlockingEvent. The same limitation exists in the CUDA API: trying to synchronise on a cudaEvent_t that wasn't constructed with the right flag returns immediately with no error.

You can obtain an Event representing the outstanding work on a stream by calling sp::Stream::recordEvent().
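A minimal sketch of that pattern, assuming only the members documented on this page (sp::Stream::recordEvent() and sp::Event::isDone()); the work enqueued on the stream is a placeholder, not part of libSCALE:

```cpp
#include <Event.hpp>

void example(sp::Stream &stream) {
    // ... enqueue work on `stream` here (kernel launches, copies, etc.) ...

    // Capture all work enqueued on the stream so far.
    sp::Event event = stream.recordEvent();

    // Later, poll without blocking the host thread.
    if (event.isDone()) {
        // Everything enqueued before recordEvent() has finished.
    }
}
```

Because isDone() only polls, this never blocks: if the host must actually wait on the event, use BlockingEvent as described above.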

Unsupported APIs

cudaEventElapsedTime()

The NVIDIA® documentation of this function explains fairly well why this isn't useful to have around:

If either event was last recorded in a non-NULL stream, the resulting time may be greater than expected (even if both used the same stream handle). This happens because the cudaEventRecord() operation takes place asynchronously and there is no guarantee that the measured latency is actually just between the two events. Any number of other different stream operations could execute in between the two measured events, thus altering the timing in a significant way.

So: this isn't really a timing function. It's a low-entropy PRNG. The NVIDIA® CUDA® implementation enables timing by default, harming performance.

If you want to time things, you can do so more accurately by using:

  • The host's clock (to measure host-to-host latency of something you care about)
  • The GPU's clock (using the clock64() device function and passing the value back to the host)
  • The profiler (or profiler API), though this is platform-specific.

All of these options will yield better timing results than NVIDIA's stream-timing APIs, and all (except the profiler) have much lower overhead.

Member Function Documentation

◆ isDone()

bool sp::Event::isDone() const

Returns true iff the work captured by this event has been completed.

See also
cudaEventQuery()