libSCALE  0.2.0
A modern C++ CUDA API
sp::Event Class Reference

Represents an event in a compute stream. More...

#include <Event.hpp>

Inheritance diagram for sp::Event:

Public Member Functions

bool isDone () const
 Returns true iff the work captured by this event has been completed.
 
- Public Member Functions inherited from sp::RAIIObject< EventAllocator, bool >
 RAIIObject (const CTorArgs &... args)
 Allocate a new object and take ownership of it.
 
 RAIIObject (const APIType &obj, bool own=true)
 Wrap an existing object.
 
const APIType get () const
 Get the underlying C API object (e.g. cudaStream_t).
 
APIType get ()
 Get the underlying C API object (e.g. cudaStream_t).
 
APIType operator* ()
 
const APIType operator* () const
 
 operator APIType () const
 Implicitly convert to the C API type, so you can just pass this object to the C library whence it came.
 
 operator APIType ()
 

Friends

class Stream
 

Additional Inherited Members

- Protected Types inherited from sp::RAIIObject< EventAllocator, bool >
using APIType = typename AllocType::APIType
 The C API type. Something like cudaStream_t.
 
using UnderlyingType = std::remove_pointer_t< APIType >
 

Detailed Description

Represents an event in a compute stream.

Events allow streams (or the host, if using BlockingEvent) to wait until other streams have reached a certain point in their work queue.

A plain Event cannot be used to block the host; if you want to synchronise the host with an event, use the more expensive BlockingEvent. The same limitation exists in the CUDA API: trying to synchronise on a cudaEvent_t that wasn't constructed with the right flag returns immediately with no error.

You can obtain an Event representing the outstanding work on a stream by calling sp::Stream::recordEvent().
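A minimal sketch of that pattern, assuming only the members documented on this page (sp::Stream::recordEvent() and sp::Event::isDone()); the work enqueued on the stream is a placeholder, not part of libSCALE:

```cpp
#include <Event.hpp>

void example(sp::Stream &stream) {
    // ... enqueue work on `stream` here (kernel launches, copies, etc.) ...

    // Capture all work enqueued on the stream so far.
    sp::Event event = stream.recordEvent();

    // Later, poll without blocking the host thread.
    if (event.isDone()) {
        // Everything enqueued before recordEvent() has finished.
    }
}
```

Because isDone() only polls, this never blocks: if the host must actually wait on the event, use BlockingEvent as described above.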

Unsupported APIs

cudaEventElapsedTime()

The NVIDIA® documentation of this function explains fairly well why this isn't useful to have around:

If either event was last recorded in a non-NULL stream, the resulting time may be greater than expected (even if both used the same stream handle). This happens because the cudaEventRecord() operation takes place asynchronously and there is no guarantee that the measured latency is actually just between the two events. Any number of other different stream operations could execute in between the two measured events, thus altering the timing in a significant way.

So: this isn't really a timing function. It's a low-entropy PRNG. The NVIDIA® CUDA® implementation enables timing by default, harming performance.

If you want to time things, you can do so more accurately by using:

  • The host's clock (to measure host-to-host latency of something you care about)
  • The GPU's clock (using the clock64() device function and passing the value back to the host)
  • The profiler (or profiler API), though this is platform-specific.

All of these options will yield better timing results than NVIDIA's stream-timing APIs, and all (except the profiler) have much lower overhead.

Member Function Documentation

◆ isDone()

bool sp::Event::isDone() const

Returns true iff the work captured by this event has been completed.

See also
cudaEventQuery()