Access to the more elaborate syncthreads
variants.
More...
Functions | |
__device__ void | sp::syncthreads (int barrierID, int numWarps) |
A more powerful __syncthreads() More... | |
__device__ void | sp::syncthreads (int barrierID) |
Like the other sp::syncthreads() , but implicitly synchronises all non-exited warps in the block. More... | |
__device__ void | sp::syncthreads_arrive (int barrierID, int numWarps) |
Functions exactly like sp::syncthreads() , but this warp does not block. More... | |
__device__ int | sp::syncthreads_count (int barrierID, int numWarps, bool predicate) |
Like sp::syncthreads() , but also returns a count of how many threads passed true for predicate . More... | |
__device__ int | sp::syncthreads_count (int barrierID, bool predicate) |
sp::syncthreads_count() , implicitly applied to all non-exited warps. More... | |
__device__ bool | sp::syncthreads_and (int barrierID, int numWarps, bool predicate) |
Like sp::syncthreads() , but also returns true iff all participating threads passed true for predicate . More... | |
__device__ bool | sp::syncthreads_and (int barrierID, bool predicate) |
sp::syncthreads_and() , implicitly applied to all non-exited warps. More... | |
__device__ bool | sp::syncthreads_or (int barrierID, int numWarps, bool predicate) |
Like sp::syncthreads() , but also returns true iff any participating threads passed true for predicate . More... | |
__device__ bool | sp::syncthreads_or (int barrierID, bool predicate) |
sp::syncthreads_or() , implicitly applied to all non-exited warps. More... | |
Access to the more elaborate syncthreads
variants.
These are preferable to using inline PTX because:
__device__ void sp::syncthreads | ( | int | barrierID | ) |
Like the other sp::syncthreads()
, but implicitly synchronises all non-exited warps in the block.
This actually maps to a different underlying instruction, so is slightly cheaper (and much less annoying) than keeping track of the proper value to pass to numWarps
yourself.
__device__ void sp::syncthreads | ( | int | barrierID, |
int | numWarps | ||
) |
A more powerful __syncthreads()
This basically exposes the ptx bar.sync
instruction directly.
There are 16 counters, initially zero. When a warp reaches syncthreads(X, Y)
, the X'th counter is increased by warpSize (regardless of how many threads of that warp were actually active at the time), and the warp goes to sleep. When the counter reaches Y, it is reset to zero and all threads sleeping on counter X wake up.
The behaviour is undefined if:
Vanilla __syncthreads()
is equivalent to sp::syncthreads(0, num_warps_in_current_block)
, except it doesn't incur the cost of having to compute that second argument.
This is useful for performance in cases where you discover some threads want to early-return, but ordinarily can't because doing so would make a future __syncthreads()
deadlock. What you can do instead is compute how many warps are going to survive and pass that as the numWarps
argument here.
This mechanism also allows you to synchronise less than an entire block at once. You can use this to make different warps in the same block carry out independent work, synchronising with each other as they go along. This effectively allows you to use one block as if it were multiple blocks (but with the benefits of sharing caches, shared memory, and so on).
barrierID | Which barrier counter to use. |
numWarps | Number of warps to synchronise. |
__device__ bool sp::syncthreads_and | ( | int | barrierID, |
bool | predicate | ||
) |
sp::syncthreads_and()
, implicitly applied to all non-exited warps.
__device__ bool sp::syncthreads_and | ( | int | barrierID, |
int | numWarps, | ||
bool | predicate | ||
) |
Like sp::syncthreads()
, but also returns true iff all participating threads passed true for predicate
.
__device__ void sp::syncthreads_arrive | ( | int | barrierID, |
int | numWarps | ||
) |
Functions exactly like sp::syncthreads()
, but this warp does not block.
This allows a thread to notify a barrier without waiting on it. This may wake up other threads that were waiting. This is useful for implementing producer-consumer patterns, among other things.
__device__ int sp::syncthreads_count | ( | int | barrierID, |
bool | predicate | ||
) |
sp::syncthreads_count()
, implicitly applied to all non-exited warps.
__device__ int sp::syncthreads_count | ( | int | barrierID, |
int | numWarps, | ||
bool | predicate | ||
) |
Like sp::syncthreads()
, but also returns a count of how many threads passed true for predicate
.
__device__ bool sp::syncthreads_or | ( | int | barrierID, |
bool | predicate | ||
) |
sp::syncthreads_or()
, implicitly applied to all non-exited warps.
__device__ bool sp::syncthreads_or | ( | int | barrierID, |
int | numWarps, | ||
bool | predicate | ||
) |
Like sp::syncthreads()
, but also returns true iff any participating threads passed true for predicate
.