Class for running regexes on the GPU.
#include <GPUMatcher.hpp>
Public Member Functions
template<int I>
constexpr int groupsFor () const
template<typename IntT>
auto allocateMatchResultBuffer (IntT numStrings) const
Allocate and return a NomadicTensor, the device view of which is suitable for use as the out parameter of isCompleteMatch() when applied to numStrings-many strings.
auto allocateSearchResultBuffer (int numStrings, int maxMatches=1024, int maxCaptureGroups=2048, int maxChars=100000000) const
Allocate memory for storing search results.
template<typename StringsType, typename OutType>
__host__ void isCompleteMatch (StringsType strings, OutType out, Stream &stream) const
Determine which of the regexes match which of some input strings.
template<int PrefixFilterDepth = 6, typename DeviceStringsType>
__host__ auto search (Stream &stream, Stream &stream2, impl::SearchResults< Regexes... > &out, const DeviceStringsType &gpuStringViews) const
Find all the matches of the regexes at any offset into the given strings.
template<int PrefixFilterDepth = 6, typename DeviceStringsType>
__host__ auto search2gpu (Stream &stream, Stream &stream2, impl::SearchResults< Regexes... > &out, const DeviceStringsType &gpuStringViews) const
Like search(), but leaves the results on the GPU instead of copying them back to the host.
Static Public Attributes
constexpr static int NumRegexes = sizeof...(Regexes)
Detailed Description
This class template can generate kernels that match multiple different regexes simultaneously on the GPU, avoiding the need to launch a different kernel for each regex.
Instantiations of this template may take a long time to compile, so you may wish to spread them across different translation units.
Include the specregex headers:
Input strings should be provided in an sp::StringViewBatch. We can first initialise an sp::StringBatch on the host from conventional host-side strings:
Given an sp::Stream, we can obtain a device-side view of this sp::StringBatch, known as an sp::StringViewBatch, which is compatible with specregex. This implicitly performs the host-to-device copy on that sp::Stream.
While convenient, this API incurs a host-side copy when assembling the string batch. If you use sp::StringBuffer and sp::StringView for all your string processing, the strings are kept in the GPU-friendly format throughout, avoiding this copy.
Now that we have a batch of example strings, the following code snippet will search for two regexes on the GPU:
We can then display these on the host:
Putting it all together:
This example starts the same way as the search example. After copying the strings to the device, the following code snippet will match the input strings against four regexes:
We can then display these on the host:
Putting it all together:
Template Parameters
Regexes: The regexes to be used by the kernels provided by this class. Each one should be an instantiation of sp::regex::Regex, such as sp::regex::Regex<STATIC_STRING("Ca+ts")>.
auto sp::regex::GPURegexMatcher< Regexes >::allocateMatchResultBuffer (IntT numStrings) const
Allocate and return a NomadicTensor, the device view of which is suitable for use as the out parameter of isCompleteMatch() when applied to numStrings-many strings.
By calling sp::NomadicTensor::hostTensor(), you can conveniently copy the buffer back to the host; alternatively, you can pass the device buffer to further GPU kernels enqueued after the regex match operation.
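Because the result for regex i applied to string j is written to out[i][j], a match result buffer for numStrings strings needs NumRegexes by numStrings entries. The sketch below shows that sizing with `sizeof...` on a variadic template. `MatchResultShape`, the placeholder regex types and the row-major flat layout are all hypothetical, and it allocates a host-side std::vector<bool> rather than a NomadicTensor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical placeholder regex types; in specregex each would be an
// instantiation of sp::regex::Regex. Only their count matters here.
struct CatsRegex {};
struct DogsRegex {};
struct FishRegex {};

// Sketch: one row per regex, one column per string, stored row-major
// (the flat layout is an assumption, not specregex's documented layout).
template <typename... Regexes>
struct MatchResultShape {
    constexpr static int NumRegexes = sizeof...(Regexes);

    template <typename IntT>
    static std::vector<bool> allocate(IntT numStrings) {
        assert(numStrings >= 0);
        return std::vector<bool>(static_cast<std::size_t>(NumRegexes) *
                                     static_cast<std::size_t>(numStrings),
                                 false);
    }

    // Flat index of out[i][j] in a NumRegexes x numStrings buffer.
    static std::size_t index(int i, int j, int numStrings) {
        assert(i >= 0 && i < NumRegexes && j >= 0 && j < numStrings);
        return static_cast<std::size_t>(i) * static_cast<std::size_t>(numStrings) +
               static_cast<std::size_t>(j);
    }
};
```

For example, with three regexes and five strings, out[2][4] is the last of the 15 entries.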
auto sp::regex::GPURegexMatcher< Regexes >::allocateSearchResultBuffer (
    int numStrings,
    int maxMatches = 1024,
    int maxCaptureGroups = 2048,
    int maxChars = 100000000) const
Allocate memory for storing search results.
Pass the returned object to search(). These objects can be reused efficiently for multiple search operations, but because the search API is asynchronous, take care not to create a race condition when doing so. You may find it helpful to use Stream::launchHostFunc() to enqueue host-side work that uses the results on the same stream as the search operations.
Since GPU memory allocation is expensive, and in some cases cannot run in parallel with other work on the GPU, it is well worth reusing your result buffers.
Parameters
numStrings: The maximum number of strings that can be involved in a search operation that uses the returned result buffer.
maxMatches: The total number of matches (across all strings) that can be stored in the returned object.
maxCaptureGroups: The total number of capture groups (across all strings and matches) that can be stored in the returned object.
maxChars: An upper bound on the total length of all strings using this result buffer.
Allocating a smaller result object is cheaper. However, using a result object in a search operation that produces more results than it has capacity for currently causes memory corruption. We'll probably make it stop doing that in the near future ;).
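Since overflowing a result buffer currently corrupts memory, it may be worth checking each batch on the host before reusing a buffer for it. The sketch below checks the two limits that are known before the search runs: the string count and the total number of characters. `SearchBufferLimits` and `batchFits` are hypothetical helpers, not part of specregex; the defaults simply mirror those of allocateSearchResultBuffer(). The match and capture-group limits depend on the search output, so they can only be guarded by sizing them generously.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical host-side mirror of the allocateSearchResultBuffer() limits.
struct SearchBufferLimits {
    int numStrings;
    int maxMatches = 1024;
    int maxCaptureGroups = 2048;
    int maxChars = 100000000;
};

// Returns true if the batch respects the limits that can be checked
// before searching (string count and total characters).
bool batchFits(const SearchBufferLimits& limits,
               const std::vector<std::string>& batch) {
    assert(limits.numStrings >= 0 && limits.maxChars >= 0);
    if (batch.size() > static_cast<std::size_t>(limits.numStrings)) return false;
    std::size_t totalChars = 0;
    for (const auto& s : batch) totalChars += s.size();
    return totalChars <= static_cast<std::size_t>(limits.maxChars);
}
```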
__host__ void sp::regex::GPURegexMatcher< Regexes >::isCompleteMatch (
    StringsType strings,
    OutType out,
    Stream &stream) const
Determine which of the regexes match which of some input strings.
This is a batched version of sp::regex::Regex::isCompleteMatch(), applying multiple regexes to multiple strings, in parallel, on the GPU.
OutType must satisfy TensorLike<__device bool, 2>.
Parameters
strings: A source of device-resident sp::StringViews to process.
out: A 2D, device-resident TensorLike to hold the outputs. The result for regex i applied to string j is written to out[i][j]. Commonly, this will be a device view into a NomadicTensor that you can later conveniently copy back to the host. Such a NomadicTensor can be generated by allocateMatchResultBuffer().
stream: The stream on which to queue the GPU kernel.
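When testing, a CPU reference for what isCompleteMatch() computes can be handy. This sketch uses std::regex_match from the C++ standard library as a stand-in for sp::regex::Regex::isCompleteMatch(), and fills out[i][j] with the same regex-major indexing. The function name is hypothetical, and std::regex's default ECMAScript grammar may not agree with specregex on every pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// CPU stand-in (std::regex, not specregex): out[i][j] is true when
// patterns[i] matches the whole of strings[j].
std::vector<std::vector<bool>> isCompleteMatchReference(
    const std::vector<std::string>& patterns,
    const std::vector<std::string>& strings) {
    std::vector<std::vector<bool>> out(
        patterns.size(), std::vector<bool>(strings.size(), false));
    for (std::size_t i = 0; i < patterns.size(); ++i) {
        const std::regex re(patterns[i]);
        for (std::size_t j = 0; j < strings.size(); ++j) {
            // std::regex_match succeeds only if the entire string matches.
            out[i][j] = std::regex_match(strings[j], re);
        }
    }
    return out;
}
```

For instance, "Ca+ts" completely matches "Caaats" but not "Cats!", because the trailing "!" is not part of the match.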
__host__ auto sp::regex::GPURegexMatcher< Regexes >::search (
    Stream &stream,
    Stream &stream2,
    impl::SearchResults< Regexes... > &out,
    const DeviceStringsType &gpuStringViews) const
Find all the matches of the regexes at any offset into the given strings.
Parameters
stream: The stream to enqueue the GPU kernel on.
stream2: Another stream on which some of the work will be enqueued. stream will be made to wait on stream2, so you can regard this function as synchronous with respect to stream.
out: An output object allocated with allocateSearchResultBuffer() that will be populated with the results. resetForDevice() is called on the object to discard any existing data.
gpuStringViews: A source of device-resident views into device-resident strings to process.
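Similarly, a CPU reference for search() can be sketched with std::sregex_iterator, which walks through the matches of a regex at successive offsets in a string. This is a stand-in under stated assumptions: `MatchRecord` and `searchReference` are hypothetical, capture groups are not recorded, and std::sregex_iterator reports successive non-overlapping, leftmost matches, whereas specregex's exact semantics (for example, whether overlapping matches are reported) may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical record for one match found by the CPU reference.
struct MatchRecord {
    int regexIndex;         // which pattern matched
    int stringIndex;        // which input string it matched in
    std::ptrdiff_t offset;  // start position of the match
    std::ptrdiff_t length;  // length of the match
};

// CPU stand-in (std::regex, not specregex) that collects every
// non-overlapping match of each pattern in each string.
std::vector<MatchRecord> searchReference(const std::vector<std::string>& patterns,
                                         const std::vector<std::string>& strings) {
    std::vector<MatchRecord> results;
    for (std::size_t i = 0; i < patterns.size(); ++i) {
        const std::regex re(patterns[i]);
        for (std::size_t j = 0; j < strings.size(); ++j) {
            for (auto it = std::sregex_iterator(strings[j].begin(), strings[j].end(), re);
                 it != std::sregex_iterator(); ++it) {
                results.push_back({static_cast<int>(i), static_cast<int>(j),
                                   it->position(), it->length()});
            }
        }
    }
    return results;
}
```

Searching "Cats and Caaats" for "Ca+ts" yields two matches: "Cats" at offset 0 and "Caaats" at offset 9.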
__host__ auto sp::regex::GPURegexMatcher< Regexes >::search2gpu (
    Stream &stream,
    Stream &stream2,
    impl::SearchResults< Regexes... > &out,
    const DeviceStringsType &gpuStringViews) const
Like search(), but leaves the results on the GPU instead of copying them back to the host.