SpecRegex  0.5.0
A GPU-accelerated regex library
sp::regex::GPURegexMatcher< Regexes > Class Template Reference

Class for running regexes on the GPU.

#include <GPUMatcher.hpp>

Public Member Functions

template<int I>
constexpr int groupsFor () const
 
template<typename IntT >
auto allocateMatchResultBuffer (IntT numStrings) const
 Allocate and return a NomadicTensor, the device view of which is suitable for use as the out parameter of isCompleteMatch() when applied to numStrings strings.
 
auto allocateSearchResultBuffer (int numStrings, int maxMatches=1024, int maxCaptureGroups=2048, int maxChars=100000000) const
 Allocate memory for storing search results.
 
template<typename StringsType , typename OutType >
__host__ void isCompleteMatch (StringsType strings, OutType out, Stream &stream) const
 Determine which of the regexes match which of some input strings.
 
template<int PrefixFilterDepth = 6, typename DeviceStringsType >
__host__ auto search (Stream &stream, Stream &stream2, impl::SearchResults< Regexes... > &out, const DeviceStringsType &gpuStringViews) const
 Find all the matches of the regexes at any offset into the given strings.
 
template<int PrefixFilterDepth = 6, typename DeviceStringsType >
__host__ auto search2gpu (Stream &stream, Stream &stream2, impl::SearchResults< Regexes... > &out, const DeviceStringsType &gpuStringViews) const
 Like search(), but does not copy the results back to the host; they remain on the GPU.
 

Static Public Attributes

constexpr static int NumRegexes = sizeof...(Regexes)
 

Detailed Description

template<typename... Regexes>
class sp::regex::GPURegexMatcher< Regexes >

Class for running regexes on the GPU.

This class template can generate kernels that match multiple different regexes simultaneously on the GPU, avoiding the need to launch a different kernel for each regex.

Instantiations of this template may take a long time to compile, so you may wish to spread them across different translation units.

Examples

Search example

Include the specregex headers:

#include <spec/regex/Regex.hpp>
#include <spec/regex/GPUMatcher.hpp>
#include <spec/string/StringBuffer.hpp>
#include <spec/string/StringBatch.hpp>

Input strings should be provided in an sp::StringViewBatch. We can first initialise an sp::StringBatch on the host from conventional host-side strings:

// Create a vector of arbitrary strings:
std::vector<std::string> strings{
    "Aubergines",
    "So are aubergines",
    "May I have a picture of an octopus?",
    "++ OUT OF CHEESE ERROR ++",
    "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
    "Something something this is totally a news article",
    "Trumpets are cool."
};
// Concatenate the strings into a StringBatch:
sp::StringBatch<char> cpuStrings{strings};

Given an sp::Stream, we can obtain a device-side view of this sp::StringBatch, known as an sp::StringViewBatch, which is compatible with specregex. This implicitly performs the host-to-device copy on that sp::Stream.

// Initialise the GPU and two streams:
auto& gpu = sp::Device::getActive();
auto s = gpu.createStream();
auto s2 = gpu.createStream();
// Copy the strings to the GPU:
auto gpuStrings = cpuStrings.deviceView(s);

While convenient, this API incurs a host-side copy when assembling the string batch. If you were to use sp::StringBuffer and sp::StringView for all your string processing, the strings would be maintained in the GPU-friendly format throughout, avoiding the need for this copy.

Now that we have a batch of example strings, the following code snippet searches them for matches of two regexes on the GPU:

// We now define two regexes:
using RegexC = sp::regex::Regex<REGEX_PATTERN("[Aa]ubergines")>;
using RegexD = sp::regex::Regex<REGEX_PATTERN("[a-zA-Z_][a-zA-Z0-9_]*")>;
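// Create a matcher for both regexes (default construction assumed):
sp::regex::GPURegexMatcher<RegexC, RegexD> matcher;
// Allocate the result buffer: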
auto outputBuffer = matcher.allocateSearchResultBuffer(gpuStrings.size());
// Actually run the search:
auto result = matcher.search(s, s2, outputBuffer, gpuStrings);

We can then display these on the host:

// Print the results:
s.launchHostFunc([&](){
    for (int i = 0; i < (int) strings.size(); i++) {
        // Iterate results for regex 0 on string i.
        out << "Results for regex 0:\n";
        for (auto match : result.resultsFor<0>(i)) {
            out << match << "\n";
        }
        // Iterate results for regex 1 on string i.
        out << "Results for regex 1:\n";
        for (auto match : result.resultsFor<1>(i)) {
            out << match << "\n";
        }
    }
});
s.synchronize();

Putting it all together:

#include <spec/regex/Regex.hpp>
#include <spec/regex/GPUMatcher.hpp>
#include <spec/string/StringBuffer.hpp>
#include <spec/string/StringBatch.hpp>
#include <ostream>
#include <string>
#include <vector>

void testGPUSearch(std::ostream &out)
{
    // Create a vector of arbitrary strings:
    std::vector<std::string> strings{
        "Aubergines",
        "So are aubergines",
        "May I have a picture of an octopus?",
        "++ OUT OF CHEESE ERROR ++",
        "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
        "Something something this is totally a news article",
        "Trumpets are cool."
    };
    // Concatenate the strings into a StringBatch:
    sp::StringBatch<char> cpuStrings{strings};
    // Initialise the GPU and two streams:
    auto& gpu = sp::Device::getActive();
    auto s = gpu.createStream();
    auto s2 = gpu.createStream();
    // Copy the strings to the GPU:
    auto gpuStrings = cpuStrings.deviceView(s);
    // We now define two regexes:
    using RegexC = sp::regex::Regex<REGEX_PATTERN("[Aa]ubergines")>;
    using RegexD = sp::regex::Regex<REGEX_PATTERN("[a-zA-Z_][a-zA-Z0-9_]*")>;
    // Create a matcher for both regexes (default construction assumed):
    sp::regex::GPURegexMatcher<RegexC, RegexD> matcher;
    // Allocate the result buffer:
    auto outputBuffer = matcher.allocateSearchResultBuffer(gpuStrings.size());
    // Actually run the search:
    auto result = matcher.search(s, s2, outputBuffer, gpuStrings);
    // Print the results:
    s.launchHostFunc([&](){
        for (int i = 0; i < (int) strings.size(); i++) {
            // Iterate results for regex 0 on string i.
            out << "Results for regex 0:\n";
            for (auto match : result.resultsFor<0>(i)) {
                out << match << "\n";
            }
            // Iterate results for regex 1 on string i.
            out << "Results for regex 1:\n";
            for (auto match : result.resultsFor<1>(i)) {
                out << match << "\n";
            }
        }
    });
    s.synchronize();
}

Match example

This example starts the same way as the search example. After copying the strings to the device, the following code snippet will match the input strings against four regexes:

using RegexC = sp::regex::Regex<REGEX_PATTERN("Aubergines")>;
using RegexD = sp::regex::Regex<REGEX_PATTERN("[a-zA-Z_][a-zA-Z0-9_]*]")>;
using RegexE = sp::regex::Regex<REGEX_PATTERN(".*Trump|[Mm]ay|Potato.*")>;
using RegexF = sp::regex::Regex<REGEX_PATTERN("(?:a|(b)|c*)(?:a+|a{2})")>;
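// Create a matcher for all four regexes (default construction assumed):
sp::regex::GPURegexMatcher<RegexC, RegexD, RegexE, RegexF> matcher;
auto outputBuffer = matcher.allocateMatchResultBuffer(gpuStrings.size());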
matcher.isCompleteMatch(gpuStrings, outputBuffer.mutableDeviceTensor(s), s);
auto result = outputBuffer.hostTensor(s);

We can then display these on the host:

// Print the result for every (regex, string) pair:
s.launchHostFunc([&]() {
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < (int) strings.size(); j++) {
            bool v = result.read(sp::Vec<int, 2>{i, j});
            out << "Result for regex " << i << " and string " << j << ": "
                << (v ? "matches" : "does not match") << "\n";
        }
    }
});
s.synchronize();

Putting it all together:

#include <spec/regex/Regex.hpp>
#include <spec/regex/GPUMatcher.hpp>
#include <spec/string/StringBuffer.hpp>
#include <spec/string/StringBatch.hpp>
#include <ostream>
#include <string>
#include <vector>

void testGPUMatch(std::ostream &out)
{
    // Create a vector of arbitrary strings:
    std::vector<std::string> strings{
        "Aubergines",
        "So are aubergines",
        "May I have a picture of an octopus?",
        "++ OUT OF CHEESE ERROR ++",
        "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
        "Something something this is totally a news article",
        "Trumpets are cool."
    };
    // Concatenate the strings into a StringBatch:
    sp::StringBatch<char> cpuStrings{strings};
    // Initialise the GPU and one stream:
    auto& gpu = sp::Device::getActive();
    auto s = gpu.createStream();
    // Copy the strings to the GPU:
    auto gpuStrings = cpuStrings.deviceView(s);
    using RegexC = sp::regex::Regex<REGEX_PATTERN("Aubergines")>;
    using RegexD = sp::regex::Regex<REGEX_PATTERN("[a-zA-Z_][a-zA-Z0-9_]*]")>;
    using RegexE = sp::regex::Regex<REGEX_PATTERN(".*Trump|[Mm]ay|Potato.*")>;
    using RegexF = sp::regex::Regex<REGEX_PATTERN("(?:a|(b)|c*)(?:a+|a{2})")>;
    // Create a matcher for all four regexes (default construction assumed):
    sp::regex::GPURegexMatcher<RegexC, RegexD, RegexE, RegexF> matcher;
    auto outputBuffer = matcher.allocateMatchResultBuffer(gpuStrings.size());
    matcher.isCompleteMatch(gpuStrings, outputBuffer.mutableDeviceTensor(s), s);
    auto result = outputBuffer.hostTensor(s);
    // Print the result for every (regex, string) pair:
    s.launchHostFunc([&]() {
        for (int i = 0; i < 4; i++) {
            for (int j = 0; j < (int) strings.size(); j++) {
                bool v = result.read(sp::Vec<int, 2>{i, j});
                out << "Result for regex " << i << " and string " << j << ": "
                    << (v ? "matches" : "does not match") << "\n";
            }
        }
    });
    s.synchronize();
}
Template Parameters
Regexes: The regexes to be used by the kernels provided by this class. Each one should be an instantiation of sp::regex::Regex, such as sp::regex::Regex<STATIC_STRING("Ca+ts")>

Member Function Documentation

◆ allocateMatchResultBuffer()

template<typename... Regexes>
template<typename IntT >
auto sp::regex::GPURegexMatcher< Regexes >::allocateMatchResultBuffer ( IntT  numStrings) const

Allocate and return a NomadicTensor, the device view of which is suitable for use as the out parameter of isCompleteMatch() when applied to numStrings strings.

Calling sp::NomadicTensor::hostTensor() conveniently copies the buffer back to the host. Alternatively, you can pass the device buffer to further GPU kernels enqueued after the regex match operation.

See also
NomadicTensor

◆ allocateSearchResultBuffer()

template<typename... Regexes>
auto sp::regex::GPURegexMatcher< Regexes >::allocateSearchResultBuffer ( int  numStrings,
int  maxMatches = 1024,
int  maxCaptureGroups = 2048,
int  maxChars = 100000000 
) const

Allocate memory for storing search results.

Pass the returned object to search(). These objects can be reused efficiently for multiple search operations, but care must be taken to ensure you don't create a race condition when doing so (since the search API is asynchronous). You may find it helpful to use Stream::launchHostFunc() to enqueue host-side work that uses the results on the same stream as the search operations.

Since GPU memory allocation is very expensive, and in some cases cannot be done in parallel with other use of the GPU, it is definitely worth reusing your result buffers.

Parameters
numStrings: The maximum number of strings that can be involved in a search operation that uses the returned result buffer.
maxMatches: The total number of matches (across all strings) that can be stored in the returned object.
maxCaptureGroups: The total number of capture groups (across all strings/matches) that can be stored in the returned object.
maxChars: An upper bound on the total length of all strings using this result buffer.

Allocating a smaller object is cheaper, but the capacities must still be large enough for the search. Currently, using a result object in a search operation that produces more results than it has capacity for will result in memory corruption. We'll probably make it stop doing that in the near future.
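Because exceeding the capacity is dangerous, it can help to check the inputs on the host before launching a search. The following is a minimal sketch, not part of SpecRegex: SearchCapacity, totalChars() and inputsFit() are hypothetical names, and the sketch assumes that maxChars counts only the characters of each string (whether terminators or padding count is not stated here). It can only check numStrings and maxChars; the number of matches and capture groups is not known until the search has run.

```cpp
#include <string>
#include <vector>

// Hypothetical host-side record of the capacities that were passed to
// allocateSearchResultBuffer(). Not a SpecRegex type.
struct SearchCapacity {
    long long numStrings;
    long long maxChars;
};

// Total number of characters across every string in the batch.
long long totalChars(const std::vector<std::string>& strings) {
    long long total = 0;
    for (const auto& s : strings) {
        total += static_cast<long long>(s.size());
    }
    return total;
}

// True when the batch respects both the string-count and character limits.
bool inputsFit(const SearchCapacity& cap, const std::vector<std::string>& strings) {
    return static_cast<long long>(strings.size()) <= cap.numStrings
        && totalChars(strings) <= cap.maxChars;
}
```

A caller might refuse to launch, or allocate a larger buffer, whenever inputsFit() returns false.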

◆ isCompleteMatch()

template<typename... Regexes>
template<typename StringsType , typename OutType >
__host__ void sp::regex::GPURegexMatcher< Regexes >::isCompleteMatch ( StringsType  strings,
OutType  out,
Stream &  stream 
) const

Determine which of the regexes match which of some input strings.

This is a batched version of sp::regex::Regex::isCompleteMatch(), applying multiple regexes to multiple strings, in parallel, on the GPU.

Named Requirements
Parameters
strings: A source of device-resident sp::StringViews to process.
out: A 2D, device-resident TensorLike to hold the outputs. The result for regex i applied to string j is written to out[i][j]. Commonly, this will be a device view into a NomadicTensor that you can later conveniently copy back to the host. Such a NomadicTensor can be generated by allocateMatchResultBuffer().
stream: The stream on which to queue the GPU kernel.
See also
Regex
NomadicTensor
TensorLike
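
To illustrate the out[i][j] layout, here is a host-only reference that could be used to cross-check GPU results on small inputs. It is a sketch built on std::regex rather than SpecRegex, so completeMatchReference() is a hypothetical helper, the pattern syntax is ECMAScript, and it assumes that a "complete match" means the regex must match the entire string.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Host-only reference: out[i][j] is true when patterns[i] matches the
// whole of strings[j]. Uses std::regex, not the SpecRegex engine.
std::vector<std::vector<bool>> completeMatchReference(
    const std::vector<std::string>& patterns,
    const std::vector<std::string>& strings) {
    std::vector<std::vector<bool>> out(
        patterns.size(), std::vector<bool>(strings.size(), false));
    for (std::size_t i = 0; i < patterns.size(); ++i) {
        const std::regex re(patterns[i]);  // compile each pattern once
        for (std::size_t j = 0; j < strings.size(); ++j) {
            out[i][j] = std::regex_match(strings[j], re);
        }
    }
    return out;
}
```

Row i therefore holds the results of one regex across every string, which matches the indexing used by result.read(sp::Vec<int, 2>{i, j}) in the match example.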

◆ search()

template<typename... Regexes>
template<int PrefixFilterDepth = 6, typename DeviceStringsType >
__host__ auto sp::regex::GPURegexMatcher< Regexes >::search ( Stream &  stream,
Stream &  stream2,
impl::SearchResults< Regexes... > &  out,
const DeviceStringsType &  gpuStringViews 
) const

Find all the matches of the regexes at any offset into the given strings.

Parameters
stream: The stream to enqueue the GPU kernel on.
stream2: Another stream that some of the work will be enqueued on. The first stream will be made to wait on this stream, so you can regard this function as synchronous with respect to stream.
out: An output object allocated with allocateSearchResultBuffer() that will be populated with the result. resetForDevice() is called on the object to discard any existing data.
gpuStringViews: A source of device-resident views into device-resident strings to process.
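
To make "matches at any offset" concrete, the sketch below collects matches on the host with std::regex. It is not the SpecRegex implementation: searchAll() is a hypothetical helper, the pattern syntax is ECMAScript, and it reports non-overlapping matches from left to right. Whether search() reports overlapping matches, and in what order, is not stated here.

```cpp
#include <regex>
#include <string>
#include <vector>

// Host-only reference: every non-overlapping match of `pattern` found at
// any offset within `text`, scanning left to right with std::regex.
std::vector<std::string> searchAll(const std::string& text,
                                   const std::string& pattern) {
    const std::regex re(pattern);
    std::vector<std::string> found;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end;
         it != end; ++it) {
        found.push_back(it->str());  // the text of the whole match
    }
    return found;
}
```

For example, applying the identifier pattern from the search example to "++ OUT OF CHEESE ERROR ++" yields the four words, whereas a complete-match test on the same string would fail.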

◆ search2gpu()

template<typename... Regexes>
template<int PrefixFilterDepth = 6, typename DeviceStringsType >
__host__ auto sp::regex::GPURegexMatcher< Regexes >::search2gpu ( Stream &  stream,
Stream &  stream2,
impl::SearchResults< Regexes... > &  out,
const DeviceStringsType &  gpuStringViews 
) const

Like search(), but does not copy the results back to the host; they remain on the GPU.