Regular Expression Matching and Capture Groups in Rel
The Rel library has included basic regular expression support for a while. For example, regex_match tests whether a string matches a regular expression.
regex_match("^.*@.*$", "some@example.com")
The relation string_replace also supports regular expressions; for example, string_trim is defined using a regular expression:
def string_trim[s] = string_replace[
s, regex_compile["^\\s+|\\s+$"], ""]
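The pattern in this definition is an alternation: `^\s+` matches whitespace at the start of the string and `\s+$` matches whitespace at the end, and both are replaced with the empty string. For readers who want to try the pattern outside Rel, here is a rough Python analogue. It uses Python's `re` module, not the Rel library, and the helper name `string_trim` is only borrowed for illustration.

```python
import re

# Same pattern as in the Rel definition: leading OR trailing whitespace.
TRIM_PATTERN = re.compile(r"^\s+|\s+$")

def string_trim(s):
    """Remove leading and trailing whitespace by regex replacement."""
    return TRIM_PATTERN.sub("", s)

trimmed = string_trim("  hello world \t\n")
print(repr(trimmed))
```

Whitespace inside the string is left alone because neither branch of the alternation can match there.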
Until now, it was not possible to extract matching substrings using regular expressions. We are excited to announce that this is now supported.
The relation regex_match_all finds all substrings in a string that match the regular expression. The relation includes each matched substring along with its corresponding offset.
def output = regex_match_all["(cat|dog)s?", "cats are not dogs"]
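To make the idea of "matched substring plus offset" concrete, here is a small Python sketch of the same search. It is an analogue built on Python's `re.finditer`, not the Rel implementation. The helper name `match_all` is hypothetical, and the offsets are Python's 0-based positions, so Rel's offset convention may differ.

```python
import re

def match_all(pattern, text):
    """Return (offset, substring) pairs for every non-overlapping match.

    Offsets are 0-based, following Python; this is an assumption of the
    sketch and not a statement about how Rel numbers offsets.
    """
    return [(m.start(), m.group(0)) for m in re.finditer(pattern, text)]

pairs = match_all(r"(cat|dog)s?", "cats are not dogs")
for offset, substring in pairs:
    print(offset, substring)
```

Because `s?` is optional, the same pattern would also match the singular forms "cat" and "dog".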
We also introduce the capture_group_by_index relation, which captures the substrings matched by the groups in a regular expression. This relation searches for a match in the input string starting from a given offset.
Each group in the regular expression is automatically given a unique number starting with 1.
def email = "john.doe@example.com"
def pattern = "^(.*)@(.*)\\.com$"
def output = email, capture_group_by_index[pattern, email, 1]
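The numbering of groups follows the order of their opening parentheses, so in an email pattern like this one, group 1 is the part before the `@` and group 2 is the part between the `@` and the `.com` suffix. The Python sketch below illustrates this. It is only an analogue using Python's `re` module: the helper name `groups_by_index` is hypothetical, and its `offset` argument is 0-based, which is an assumption of the sketch and may not match Rel's convention.

```python
import re

def groups_by_index(pattern, text, offset=0):
    """Map each group number (starting at 1) to its captured substring.

    Returns an empty dict when the pattern does not match. The offset is
    0-based here, which is a Python convention and an assumption of this sketch.
    """
    m = re.compile(pattern).search(text, offset)
    if m is None:
        return {}
    return {i: m.group(i) for i in range(1, m.re.groups + 1)}

# The dot before "com" is escaped so that it matches a literal "."
email_groups = groups_by_index(r"^(.*)@(.*)\.com$", "john.doe@example.com")
print(email_groups)
```

Without the escape, `.` would match any character, so a string such as "john@exampleXcom" would also be accepted.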
In addition to numeric indexes, Rel supports regular expressions with named capture groups. The capture_group_by_name relation includes the captured substring for each corresponding group name.
def my_string = "Meeting is at 11:45 AM"
def pattern = "(?<hour>\\d+):(?<minute>\\d+)"
def output = capture_group_by_name[pattern, my_string, 1]
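Named groups make the result self-describing: each captured substring is keyed by the name given in the pattern instead of by position. Here is a rough Python analogue, again using Python's `re` module and not the Rel library. The helper name `groups_by_name` is hypothetical, the `offset` argument is 0-based, and Python spells named groups as `(?P<name>...)` where the pattern above uses `(?<name>...)`.

```python
import re

def groups_by_name(pattern, text, offset=0):
    """Map each group name to its captured substring for the first match.

    Returns an empty dict when nothing matches. The 0-based offset is a
    Python convention and an assumption of this sketch.
    """
    m = re.compile(pattern).search(text, offset)
    return m.groupdict() if m else {}

# Python syntax for named groups: (?P<name>...)
time_groups = groups_by_name(
    r"(?P<hour>\d+):(?P<minute>\d+)", "Meeting is at 11:45 AM"
)
print(time_groups["hour"], time_groups["minute"])
```

Looking up a single name, such as `time_groups["hour"]`, mirrors selecting one capture group when only that group is needed.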
The regular expression capabilities are implemented using the foreign function interface, but these relations are designed to be used like any other relation. For example, when only a specific capture group is needed, it can be selected directly, as illustrated in this example:
def my_group = capture_group_by_name[
"^.*@(?<domain>.*)\\.com$", "foo@example.com", 1]
def output = my_group["domain"]
With the new regular expression features, we expect to cover more of the common data engineering use cases. We’re excited to learn how you are using Rel. Please let us know about any future features you’d like to see.