Regular Expression Matching and Capture Groups in Rel

The Rel library has included basic regular expression support for a while. For example, regex_match tests whether a string matches a regular expression.

regex_match("^.*@.*$", "some@example.com")

The relation string_replace also supports regular expressions. For example, string_trim is defined using a regular expression:

def string_trim[s] = string_replace[
    s, regex_compile["^\\s+|\\s+$"], ""]

Until now, it was not possible to extract matching substrings using regular expressions. We are excited to announce that we have now added support for this.

The relation regex_match_all finds all substrings in a string that match the regular expression. Each result includes the matched substring together with its offset in the input string.

// read query
 
def output = regex_match_all["(cat|dog)s?", "cats are not dogs"]
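
To make it easier to reason about what this query finds, here is a rough analogue in Python rather than Rel. It is only a sketch: the helper name `match_all` is hypothetical, Python's `re` module is not necessarily the engine behind Rel, and Python reports 0-based offsets, which may differ from the convention regex_match_all uses.

```python
import re


def match_all(pattern: str, text: str) -> list:
    """Return (offset, substring) pairs for every non-overlapping match.

    Offsets are 0-based here because that is what Python reports;
    this is an assumption of the sketch, not a statement about Rel.
    """
    return [(m.start(), m.group(0)) for m in re.finditer(pattern, text)]


matches = match_all("(cat|dog)s?", "cats are not dogs")
print(matches)  # [(0, 'cats'), (13, 'dogs')]
```

The optional `s?` is what lets the plural forms match as whole words instead of stopping at `cat` and `dog`.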

We also introduce the capture_group_by_index relation, which captures the substrings matched by the groups in a regular expression. This relation searches for matches in an input string starting from a given offset.

Each group in the regular expression is automatically given a unique number starting with 1.

// read query
 
def email = "john.doe@example.com"
def pattern = "^(.*)@(.*)\\.com$"
 
def output = email, capture_group_by_index[pattern, email, 1]
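
The numbering of groups can be illustrated with a small Python analogue. This is a sketch under stated assumptions, not Rel's implementation: the helper `capture_groups_by_index` is a hypothetical name, its `start` argument is a 0-based Python position, and the dot before `com` is escaped so that it matches only a literal period.

```python
import re


def capture_groups_by_index(pattern: str, text: str, start: int = 0) -> dict:
    """Map each group number (starting at 1) to the substring it captured.

    Hypothetical helper for illustration; `start` is a 0-based Python offset.
    """
    m = re.compile(pattern).search(text, start)
    if m is None:
        return {}
    # Groups are numbered left to right by their opening parenthesis.
    return {i: m.group(i) for i in range(1, m.re.groups + 1)}


email = "john.doe@example.com"
pattern = r"^(.*)@(.*)\.com$"

groups = capture_groups_by_index(pattern, email)
print(groups)  # {1: 'john.doe', 2: 'example'}
```

Group 1 is the part before the `@` and group 2 is the part between the `@` and `.com`, following the order in which the parentheses open.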

In addition to numeric indexes, Rel supports regular expressions with named capture groups. The capture_group_by_name relation relates each group name to the substring that group captured.

// read query
 
def my_string = "Meeting is at 11:45 AM"
def pattern = "(?<hour>\\d+):(?<minute>\\d+)"
 
def output = capture_group_by_name[pattern, my_string, 1]
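
As a point of comparison, the same idea can be sketched in Python. Note the assumptions: Python's `re` module spells named groups `(?P<name>...)` instead of the `(?<name>...)` form used in the Rel pattern above, and the dictionary returned here only approximates the shape of a name-to-substring relation.

```python
import re

text = "Meeting is at 11:45 AM"
# Python syntax for named groups; the Rel pattern above uses (?<name>...).
pattern = r"(?P<hour>\d+):(?P<minute>\d+)"

m = re.search(pattern, text)
# groupdict() maps every group name to the substring it captured.
named = m.groupdict() if m else {}
print(named)  # {'hour': '11', 'minute': '45'}
```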

The regular expression capabilities are implemented using the foreign function interface, but these relations are designed to be used like any other relation. For example, when only a specific capture group is needed, it can be specified upfront, as illustrated in this example:

// read query
 
def my_group = capture_group_by_name[
    "^.*@(?<domain>.*)\\.com$", "foo@example.com", 1]
 
def output = my_group["domain"]
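
Selecting a single named group looks similar in other regex libraries. The Python sketch below is an analogue only: `capture_named` is a hypothetical helper, and the pattern uses Python's `(?P<name>...)` spelling for named groups.

```python
import re
from typing import Optional


def capture_named(pattern: str, text: str, name: str) -> Optional[str]:
    """Return the substring captured by the group `name`, or None if the
    pattern does not match. Hypothetical helper for illustration only."""
    m = re.search(pattern, text)
    return m.group(name) if m else None


domain = capture_named(r"^.*@(?P<domain>.*)\.com$", "foo@example.com", "domain")
print(domain)  # example
```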

With the new regular expression features, we expect to cover more of the common data engineering use cases. We're excited to learn how you are using Rel. Please let us know about any future features you'd like to see.