Murali Pusala
19 January 2022
The Rel library has included basic regular expression support for a while. For example, regex_match tests whether a string matches a regular expression.
regex_match("^.*@.*$", "some@example.com")
The relation string_replace also supports regular expressions. For example, string_trim is defined using a regular expression:
def string_trim[s] = string_replace[
s, regex_compile["^\\s+|\\s+$"], ""]
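For readers who want to try the trimming pattern outside Rel, here is a rough Python analogue using the standard re module. It is a sketch only: it assumes both regex engines treat ^, $, \s and | the same way for this pattern, and the helper name string_trim_py is hypothetical, not part of Rel. The Rel string "^\\s+|\\s+$" escapes each backslash, while the Python raw string below does not need to.

```python
import re

# Same pattern as in the Rel definition: leading whitespace OR trailing whitespace.
TRIM_PATTERN = re.compile(r"^\s+|\s+$")


def string_trim_py(s: str) -> str:
    """Hypothetical Python analogue of string_trim (illustration only)."""
    # Replace each match (at most one per end of the string) with "".
    return TRIM_PATTERN.sub("", s)


print(repr(string_trim_py("  hello world \n")))  # 'hello world'
```

The alternation is what lets a single replacement call handle both ends of the string; whitespace between words is untouched because it is not adjacent to either anchor.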
Until now, it was not possible to extract matching substrings using regular expressions. We are excited to announce that we have added support for this.
The relation regex_match_all finds all substrings of a string that match a regular expression. Each result includes the matched substring and its corresponding offset.
regex_match_all["(cat|dog)s?", "cats are not dogs"]
Relation:
1 | "cats" |
2 | "dogs" |
We also introduce the capture_group_by_index relation, which captures the substrings matched by the groups in a regular expression. The relation searches for a match in the input string starting from a given offset. Each group in the regular expression is automatically assigned a unique number, starting with 1.
def email = "john.doe@example.com"
def pattern = "^(.*)@(.*)\\.com$"
def output = email, capture_group_by_index[pattern, email, 1]
Relation:
"john.doe@example.com" | 1 | "john.doe" |
"john.doe@example.com" | 2 | "example" |
In addition to numeric indexes, Rel supports regular expressions with named capture groups. The capture_group_by_name relation pairs each group name with the substring captured by that group.
def my_string = "Meeting is at 11:45 AM"
def pattern = "(?<hour>\\d+):(?<minute>\\d+)"
def output = capture_group_by_name[pattern, my_string, 1]
Relation:
"hour" | "11" |
"minute" | "45" |
The regular expression capabilities are implemented using the foreign function interface, but these relations are designed to be used like any other relation. For example, when only a specific capture group is needed, it can be selected directly, as illustrated in this example:
def my_group = capture_group_by_name[
"^.*@(?<domain>.*)\\.com$", "foo@example.com", 1]
def output = my_group["domain"]
Relation: "example"
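To make the behavior of the two capture relations concrete, here is a rough Python analogue built on the standard re module. It is a sketch under stated assumptions, not how Rel implements them: the function names capture_groups_by_index and capture_groups_by_name are hypothetical, the offset is assumed to be a 1-based character position, and only the first match at or after that offset is reported. Python spells named groups (?P<name>...) rather than (?<name>...).

```python
import re
from typing import Dict, Optional


def _first_match(pattern: str, text: str, offset: int) -> Optional[re.Match]:
    # `offset` is assumed to be 1-based; Python's `pos` argument is 0-based.
    return re.compile(pattern).search(text, offset - 1)


def capture_groups_by_index(pattern: str, text: str, offset: int = 1) -> Dict[int, str]:
    """Hypothetical analogue: group number -> captured substring."""
    match = _first_match(pattern, text, offset)
    if match is None:
        return {}  # no match: empty result, like an empty relation
    # Groups are numbered from 1 in order of their opening parenthesis.
    return {i: g for i, g in enumerate(match.groups(), start=1) if g is not None}


def capture_groups_by_name(pattern: str, text: str, offset: int = 1) -> Dict[str, str]:
    """Hypothetical analogue: group name -> captured substring."""
    match = _first_match(pattern, text, offset)
    if match is None:
        return {}
    return {name: g for name, g in match.groupdict().items() if g is not None}


email = "john.doe@example.com"
print(capture_groups_by_index(r"^(.*)@(.*)\.com$", email))  # {1: 'john.doe', 2: 'example'}

# Starting the search at offset 12 skips the first time and finds the second.
times = "Start 09:30, end 17:45"
print(capture_groups_by_index(r"(\d+):(\d+)", times, 12))   # {1: '17', 2: '45'}

# Selecting a single named group, as in the "domain" example.
named = capture_groups_by_name(r"^.*@(?P<domain>.*)\.com$", "foo@example.com")
print(named["domain"])                                      # example
```

Returning a dictionary keyed by group number or name mirrors the idea that a specific group can be looked up directly once the result is available.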
With the new regular expression features, we expect to cover more of the common data engineering use cases. We’re excited to learn how you are using Rel. Please let us know about any features you’d like to see.