levenshtein()
relationalai.std.strings
levenshtein(string1: str|Producer, string2: str|Producer) -> Expression
Calculates the Levenshtein distance between two strings: the minimum number of single-character edits (insertions, deletions, or substitutions) required to transform one string into the other. If string1 or string2 is a Producer, then levenshtein() filters out non-string values. Must be called in a rule or query context.
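To make the definition concrete, here is a minimal plain-Python sketch of the classic dynamic-programming computation of the Levenshtein distance. The function name levenshtein_distance is hypothetical and used only for illustration; it is not part of relationalai and says nothing about how the library computes the value internally.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, or substitutions needed to turn `a` into `b`.

    Illustrative helper only; not part of the relationalai API.
    """
    # previous[j] is the distance between the prefix of `a` handled so far
    # (initially the empty string) and the first j characters of `b`.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        # Turning the first i characters of `a` into "" takes i deletions.
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


print(levenshtein_distance("Alice", "Alicia"))    # 2
print(levenshtein_distance("Alice", "Bob"))       # 5
print(levenshtein_distance("kitten", "sitting"))  # 3
```

The comparison is case-sensitive in this sketch, so "Bob" shares no characters with "Alice" and the distance equals the length of the longer string.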
Parameters
Name | Type | Description |
---|---|---|
string1 | Producer or Python str object | The first string. |
string2 | Producer or Python str object | The second string. |
Returns
An Expression object.
Example
Use levenshtein() to calculate the distance between pairs of strings:
import relationalai as rai
from relationalai.std import aggregates, strings
# =====
# SETUP
# =====
model = rai.Model("MyModel")
Person = model.Type("Person")
with model.rule():
    Person.add(id=1).set(name="Alice")
    Person.add(id=2).set(name="Alicia")
    Person.add(id=3).set(name="Bob")
    Person.add(id=4).set(name=-1)  # Non-string name
# =======
# EXAMPLE
# =======
# Set a multi-valued most_similar_to property on each person to other people
# whose names have the smallest Levenshtein distance from their own.
with model.rule():
    person, other = Person(), Person()
    person != other
    # Calculate the Levenshtein distance between the names of each pair of people.
    dist = strings.levenshtein(person.name, other.name)
    # Filter to others with smallest distance per person.
    aggregates.bottom(1, dist, per=[person])
    # Set the most_similar_to property to the other people with the smallest distance.
    person.most_similar_to.extend([other])
# Since levenshtein() filters out non-string values, the most_similar_to property
# is not set for the person with id=4.
with model.query() as select:
    person = Person()
    response = select(
        person.id,
        person.name,
        person.most_similar_to.id,
        person.most_similar_to.name
    )

print(response.results)
#    id    name  id2   name2
# 0   1   Alice    2  Alicia
# 1   2  Alicia    1   Alice
# 2   3     Bob    1   Alice
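As a cross-check on the results above, the following plain-Python sketch mirrors the rule's logic without relationalai: it skips non-string names, computes the distance for every ordered pair of distinct people, and keeps the closest match per person. The helpers edit_distance and most_similar are hypothetical names introduced here, and keeping all tied matches is an assumption of this sketch, not a statement about how aggregates.bottom() handles ties.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def edit_distance(a: str, b: str) -> int:
    """Recursive Levenshtein distance (illustrative helper, not the library's code)."""
    if not a or not b:
        # One string is empty: insert or delete every remaining character.
        return len(a) + len(b)
    if a[0] == b[0]:
        # Matching first characters cost nothing.
        return edit_distance(a[1:], b[1:])
    return 1 + min(
        edit_distance(a[1:], b),      # delete a[0]
        edit_distance(a, b[1:]),      # insert b[0]
        edit_distance(a[1:], b[1:]),  # substitute a[0] with b[0]
    )


def most_similar(people: dict) -> dict:
    """Map each person id to the ids of the other people with the closest name."""
    # Mirror the filtering described above: ignore non-string names.
    names = {pid: name for pid, name in people.items() if isinstance(name, str)}
    result = {}
    for pid, name in names.items():
        distances = {
            other: edit_distance(name, other_name)
            for other, other_name in names.items()
            if other != pid
        }
        if distances:
            best = min(distances.values())
            # Assumption of this sketch: keep every person tied for the minimum.
            result[pid] = sorted(o for o, d in distances.items() if d == best)
    return result


people = {1: "Alice", 2: "Alicia", 3: "Bob", 4: -1}
print(most_similar(people))  # {1: [2], 2: [1], 3: [1]}
```

Bob is paired with Alice because the distance from "Bob" to "Alice" is 5, while the distance to "Alicia" is 6. The person with id=4 has no entry because the name is not a string.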