weighted_jaccard_similarity()

relationalai.std.graphs.Compute
weighted_jaccard_similarity(node1: Producer, node2: Producer) -> Expression

Compute the weighted Jaccard similarity between two nodes in a graph. Weighted Jaccard similarity measures how similar two nodes are as the ratio of two sums: the sum of the minimum edge weights and the sum of the maximum edge weights connecting them. Values range from 0.0 to 1.0, inclusive, with higher values indicating greater similarity. Pairs of nodes with a similarity of 0.0 are excluded from results to improve performance. Must be called in a rule or query context.

Supported Graph Types

| Graph Type | Supported | Notes |
| --- | --- | --- |
| Directed | Yes | |
| Undirected | Yes | |
| Weighted | Yes | |
| Unweighted | Yes | Edge weights default to 1.0. |

Parameters

| Name | Type | Description |
| --- | --- | --- |
| node1 | Producer | A node in the graph. |
| node2 | Producer | A node in the graph. |

Returns

Returns an Expression object that produces the weighted Jaccard similarity between the two nodes as a floating-point value, calculated by the following formula:

Weighted Jaccard similarity = (sum of minimum edge weights between u and v) / (sum of maximum edge weights between u and v)
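The formula can be made concrete with a small pure-Python sketch. It assumes the common definition of weighted Jaccard similarity, in which the minimum and maximum are taken per neighbor over each node's edge weights and a missing edge counts as weight 0.0. The function `weighted_jaccard` and the dictionary layout are hypothetical illustrations, not part of the relationalai API, and the library's internal computation may differ.

```python
def weighted_jaccard(weights_u, weights_v):
    """Weighted Jaccard similarity of two nodes (illustrative only).

    Each argument maps a neighbor to the weight of the edge joining it
    to the node. A missing neighbor counts as weight 0.0.
    """
    neighbors = set(weights_u) | set(weights_v)
    min_sum = sum(min(weights_u.get(n, 0.0), weights_v.get(n, 0.0)) for n in neighbors)
    max_sum = sum(max(weights_u.get(n, 0.0), weights_v.get(n, 0.0)) for n in neighbors)
    # Two nodes with no edges have nothing to compare; return 0.0 (assumption).
    return min_sum / max_sum if max_sum else 0.0


# Hypothetical nodes u and v that share neighbors "x" and "y".
u = {"x": 2.0, "y": 4.0}
v = {"x": 1.0, "y": 4.0, "z": 3.0}
print(weighted_jaccard(u, v))  # (1 + 4 + 0) / (2 + 4 + 3) = 5/9, about 0.556
```

Under this definition, a node compared with itself scores 1.0, and two nodes with no common neighbor score 0.0.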

Example

Use .weighted_jaccard_similarity() to compute the weighted Jaccard similarity between two nodes in a graph. You access the .weighted_jaccard_similarity() method from a Graph object’s .compute attribute:

import relationalai as rai
from relationalai.std import alias
from relationalai.std.graphs import Graph

# Create a model named "socialNetwork" with Person and Friendship types.
model = rai.Model("socialNetwork")
Person = model.Type("Person")
Friendship = model.Type("Friendship")

# Add some people to the model and connect them with friendships.
with model.rule():
    alice = Person.add(name="Alice")
    bob = Person.add(name="Bob")
    carol = Person.add(name="Carol")
    Friendship.add(person1=alice, person2=bob, strength=100)
    Friendship.add(person1=bob, person2=carol, strength=10)

# Create a weighted, undirected graph with Person nodes and edges between friends.
# This graph has two edges: one between Alice and Bob, and one between Bob and Carol.
# The edges are weighted by the strength of each friendship.
graph = Graph(model, undirected=True, weighted=True)
graph.Node.extend(Person)
with model.rule():
    friendship = Friendship()
    graph.Edge.add(friendship.person1, friendship.person2, weight=friendship.strength)

# Compute the weighted Jaccard similarity between each pair of people in the graph.
with model.query() as select:
    person1, person2 = Person(), Person()
    similarity = graph.compute.weighted_jaccard_similarity(person1, person2)
    response = select(person1.name, person2.name, alias(similarity, "weighted_jaccard_similarity"))

print(response.results)
# Output:
#     name  name2  weighted_jaccard_similarity
# 0  Alice  Alice                          1.0
# 1  Alice  Carol                          0.1
# 2    Bob    Bob                          1.0
# 3  Carol  Alice                          0.1
# 4  Carol  Carol                          1.0

There is no row for Alice and Bob in the preceding query’s results because their weighted Jaccard similarity is 0.0. The same is true for Bob and Carol. Pairs of nodes with zero similarity are filtered out of the results to improve performance.
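The zero can be checked by hand. The sketch below is plain Python, not the relationalai API, and assumes the per-neighbor min/max definition with a missing edge counted as weight 0. It lists each person's edge weights to Alice, Bob, and Carol, in that order, and reproduces the values in the output above.

```python
# Edge weights from each person to (Alice, Bob, Carol); 0 means no edge.
weights = {
    "Alice": (0, 100, 0),
    "Bob": (100, 0, 10),
    "Carol": (0, 10, 0),
}


def similarity(a, b):
    # Ratio of summed per-neighbor minima to summed per-neighbor maxima.
    pairs = list(zip(weights[a], weights[b]))
    return sum(min(p) for p in pairs) / sum(max(p) for p in pairs)


print(similarity("Alice", "Bob"))    # 0 / 210 = 0.0 (no shared neighbor)
print(similarity("Alice", "Carol"))  # 10 / 100 = 0.1 (shared neighbor: Bob)
```

Alice and Bob are directly connected, but they have no neighbor in common, so every per-neighbor minimum is 0.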

If node1 or node2 is not a node in the graph, no exception is raised. Instead, that object is filtered from the rule or query:

# Add a Company type to the model.
Company = model.Type("Company")

# Add some companies to the model.
with model.rule():
    apple = Company.add(name="Apple")
    google = Company.add(name="Google")

# Create the union of the Person and Company types.
PersonOrCompany = Person | Company

with model.query() as select:
    # Get all person and company objects.
    obj1, obj2 = PersonOrCompany(), PersonOrCompany()
    obj1 < obj2  # Ensure pairs are unique. Compares internal object IDs.
    # Compute the weighted Jaccard similarity between each pair of objects.
    # Objects that are not nodes in the graph are filtered.
    similarity = graph.compute.weighted_jaccard_similarity(obj1, obj2)
    response = select(obj1.name, obj2.name, alias(similarity, "weighted_jaccard_similarity"))

# Only rows for people are returned, since companies are not nodes in the graph.
print(response.results)
# Output:
#     name  name2  weighted_jaccard_similarity
# 0  Carol  Alice                          0.1

See Also