Compute.jaccard_similarity()
Compute.jaccard_similarity(node1: Producer, node2: Producer) -> Expression
This algorithm measures the Jaccard similarity between two nodes in a graph.
For unweighted graphs, it measures the similarity between two nodes based on how many neighbors (or out-neighbors if the graph is directed) they share.
Values range from 0.0 to 1.0, inclusive, with 1.0 indicating that the nodes have identical neighborhoods and 0.0 indicating that they share no neighbors.
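A minimal plain-Python sketch of the unweighted definition follows. It assumes each node's neighbors are stored as a set in a dictionary. The function name and data layout are illustrative only and are not RelationalAI's API. Returning 0.0 when neither node has any neighbors is also an assumption of the sketch.

```python
def jaccard_similarity(neighbors: dict[str, set[str]], u: str, v: str) -> float:
    """Shared neighbors of u and v divided by all distinct neighbors of u and v."""
    nu = neighbors.get(u, set())
    nv = neighbors.get(v, set())
    union = nu | nv
    if not union:
        # Assumption for this sketch: two nodes with no neighbors score 0.0.
        return 0.0
    return len(nu & nv) / len(union)


# Toy undirected graph with edges a-b, a-c, b-c, and c-d.
neighbors = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

print(jaccard_similarity(neighbors, "a", "b"))  # 1/3: shared {c}, union {a, b, c}
print(jaccard_similarity(neighbors, "a", "d"))  # 0.5: shared {c}, union {b, c}
```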
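For weighted graphs, it measures the similarity between two nodes as the ratio of the sums of the minimum and maximum weights of the edges connecting each of them to their neighbors.
For every node in the graph, take the smaller and the larger of the two nodes' edge weights to that node, treating a missing edge as weight 0.0; the similarity is the sum of the smaller weights divided by the sum of the larger weights.
Values range from 0.0 to 1.0, inclusive, with higher values indicating greater similarity.
If all weights are 1.0, it reduces to the unweighted case.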
In both cases, pairs of nodes with a similarity of 0.0, indicating no meaningful relationship, are excluded from results to improve performance.
Must be called in a rule or query context.
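A short plain-Python sketch can check the claim that the weighted measure reduces to the unweighted one when every weight is 1.0. It assumes weights are stored in a nested dictionary, with each undirected edge recorded in both directions. The function names are hypothetical, and neither function is RelationalAI's implementation.

```python
def weighted_jaccard(weights, u, v):
    """Sum of per-node minimum weights divided by sum of per-node maximum weights."""
    wu, wv = weights.get(u, {}), weights.get(v, {})
    nodes = wu.keys() | wv.keys()  # every node adjacent to u or v
    # A node missing from one side contributes weight 0.0 for that side.
    numerator = sum(min(wu.get(k, 0.0), wv.get(k, 0.0)) for k in nodes)
    denominator = sum(max(wu.get(k, 0.0), wv.get(k, 0.0)) for k in nodes)
    # Assumption for this sketch: nodes with no edges at all score 0.0.
    return numerator / denominator if denominator else 0.0


def unweighted_jaccard(weights, u, v):
    """Plain Jaccard similarity that ignores weights and uses only adjacency."""
    nu, nv = set(weights.get(u, {})), set(weights.get(v, {}))
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


# Undirected graph with edges a-b, a-c, b-c, c-d, every weight set to 1.0,
# and each edge stored in both directions.
unit_weights = {
    "a": {"b": 1.0, "c": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"a": 1.0, "b": 1.0, "d": 1.0},
    "d": {"c": 1.0},
}

agree = all(
    weighted_jaccard(unit_weights, u, v) == unweighted_jaccard(unit_weights, u, v)
    for u in unit_weights
    for v in unit_weights
)
print(agree)  # True
```

With unit weights, each shared neighbor adds 1.0 to both sums. Each unshared neighbor adds 0.0 to the numerator and 1.0 to the denominator. The ratio therefore equals the shared-neighbor count over the unique-neighbor count.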
Supported Graph Types
Graph Type | Supported | Notes |
---|---|---|
Directed | Yes | Based on out-neighbors. |
Undirected | Yes | |
Weighted | Yes | |
Unweighted | Yes | |
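The following plain-Python sketch illustrates the out-neighbor rule for directed graphs. It assumes edges are given as (source, target) pairs. The helper names are hypothetical, and scoring nodes with no out-neighbors as 0.0 is an assumption of the sketch.

```python
from collections import defaultdict


def out_neighbors(edges):
    """Map each source node to the set of nodes it points to."""
    result = defaultdict(set)
    for source, target in edges:
        result[source].add(target)
    return result


def directed_jaccard(edges, u, v):
    """Jaccard similarity computed over out-neighbor sets only."""
    out = out_neighbors(edges)
    nu, nv = out.get(u, set()), out.get(v, set())
    union = nu | nv
    # Assumption for this sketch: nodes with no out-neighbors score 0.0.
    return len(nu & nv) / len(union) if union else 0.0


# Alice and Bob both point to Carol; Carol points to nobody.
edges = [("Alice", "Carol"), ("Bob", "Carol")]
print(directed_jaccard(edges, "Alice", "Bob"))    # both point only to Carol
print(directed_jaccard(edges, "Alice", "Carol"))  # Carol has no out-neighbors
```

Only outgoing edges count. Carol's incoming edges from Alice and Bob therefore do not make her similar to either of them.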
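Parameters
Name | Type | Description |
---|---|---|
node1 | Producer | A node in the graph. |
node2 | Producer | A node in the graph. |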
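Returns
Returns an Expression object that produces the Jaccard similarity between the two nodes as a floating-point value. For unweighted graphs, it is calculated by the following formula:
Jaccard similarity = (number of shared neighbors) / (total number of unique neighbors)
For weighted graphs, the number of shared neighbors is replaced by the sum of the minimum edge weights, and the number of unique neighbors by the sum of the maximum edge weights, as described above.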
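The unweighted and weighted measures described above can be written formally as shown below. This notation is introduced here for illustration only. N(u) is the set of neighbors of u (out-neighbors for directed graphs), and w(u, k) is the weight of the edge from u to k, taken as 0 when there is no such edge. The sums run over every node k in the graph.

```latex
J(u, v) = \frac{|N(u) \cap N(v)|}{|N(u) \cup N(v)|}
\qquad
J_w(u, v) = \frac{\sum_{k} \min\bigl(w(u, k),\, w(v, k)\bigr)}{\sum_{k} \max\bigl(w(u, k),\, w(v, k)\bigr)}
```

Take Alice and Bob in the weighted example. The Alice-Bob edge has weight 20, and both Alice and Bob connect to Carol with weight 10. The terms for k = Alice, Bob, and Carol give min values 0, 0, and 10 and max values 20, 20, and 10. The result is 10 / 50 = 0.20, which matches that example's output.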
Example (Unweighted Graphs)
Use `.jaccard_similarity()` to compute the Jaccard similarity between two nodes in a graph.
You access the `.jaccard_similarity()` method from a `Graph` object’s `.compute` attribute:
```python
import relationalai as rai
from relationalai.std import alias
from relationalai.std.graphs import Graph

# Create a model named "socialNetwork" with a Person type.
model = rai.Model("socialNetwork")
Person = model.Type("Person")

# Add some people to the model and connect them with a multi-valued `friends` property.
with model.rule():
    alice = Person.add(name="Alice")
    bob = Person.add(name="Bob")
    carol = Person.add(name="Carol")
    alice.friends.add(bob)
    bob.friends.add(carol)

# Create an undirected graph with Person nodes and edges between friends.
# This graph has two edges: one between Alice and Bob, and one between Bob and Carol.
graph = Graph(model, undirected=True)
graph.Node.extend(Person)
graph.Edge.extend(Person.friends)

with model.query() as select:
    # Get pairs of people.
    person1, person2 = Person(), Person()
    # Compute the Jaccard similarity between each pair of people.
    similarity = graph.compute.jaccard_similarity(person1, person2)
    # Select each person's name and their similarity value.
    response = select(person1.name, person2.name, alias(similarity, "jaccard_similarity"))

print(response.results)
# Output:
#     name  name2  jaccard_similarity
# 0  Alice  Alice                 1.0
# 1  Alice  Carol                 1.0
# 2    Bob    Bob                 1.0
# 3  Carol  Alice                 1.0
# 4  Carol  Carol                 1.0
```
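There are no rows pairing Bob with Alice or with Carol in the preceding query’s results.
That’s because Bob shares no neighbors with either of them: Bob’s neighbors are Alice and Carol, while the only neighbor of Alice and of Carol is Bob, so each of these pairs has a Jaccard similarity of 0.0.
Pairs of nodes with zero similarity indicate no meaningful relationship and are often excluded from analyses, so they are filtered out of the results to improve performance.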
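The filtering can be reproduced for this example’s graph in plain Python. The sketch only mirrors the documented output; it is not how RelationalAI evaluates the query.

```python
from itertools import product

# Undirected friendship graph from the example: Alice-Bob and Bob-Carol.
neighbors = {
    "Alice": {"Bob"},
    "Bob": {"Alice", "Carol"},
    "Carol": {"Bob"},
}


def jaccard(u, v):
    """Shared neighbors divided by all distinct neighbors of u and v."""
    union = neighbors[u] | neighbors[v]
    return len(neighbors[u] & neighbors[v]) / len(union)


# Keep every ordered pair of people except those with zero similarity.
rows = [
    (u, v, jaccard(u, v))
    for u, v in product(neighbors, repeat=2)
    if jaccard(u, v) > 0.0
]
for row in rows:
    print(row)
```

The surviving rows are the same five pairs shown in the query output, and every pair involving Bob and another person is dropped.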
If `node1` or `node2` is not a node in the graph, no exception is raised.
Instead, that object is filtered from the rule or query:
```python
# Continues the preceding example, reusing model, Person, graph, and alias.

# Add a Company type to the model.
Company = model.Type("Company")

# Add some companies to the model.
with model.rule():
    apple = Company.add(name="Apple")
    google = Company.add(name="Google")

# Create the union of the Person and Company types.
PersonOrCompany = Person | Company

with model.query() as select:
    # Get all person and company objects.
    obj1, obj2 = PersonOrCompany(), PersonOrCompany()
    obj1 < obj2  # Ensure pairs are unique. Compares internal object IDs.
    # Compute the Jaccard similarity between each pair of objects.
    # Objects that are not nodes in the graph are filtered out of the results.
    similarity = graph.compute.jaccard_similarity(obj1, obj2)
    response = select(obj1.name, obj2.name, alias(similarity, "jaccard_similarity"))

# Only rows for people are returned, since companies are not nodes in the graph.
# Unlike in the preceding query, each pair of distinct people appears only once and
# self-pairs are excluded, because obj1 < obj2 keeps only pairs whose internal
# object IDs are in increasing numerical order.
print(response.results)
# Output:
#     name  name2  jaccard_similarity
# 0  Carol  Alice                 1.0
```
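This query has two behaviors: it silently drops objects that are not graph nodes, and it keeps a single ordering of each pair. Both can be mimicked in plain Python. The integer IDs are hypothetical stand-ins for RelationalAI’s internal object IDs. They are chosen here so that Carol’s ID is smaller than Alice’s, to match the documented output. With real IDs, the order of names within a row may differ.

```python
from itertools import product

# Nodes and their neighbors (the undirected graph from the example).
neighbors = {
    "Alice": {"Bob"},
    "Bob": {"Alice", "Carol"},
    "Carol": {"Bob"},
}

# Hypothetical internal IDs for every object, including the non-node companies.
object_ids = {"Alice": 3, "Bob": 1, "Carol": 2, "Apple": 4, "Google": 5}


def jaccard(u, v):
    """Shared neighbors divided by all distinct neighbors of u and v."""
    union = neighbors[u] | neighbors[v]
    return len(neighbors[u] & neighbors[v]) / len(union)


rows = []
for obj1, obj2 in product(object_ids, repeat=2):
    # Mirror `obj1 < obj2`: drops self-pairs and one ordering of each pair.
    if not object_ids[obj1] < object_ids[obj2]:
        continue
    # Objects that are not nodes are skipped instead of raising an error.
    if obj1 not in neighbors or obj2 not in neighbors:
        continue
    similarity = jaccard(obj1, obj2)
    # Zero-similarity pairs are excluded, as in the query results.
    if similarity > 0.0:
        rows.append((obj1, obj2, similarity))

print(rows)
```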
Example (Weighted Graphs)
Use `.jaccard_similarity()` to compute the weighted Jaccard similarity between two nodes in a graph.
```python
import relationalai as rai
from relationalai.std import alias
from relationalai.std.graphs import Graph

# Create a model named "socialNetwork" with Person and Friendship types.
model = rai.Model("socialNetwork")
Person = model.Type("Person")
Friendship = model.Type("Friendship")

# Add some people to the model and connect them with friendships.
with model.rule():
    alice = Person.add(name="Alice")
    bob = Person.add(name="Bob")
    carol = Person.add(name="Carol")
    Friendship.add(person1=alice, person2=bob, strength=20)
    Friendship.add(person1=bob, person2=carol, strength=10)
    Friendship.add(person1=alice, person2=carol, strength=10)

# Create a weighted, undirected graph with Person nodes and edges between friends.
# This graph has three edges: Alice and Bob (weight 20), Bob and Carol (weight 10),
# and Alice and Carol (weight 10).
# The edges are weighted by the strength of each friendship.
graph = Graph(model, undirected=True, weighted=True)
graph.Node.extend(Person)
with model.rule():
    friendship = Friendship()
    graph.Edge.add(friendship.person1, friendship.person2, weight=friendship.strength)

# Compute the weighted Jaccard similarity between each pair of people in the graph.
with model.query() as select:
    person1, person2 = Person(), Person()
    similarity = graph.compute.jaccard_similarity(person1, person2)
    response = select(person1.name, person2.name, alias(similarity, "jaccard_similarity"))

print(response.results)
# Output:
#     name  name2  jaccard_similarity
# 0  Alice  Alice                1.00
# 1  Alice    Bob                0.20
# 2  Alice  Carol                0.25
# 3    Bob  Alice                0.20
# 4    Bob    Bob                1.00
# 5    Bob  Carol                0.25
# 6  Carol  Alice                0.25
# 7  Carol    Bob                0.25
# 8  Carol  Carol                1.00
```
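The following plain-Python sketch cross-checks the weighted output by applying the min/max ratio to the example’s friendship strengths. It assumes each undirected edge is stored in both directions. It is not RelationalAI’s implementation.

```python
from itertools import product

# Friendship strengths from the example, stored in both directions
# because the graph is undirected.
weights = {
    "Alice": {"Bob": 20.0, "Carol": 10.0},
    "Bob": {"Alice": 20.0, "Carol": 10.0},
    "Carol": {"Alice": 10.0, "Bob": 10.0},
}


def weighted_jaccard(u, v):
    """Sum of per-node minimum weights divided by sum of per-node maximum weights."""
    wu, wv = weights[u], weights[v]
    nodes = wu.keys() | wv.keys()  # a missing edge counts as weight 0.0 below
    numerator = sum(min(wu.get(k, 0.0), wv.get(k, 0.0)) for k in nodes)
    denominator = sum(max(wu.get(k, 0.0), wv.get(k, 0.0)) for k in nodes)
    return numerator / denominator


for u, v in product(weights, repeat=2):
    print(u, v, round(weighted_jaccard(u, v), 2))
```

For Alice and Carol, the per-node minimums are 0, 10, and 0, and the maximums are 10, 20, and 10. This gives 10 / 40 = 0.25, as in the table above.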