Label Propagation
The label_propagation
function returns a community
, to which the node belongs, which is determined through assigning labels
.
In the analysis of social networks, for instance, it can often be useful to identify subgroups of connected people. These subgroups can be detected by using an algorithm such as Label Propagation. It identifies communities in a graph in iterative steps. Nodes are initialized with unique labels and, at each iteration of the algorithm, nodes adopt the most frequently occurring label among their neighbors.
Let us look at a very small example social graph and use LP to identify which people belong to which subgroups.
Our graph will model a social network in which people have friends and their friendship can be of different strength. We will first apply the algorithm to unweighted undirected graph, then add friendship strength as weights to the edges and look at the difference in the results.
First, we need to import the relationalai
library and define our model, which we call MyLPGraph
. We also create two types called Person
and Friendship
.
%pip install relationalai
import relationalai as rai
from relationalai.std import alias
from relationalai.std.graphs import Graph
model = rai.Model("MyLPGraph")
# Graph will hold these types:
# Represent nodes
Person = model.Type("Person")
# Represent connections between nodes
Friendship = model.Type("Friendship")
Let's add some data to our model
We first create a dictionary of people names, who are friends with each other, and a number representing the strength of their friendship. Then we iterate over the dictionary to create objects of type Person
with a name
property. We also create Friendship
instances with strength
property, representing connection between two people. We will use it to create weighted edges between nodes in our graph.
data = {
("Mary", "Bob"): 1.0,
("Mary", "Jessy"): 0.5,
("Mary", "Erica"): 0.1,
("Mary", "Tom"): 0.1,
("Bob", "Frank"): 0.1,
("Jessy", "Erica"): 1.0,
("Erica", "Tom"): 1.0,
("Tom", "Jessy"): 1.0,
("Frank", "Anna"): 1.0,
("Frank", "Mark"): 1.0,
("Frank", "Peter"): 0.5,
("Anna", "Peter"): 0.1,
("Anna", "Mark"): 1.0,
("Peter", "Mark"): 0.1,
}
with model.rule(dynamic = True):
for (friends, strength) in data.items():
name, friend_name = friends
person = Person.add(name = name)
friend = Person.add(name = friend_name)
Friendship.add(person = person, friend = friend).set(strength = strength)
Creating the graph
Let's start by creating a graph
with Node
and Edge
collections. We add all Person
instances as nodes, and assign the name
property so that we can use them in our queries and for visualization purposes. The Friendship
instances are then used to form the edges in our graph.
# Create graph
graph = Graph(model, undirected = True)
Node, Edge = graph.Node, graph.Edge
# add all Person instances as Nodes and assign `name` property (for displaying)
Node.extend(Person, name = Person.name)
# add all Friendship instances as Edges from `person` to `friend` Nodes
with model.rule():
f = Friendship()
Edge.add(f.person, f.friend)
Running the algorithm
Let's add a rule that calculates for every node of the graph the community that it belongs_to
. To derive this value for each node using the label propagation algorithm, we can simply use Graph(model).compute.label_propagation()
.
# run label propagation algorithm on the Graph,
# for every Person node assign the community it belongs to
with model.rule():
p = Person()
community = graph.compute.label_propagation(p)
p.set(belongs_to = community)
Node(p).set(community_id = community)
Querying the Graph
Graph Nodes, Edges and their properties are queried using model.query()
context and graph.Node()
or graph.Edge()
types.
Alternatively, the entire graph representation can be fetched using Graph(model).fetch()
. This returns a dictionary with two keys, nodes
and edges
, that represents the entire graph.
# query all Edges with `name` and `community_id` properties of the Nodes they connect
with model.query() as select:
e = Edge()
n1 = Node(e.from_)
n2 = Node(e.to)
response = select(alias(n1.name, 'from'), alias(n1.community_id, 'community (from)'), alias(n2.name, 'to'), alias(n2.community_id, 'community (to)'))
response
# fetch the entire graph dictionary (not recommended to be used for large graphs)
# graph_dict = graph.fetch()
from | community (from) | to | community (to) |
---|---|---|---|
Anna | 1 | Mark | 1 |
Anna | 1 | Peter | 1 |
Bob | 1 | Frank | 1 |
Erica | 2 | Tom | 2 |
Frank | 1 | Anna | 1 |
Frank | 1 | Mark | 1 |
Frank | 1 | Peter | 1 |
Jessy | 2 | Erica | 2 |
Mary | 2 | Bob | 1 |
Mary | 2 | Erica | 2 |
Mary | 2 | Jessy | 2 |
Mary | 2 | Tom | 2 |
Peter | 1 | Mark | 1 |
Tom | 2 | Jessy | 2 |
How many communities were found? And to which communities do people belong?
# For each Person, fetch the name and the community it belongs to
with model.query() as select:
p = Person()
response = select(p.name, p.belongs_to)
response
name | belongs_to |
---|---|
Anna | 1 |
Bob | 1 |
Erica | 2 |
Frank | 1 |
Jessy | 2 |
Mark | 1 |
Mary | 2 |
Peter | 1 |
Tom | 2 |
# Group the response from the above query by the belongs_to property to separate the communities
communities = response.results.groupby("belongs_to").name.apply(list)
# Print the number of communities
count_communities = len(communities)
print(f"Number of communities: {count_communities}")
# Print communities and people that belong to them
for i, c in enumerate(communities):
print(f"Community N{i+1}: {c}")
Number of communities: 2 Community N1: ['Anna', 'Bob', 'Frank', 'Mark', 'Peter'] Community N2: ['Erica', 'Jessy', 'Mary', 'Tom']
Let's find out who is in the same community as Frank
person_name = "Frank"
with model.query() as select:
p1 = Person(name = person_name)
p2 = Person()
# We want to skip the person itself
p1 != p2
p1.belongs_to == p2.belongs_to
response = select(p2.name)
response
name |
---|
Anna |
Bob |
Mark |
Peter |
Visualizing the results
Let's visualize our graph, to better understand the results. We use Graph(model).visualize()
to visualize our graph. We define a color for each of the communities.
graph.visualize(three = False, style = {
"node": {
"color": lambda n : ['blue', 'pink'][ list(communities.keys()).index(n['community_id']) % 2 ],
"label": lambda n : n['name'],
"hover": lambda n: n['community_id'],
"size": 30
},
"edge": {
"color": "green"
}
}).display(inline = True)
Adding weights to the graph edges
As a final step, let's see how adding weights to the graph edges affects the result of the algorithm.
Let's create a new graph with the same Nodes and Edges, but now using Friendship.strength
property as a weight of an edge.
# Create weighted graph
weighted_graph = Graph(model, undirected = True, weighted = True)
# add all same Person instances as Nodes and assign `name` property
weighted_graph.Node.extend(Person, name = Person.name)
# add all Friendship instances as Edges from `person` to `friend` Nodes with weight property
with model.rule():
f = Friendship()
weighted_graph.Edge.add(f.person, f.friend, weight = f.strength)
# run label propagation algorithm on the new Graph
with model.rule():
n = weighted_graph.Node()
n.set(community_id = weighted_graph.compute.label_propagation(n))
How many communities were found now?
# query weighted graph nodes names and communities they were assigned to
with model.query() as select:
n = weighted_graph.Node()
response = select(n.name, n.community_id)
# group names by community_id
communities = response.results.groupby("community_id").name.apply(list)
# print communities and names of people that belong to them
for i, c in enumerate(communities):
print(f"Community N{i+1}: {c}")
Community N1: ['Anna', 'Frank', 'Mark', 'Peter'] Community N2: ['Erica', 'Jessy', 'Tom'] Community N3: ['Bob', 'Mary']
Visualizing the weighted graph
Let's make the size of the Edges proportional to the weight. Additionally, we'll define more colors since we now have more communities.
weighted_graph.visualize(three = False, style = {
"node": {
"color": lambda n : ['blue', 'pink', 'orange'][ list(communities.keys()).index(n['community_id']) % 3 ],
"label": lambda n : n['name'],
"hover": lambda n: n['community_id'],
"size": 30
},
"edge": {
"color": "green",
"size": lambda e: e['weight'] * 3.5
}
}).display(inline = True)