Pagerank
The pagerank
function computes the pagerank value of a node in a graph.
PageRank measure the importance of a node based on its connections to other important nodes.
The algorithm was invented by Larry Page, co-founder of Google. Originally it was used to order search result and show the most important hits first. This concept revolutionized the whole field of search engines.
Let us look at a very small example graph and apply pagerank to identify the most important nodes.
First, we need to import the relationalai
library and define our model, which we call PageRank
.
We also create a type called Page
. In a labelled property graph model Page
would be translated into a node label.
%pip install relationalai
import relationalai as rai
from relationalai.std import aggregates
from relationalai.std.graphs import Graph
model = rai.Model("PageRank")
Page = model.Type("Page")
Let's add some data to our model
Our data model is very simplistic: it's web pages that link to other web pages. We define a dictionary of page names and the pages they are linked to. We then iterate over it to create Page
objects with name
property and connect them via links
property.
pages_links = {
"Startpage": ["News", "Products", "Imprint"],
"News": ["Startpage", "Products"],
"Products": ["Product1", "Product2", "Product3", "Startpage"],
"Product1": ["Products", "Product2"],
"Product2": ["Products", "Product3"],
"Product3": ["Products"],
"Imprint": ["Startpage"],
}
with model.rule(dynamic = True):
for (page_name, links) in pages_links.items():
page = Page.add(name = page_name)
for linked_page_name in links:
linked_page = Page.add(name = linked_page_name)
page.links.add(linked_page)
Creating the graph
Let's start by creating a graph
with Node
and Edge
collections. We add all Page
instances as nodes, and assign the name
property so that we can use them in our queries and for visualization purposes. The links
property is then used to form the edges in our graph.
# Create graph
graph = Graph(model)
# add all Page instances as Nodes, assign `name` property (for displaying)
graph.Node.extend(Page, name = Page.name)
# add all `links` property as Edges
graph.Edge.extend(Page.links)
Running the algorithm
Let's add a rule that calculates for every node of the graph the pagerank value. To derive this value for each node, we can simply use Graph(model).compute.pagerank()
.
Note that we could supply algorithm parameter like damping factor or maximum iterations. In this case we will rely on default values.
with model.rule():
p = Page()
pagerank = graph.compute.pagerank(p)
p.set(pagerank = pagerank)
graph.Node(p).set(pagerank = pagerank)
Querying the Graph
Graph Nodes, Edges and their properties are queried using model.query()
context and graph.Node()
or graph.Edge()
types.
Alternatively, the entire graph representation can be fetched using Graph(model).fetch()
. This returns a dictionary with two keys, nodes
and edges
, that represents the entire graph.
with model.query() as select:
p = graph.Node()
response = select(p.name, p.pagerank)
response
# fetch the entire graph dictionary (not recommended to be used for large graphs)
# graph_dict = graph.fetch()
name | pagerank |
---|---|
Imprint | 0.072831 |
News | 0.072831 |
Product1 | 0.088565 |
Product2 | 0.126205 |
Product3 | 0.142202 |
Products | 0.315934 |
Startpage | 0.181431 |
Ordering the pages by their pagerank
We can sort our results by applying aggregates.rank_desc
function to see the pages starting from the highest ranking one.
with model.query() as select:
p = Page()
rank = aggregates.rank_desc(p.pagerank)
response = select(rank, p.name, p.pagerank)
response
result | name | pagerank |
---|---|---|
1 | Products | 0.315934 |
2 | Startpage | 0.181431 |
3 | Product3 | 0.142202 |
4 | Product2 | 0.126205 |
5 | Product1 | 0.088565 |
6 | Imprint | 0.072831 |
6 | News | 0.072831 |
Using aggregation functions
Let's calculate the total of all pagerank values. Pagerank is normalized - so the sum over all values is 1.0.
with model.query() as select:
p = Page()
totalPagerank = aggregates.sum(p, p.pagerank)
response = select(totalPagerank)
response
result |
---|
1.0 |
Visualizing the results
As a final step, let's visualize our graph, to better understand the results. We use Graph(model).visualize()
to visualize our graph.
Note that we scale the size of the nodes to represent their pagerank. In our example the Products
page has the highest pagerank.
graph.visualize(three = False, style = {
"node": {
"label": lambda n : n['name'],
"color": "blue",
"size": lambda n: n["pagerank"] * 100
},
"edge": {
"color": "green",
}
}).display(inline = True)