Hossein Keshavarz, Yasmeen Hany, Pigi Kouki
11 April 2023
5 min read
In our previous blog post, we explained how to model traditional neighborhood-based recommender systems in Rel. In what follows, we focus on modeling graph-based recommender systems.
These models expand the notion of direct neighborhoods (used in the neighborhood-based models) to longer paths between users and items. This way, the graph-based models provide a broader interpretation of neighborhood.
We consider two simple, yet effective, algorithms: P3α and RP3β. Recommendation in these methods is based on a three-hop random walk (RW) from users to items and vice versa.
In P3α and RP3β, the similarity between distinct items is computed by aggregating a score function over paths of length two between them. In the majority of real-world use cases, the user-item graph is extremely sparse. For example, there are only around 7M edges between 150K items and 1.99M users in the user-reviews dataset released by Yelp.
A framework for fast path aggregation over sparse graphs is therefore critical for implementing graph-based models such as P3α and RP3β. RelationalAI's declarative modeling language Rel is a great environment for implementing graph-based recommender systems: it can efficiently aggregate over paths on large, sparse graphs while keeping the implementation compact, readable, and easy to maintain.
As we will show, the graph-based methods achieve higher precision and recall on the MovieLens 100K dataset, while running twice as fast as the user- and item-based models.
Note that both P3α and RP3β methods fall into the category of collaborative filtering methods. In other words, these algorithms only rely on connections between users and items (i.e., who rates what) and they completely ignore the content of items or user-level information. Therefore, they can be modeled in Rel by slightly revising the neighborhood-based models.
In fact, apart from the subtle differences in the computation of item-item similarities and scoring user-item pairs, the building blocks of the graph-based recommender systems remain intact. The image below exhibits the architecture of the graph-based recommender system.
The pipeline of the graph-based recommender system.
The behavior of a random walk in a graph is described by the probability of moving between nodes, also known as transition probabilities. In the absence of content or user-level information, recommender systems usually deal with undirected bipartite user-item graphs.
There are two possible moves in these bipartite graphs: from a user to one of the items they rated, and from an item to one of the users who rated it.
In order to avoid unnecessary complexity, we focus on a simple RW model. In this scenario, the RW treats all the users (and items) the same way. So the probability of jumping from a user u to an item i is 1/du if u rated i (and 0 otherwise), where du is the number of items rated by u; the probability of jumping from an item i to a user u is defined symmetrically as 1/di.
Probability of jumps for a simple RW in the user-item graph.
Next, let’s implement RW jump probabilities from a user to an item in Rel (we use the same idea to model the probability of moving from an item to a user).
def users(u) = Data(u, _)
def items(i) = Data(_, i)
def n_ratings_per_user(u, d) = count[i: Data(u, i)](d)
// Probability of moving from user u to item i
// Probability of moving from user u to item i
def Pui(u, i, p) =
    Data(u, i) and
    (p = 1/d) and n_ratings_per_user(u, d)
    from d
Although here we only focus on binary interactions between users and items, the transition probabilities can be easily generalized to non-binary ratings (such as 1-5 ratings).
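To make this concrete, here is a small Python sketch (with made-up toy data, separate from the Rel model) covering both the binary case and the weighted generalization mentioned above:

```python
from collections import defaultdict

# Toy user-item ratings: (user, item, rating). In the binary case the
# rating column is ignored and every edge counts equally.
ratings = [("u1", "i1", 5), ("u1", "i2", 3), ("u2", "i2", 4)]

def transition_probs(ratings, weighted=False):
    """P(u -> i): 1/degree(u) in the binary case, or the rating of (u, i)
    divided by the sum of u's ratings in the weighted case."""
    totals = defaultdict(float)
    for u, _, r in ratings:
        totals[u] += r if weighted else 1
    return {(u, i): (r if weighted else 1) / totals[u]
            for u, i, r in ratings}

P = transition_probs(ratings)          # binary: u1 splits its mass evenly
Pw = transition_probs(ratings, True)   # weighted: u1 leans toward i1 (rating 5)
```

In both cases the probabilities leaving each user sum to one, so the result is a valid transition matrix for the random walk.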
Despite some subtle differences, both P3α and RP3β algorithms take a similar approach to defining the item-item similarities. The core idea is that the similarity between two items is determined by the probability of moving from one item to the other in a path of length two.
For a compact mathematical formulation of item-item similarities, we need to define an important term: Uij.
For items i and j, Uij represents all users who rated both items. As shown in the figure, any user in Uij is the middle node in a path of length two between i and j.
Visual illustration of a user node in Uij.
P3α computes the similarity between two items i and j using the following formula:
Sij = (1/di)^α · Σu∈Uij (1/du)^α
where di and du denote the degrees of item i and user u in the user-item graph, and α is a non-negative hyperparameter, tuned on a validation set.
In fact, Sij is a certain type of aggregation over all paths of length two between i and j. In particular, for α = 1, Sij is exactly the probability of a two-hop jump from i to j.
RP3β adjusts the P3α similarities by dividing by the item popularities (denoted by dj) raised to the power of β. Strictly speaking, RP3β item-item similarities are defined by
Sij = (1/dj)^β · (1/di)^α · Σu∈Uij (1/du)^α
The popularity of item j is defined as its degree in the user-item graph. Bringing dj into Sij reduces the similarity between a popular item j (one with a large dj) and other items, which leads to more diverse recommendations on many benchmark datasets.
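As a sanity check on the formulas, the following Python sketch (again with a made-up toy graph) computes both similarities; for α = 1, the P3α similarity coincides with the probability of a two-hop jump from i to j:

```python
# Toy binary user-item edges (who rated what).
edges = {("u1", "i1"), ("u1", "i2"), ("u2", "i1"), ("u2", "i2"), ("u3", "i2")}
users = {u for u, _ in edges}
items = {i for _, i in edges}
d_user = {u: sum(1 for v, _ in edges if v == u) for u in users}
d_item = {i: sum(1 for _, j in edges if j == i) for i in items}

def p3alpha_sim(i, j, alpha):
    # U_ij: users who rated both i and j -- the middle nodes of all
    # length-two paths between the two items.
    U_ij = {u for u in users if (u, i) in edges and (u, j) in edges}
    return d_item[i] ** -alpha * sum(d_user[u] ** -alpha for u in U_ij)

def rp3beta_sim(i, j, alpha, beta):
    # RP3beta divides by the popularity of the target item, raised to beta.
    return p3alpha_sim(i, j, alpha) / d_item[j] ** beta

# With alpha = 1, S_ij is the two-hop jump probability from i1 to i2:
# i1 -> u1 -> i2 (1/2 * 1/2) plus i1 -> u2 -> i2 (1/2 * 1/2) = 0.5.
s = p3alpha_sim("i1", "i2", alpha=1.0)
```

Since i2 has the highest degree in this toy graph, any β > 0 shrinks similarities pointing at it, illustrating the popularity penalty.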
We can model the complete process of defining Sij terms in Rel.
def n_ratings_per_item(i, d) = count[u: Data(u, i)](d)

// P3alpha item-item similarities (S_ij)
def p3alpha_item_item_sim[i, j](sim) =
    Alpha(a) and
    items(i) and items(j) and
    (i != j) and
    (sim = w1*w2) and
    power[n_ratings_per_item[i], -a](w1) and
    sum[(u, s): Data(u, i) and Data(u, j) and
        power[n_ratings_per_user[u], -a](s)](w2)
    from w1, w2, a
// Computing RP3beta similarities if beta != 0
def item_item_sim[i, j](sim) =
    if empty[Beta] then
        p3alpha_item_item_sim[i, j](sim)
    else
        Beta(b) and power[n_ratings_per_item[j], -b](f) and
        p3alpha_item_item_sim[i, j](intermediate_sim) and
        sim = intermediate_sim*f
        from b, f, intermediate_sim
    end
We don't need to model P3α and RP3β separately in Rel, as P3α is a special case of RP3β (with β = 0). All we need is a conditional statement that checks whether β is 0. Since Beta is a relation in Rel, setting β to 0 simply amounts to leaving Beta empty, so using empty[Beta] does the trick.
In many practical applications, the user-item graph is sparse. However, the sparsity of the graph doesn't necessarily translate to a sparse similarity matrix. In fact, it's possible to have at least one path of length two between the majority of distinct items, i.e., Sij ≠ 0 for many (i, j) pairs.
Working with a dense similarity matrix has two possible drawbacks: it makes the scoring step more expensive in both memory and compute, and the many small, noisy similarity values can hurt the quality of the recommendations.
These possible issues can be addressed by post-processing the item-item similarities. Sparsification and row normalization are two common post-processing steps to mitigate the above challenges in graph-based algorithms (e.g., Dacrema et al., 2021).
// STEP 1. Sparsifying the rows of S_ij
def item_item_sim_row_sp(i, j, sc) =
    Neighborhood_size(n) and
    topk[{(l, s): item_item_sim(i, l, s)}, n](_, j, sc)
    from n

// STEP 2. Normalizing the rows of S_ij
def row_sum(i, ts) = sum[(l, v): item_item_sim_row_sp(i, l, v)](ts)

def item_item_sim_nrm(i, j, ns) =
    (ns = s/ts) and
    item_item_sim_row_sp(i, j, s) and row_sum(i, ts)
    from ts, s

// STEP 3. Sparsifying the columns of S_ij
def Sij(i, j, sc) =
    Neighborhood_size(n) and
    items(j) and
    topk[{(l, s): item_item_sim_nrm(l, j, s)}, n](_, i, sc)
    from n
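Outside of Rel, the same row-level post-processing can be sketched in a few lines of Python (toy data; the row-wise top-n pruning below plays the role of topk, and column sparsification would follow the same pattern):

```python
def sparsify_rows(sim, n):
    """Keep only the n largest similarities in each row of the item-item
    similarity matrix (represented as a dict mapping (i, j) -> score)."""
    rows = {}
    for (i, j), s in sim.items():
        rows.setdefault(i, []).append((s, j))
    return {(i, j): s
            for i, entries in rows.items()
            for s, j in sorted(entries, reverse=True)[:n]}

def normalize_rows(sim):
    """Scale every row so that its entries sum to one."""
    row_sum = {}
    for (i, _), s in sim.items():
        row_sum[i] = row_sum.get(i, 0.0) + s
    return {(i, j): s / row_sum[i] for (i, j), s in sim.items()}

sim = {("i1", "i2"): 0.4, ("i1", "i3"): 0.1, ("i1", "i4"): 0.05,
       ("i2", "i1"): 0.4, ("i2", "i3"): 0.2}
sp = normalize_rows(sparsify_rows(sim, n=2))  # drops the (i1, i4) entry
```

After the two steps, each surviving row sums to one, which keeps the user-item scores on a comparable scale across items.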
Similar to the neighborhood-based methods, the next step after computing item-item similarities is to score all (u, i) pairs using the computed similarity matrix.
The scoring is based on summing the score of all paths of length three between u and i. Given the user-item scores, we follow the same strategy as in the neighborhood-based models for ranking and recommendation steps.
Any path of length three between u and i is comprised of two components: a single hop from u to an intermediate item j (with probability puj), and a path of length two from j to i (captured by the similarity Sji).
So, the aggregation over paths of length three can be calculated as follows:
score(u, i) = Σj puj^α · Sji
Here is our implementation of the scoring step in Rel.
def P3ui[u, i](v) =
    Alpha(a) and
    users(u) and not Data(u, i) and
    sum[(j, s): (s = s_uj*s_ji) and
        Pui(u, j, p_uj) and power[p_uj, a](s_uj) and
        Sij(j, i, s_ji)
        from p_uj, s_uj, s_ji](v)
    from a
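The scoring step boils down to a weighted sum over the user's rated items. A minimal Python sketch of the same computation (hypothetical toy inputs, not the Rel model):

```python
def score(u, i, P, S, rated, alpha):
    """Aggregate over all paths of length three from user u to item i:
    a hop u -> j (probability p_uj, raised to alpha) followed by a
    length-two path j -> i (similarity S_ji)."""
    if i in rated[u]:
        return None  # only score items the user has not interacted with
    return sum(P[(u, j)] ** alpha * S[(j, i)]
               for j in rated[u] if (j, i) in S)

# Toy transition probabilities P, item-item similarities S.
P = {("u1", "i1"): 0.5, ("u1", "i2"): 0.5}
S = {("i1", "i3"): 0.4, ("i2", "i3"): 0.2}
rated = {"u1": {"i1", "i2"}}
v = score("u1", "i3", P, S, rated, alpha=1.0)  # 0.5*0.4 + 0.5*0.2 = 0.3
```

Skipping already-rated items mirrors the `not Data(u, i)` condition in the Rel definition above.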
With the RP3β module to hand, we can query the database to get the recommended items and evaluation scores on our test data.
def alpha = 0.8593
def neighborhood_size = 324
def beta = 0.6574
def k = 10
def test_users(u) = test_data(u, _)
def RecSys = RP3beta[train_data, alpha, neighborhood_size, beta]

// For any user u, E[u][i] is the predicted score for item i
def E[u in test_users](i, s) =
    evaluation_data(u, i) and
    RecSys[:P3ui](u, i, s)

def recommendations(u, i) = recommend_top_k[E, k](u, i)

def metrics[:precision](m) =
    mean[u in test_users, p: precision[recommendations[u], test_data[u]](p)](m)

def metrics[:recall](m) =
    mean[u in test_users, r: recall[recommendations[u], test_data[u]](r)](m)
def output = metrics
In our experiments, for k = 10, P3α and RP3β achieve precisions of 32.8% and 34.2%, respectively, with recall scores of 20.9% and 21.7%.
These evaluation scores show that RP3β slightly outperforms both P3α and the neighborhood-based methods (whose precision was 33.6% and recall was 21.2%) on the MovieLens 100K dataset.
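For reference, precision@k and recall@k can be computed as below. This is the standard formulation; the exact conventions of the built-in precision and recall relations used above may differ slightly:

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    return sum(1 for i in recommended[:k] if i in relevant) / k

def recall_at_k(recommended, relevant, k):
    """Fraction of the relevant items that appear in the top-k."""
    return sum(1 for i in recommended[:k] if i in relevant) / len(relevant)

recommended = ["i1", "i2", "i3", "i4"]   # ranked output for one toy user
relevant = {"i2", "i4", "i5"}            # held-out test items for that user
p = precision_at_k(recommended, relevant, k=4)  # 2 of 4 hits
r = recall_at_k(recommended, relevant, k=4)     # 2 of 3 relevant found
```

The reported metrics are these per-user scores averaged over all test users, mirroring the `mean[...]` aggregation in the Rel queries.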
As in our previous work, the entire pipeline is implemented in our Relational Knowledge Graph System (RKGS), eliminating the need to transfer data out of the system at any point.
This approach simplifies the recommendation process, as generating personalized recommendations for a particular user simply involves querying the database to retrieve the top K recommendations.
On top of that, implementing graph-based recommender systems in Rel is highly beneficial as the language can efficiently aggregate over paths on large and sparse graphs, while providing a compact and readable implementation that is easy to maintain.
In our final post in this series, we will describe how to use Rel to provide content-based and hybrid recommender systems that leverage additional information signals, such as the genre of a movie.