THIS POST IS ARCHIVED

Tuesday, April 11, 2023

Graph-based Recommender Systems in Rel - Part 2

In our previous blog post, we explained how to model traditional neighborhood-based recommender systems in Rel. In what follows, we focus on modeling graph-based recommender systems.

These models expand the notion of direct neighborhoods (used in the neighborhood-based models) to longer paths between users and items. This way, the graph-based models provide a broader interpretation of neighborhood.

We consider two simple, yet effective, algorithms: P³_α and RP³_β. The recommendation in these methods is based on a three hop random walk (RW) from users to items and vice-versa.

In P³_α and RP³_β, the similarity between distinct items are computed based on the aggregation of a score function over paths of length two between them. In the majority of real-world use cases, the user-item graph is extremely sparse. For example, there are only around 7M edges between 150K items and 1.99M users in the user-reviews dataset released by Yelp.

A framework for fast path aggregation over sparse graphs is therefore critical for the implementation of graph-based models such as P³_α and RP³_β. RAI's declarative modeling language Rel is a great environment for implementing graph-based recommender systems, because:

It efficiently aggregates over paths on large and sparse graphs.
It provides an easily maintainable model due to a compact and easy-to-read implementation.

As we will show, graph-based methods can achieve higher precision and recall in the Movielens 100K dataset, while at the same time being twice as fast compared to the user- and item-based models.

Pipeline Overview

Note that both P³_α and RP³_β methods fall into the category of collaborative filtering methods. In other words, these algorithms only rely on connections between users and items (i.e., who rates what) and they completely ignore the content of items or user-level information. Therefore, they can be modeled in Rel by slightly revising the neighborhood-based models.

In fact, apart from the subtle differences in the computation of item-item similarities and scoring user-item pairs, the building blocks of the graph-based recommender systems remain intact. The image below exhibits the architecture of the graph-based recommender system.

The pipeline of the graph-based recommender system.

Modeling the Random Walk Transition Matrix

The behavior of a Random Walk (RW) in a graph is described by the probability of moving between nodes, also known as transition probabilities. In the absence of content or user-level information, recommender systems usually deal with undirected bipartite user-item graphs.

There are two possible moves in these bipartite graphs:

Traveling from a user node to an item node.
Traveling from an item node to a user node.

In order to avoid unnecessary complexity, we focus on a simple RW model. In this scenario, the RW treats all the users (and items) the same way. So:

For a user u who rated d items I_u = \❴i1 , . . . , id\❵, moving from node u to i happens with probability 1/d for any item i∈I_u. In other words, P_ui = 1/d.
Similarly, if item i has been rated by d users U_i = \❴u1 , . . . , ud\❵, then the RW moves from i to any u∈U_i with probability 1/d. In other words, P_iu = 1/d.

Probability of jumps for a simple RW in the user-item graph.

Next, let's implement RW jump probabilities from a user to an item in Rel (we use the same idea to model the probability of moving from an item to a user).

def users(u) = Data(u, _)
def items(i) = Data(_, i)
def n_ratings_per_user(u, d) = count[i: Data(u, i)](d)

// Probability of moving from user u to item i

def Pui(u, i, p) =
    Data(u, i) and
    (p = 1/d) and n_ratings_per_user(u, d)
    from d

Although here we only focus on binary interactions between users and items, the transition probabilities can be easily generalized to non-binary ratings (such as 1-5 ratings).

Computing Item-Item Similarities

Despite some subtle differences, both P³_α and RP³_β algorithms take a similar approach to defining the item-item similarities. The core idea is that the similarity between two items is determined by the probability of moving from one item to the other in a path of length two.

For a compact mathematical formulation of item-item similarities, we need to define an important term: U_ij.

For items i and j, U_ij represents all users who rated both items. As shown in the figure, any user in U_ij is the middle node in a path of length two between i and j.

Visual illustration of a user node.

P³_α computes the similarity between two items i and j using the following formula:

where α is a non-negative hyperparameter, tuned on a validation set.

In fact, S_ij is a certain type of aggregation over all paths of length two between i and j. Particularly for α = 1, S_ij is the same as the probability of a two-hop jump from i to j.

RP³_β adjusts P³_α similarities by dividing by the item popularities (denoted by d_j ) raised to the power of β. Strictly speaking, RP³_β item-item similarities are defined by

The popularity of item j is defined as its degree in the user-item graph. Bringing d_j into S_ij reduces similarity between popular item j (the large value of d_j) and other items, which leads to more diverse recommendations in many benchmark datasets.

We can model the complete process of defining S_ij terms in Rel.

def n_ratings_per_item(i, d) = count[u: Data(u, i)](d)

// P3alpha item-item similarities (S_ij)

def p3alpha_item_item_sim[i, j](sim) =
    Alpha(a) and
    items(i) and items(j) and
    (i != j) and
    (sim = w1*w2) and
    power[n_ratings_per_item[i], -a](w1) and
    sum[(u, s): Data(u, i) and Data(u, j) and
        power[n_ratings_per_user[u], -a](s)](w2)
    from w1, w2, a

// computing RP3beta similarities if beta != 0

def item_item_sim[i, j](sim) =
    if empty[Beta] then
    p3alpha_item_item_sim[i, j](sim)
    else
    Beta(b) and power[n_ratings_per_item[j], -b](f) and
    p3alpha_item_item_sim[i, j](intermediate_sim) and
    sim = intermediate_sim*f
    from b, f, intermediate_sim
    end

We don't need to model P³_α and RP³_β separately in Rel, as P³_α is an special case of RP³_β (for β ≠ 0). All we need is a conditional statement to check for 0. As β is a relation in Rel, 0 means that β isn't an empty set. So using empty[Beta] does the trick.

Post-Processing Item-Item Similarities

In many practical applications, the user-item graph is sparse. However, the sparsity of the graph doesn't necessarily translate to a sparse similarity matrix. In fact, it's possible to have at least one path of length two between the majority of distinct items i.e., S_ij ≠ 0 for many (i,j) pairs.

Working with a dense similarity matrix has two possible drawbacks:

Heavy computation for recommending relevant items.
Less accurate recommendations (lower scores on the test data).

These possible issues can be addressed by post-processing the item-item similarities. Sparsification and row normalization are two common post-processing steps to mitigate the above challenges in graph-based algorithms (e.g., Dacrema et al., 2021).

Sparsification: Given a hyperparameter n, known as neighborhood size, we only keep n largest elements per row and column of the similarity matrix. Simply put, zero-out S_ij if it's too small.
Normalization: We normalize each row of the similarity matrix with its sum.

//STEP 1. Sparsifying the rows of S_ij

def item_item_sim_row_sp(i, j, sc) =
	Neighborhood_size(n) and
    topk[{(l, s): item_item_sim(i, l, s)}, n](_, j, sc)
    from n

//STEP 2. Normalizing the rows of S_ij

def row_sum(i, ts) = sum[l,v:item_item_sim_row_sp(i, l, v)](ts)

def item_item_sim_nrm(i, j, ns) =
    (ns = s/ts) and
    item_item_sim_row_sp(i, j, s) and row_sum(i, ts)
    from ts, s

//STEP 3. Sparsifying the columns of S_ij

def Sij(i, j, sc) =
    Neighborhood_size(n) and
    items(j) and
    topk[{(l, s): item_item_sim_nrm(l, j, s)}, n](_, i, sc)
    from n

Scoring

Similar to the neighborhood-based methods, the next step after computing item-item similarities is to score all (u, i) pairs using the computed similarity matrix.

The scoring is based on summing the score of all paths of length three between u and i. Given the user-item scores, we follow the same strategy as in the neighborhood-based models for ranking and recommendation steps.

Any path of length three between u and i is comprised of two components:

An edge between u and an item j (which has been rated by user u).
A path of length two between j and i.

So, aggregation over paths of length three can be calculated as follows:

Here is our implementation of the scoring step in Rel.

def P3ui[u, i](v) =
    Alpha(a) and
    users(u) and not Data(u, i) and
    sum[(j, s): (s = s_uj*s_ji) and
                Pui(u, j, p_uj) and power[p_uj, a](s_uj) and
                Sij(j, i, s_ji)
                from p_uj, s_uj, s_ji](v)
    from a

Putting It All together

With the RP³_β module to hand, we can query the database to get the recommended items and evaluation scores on our test data.

def alpha = 0.8593
def neighborhood_size = 324
def beta = 0.6574
def k = 10
def test_users(u) = test_data(u, _)

def RecSys =RP3beta[train_data, alpha, neighborhood_size, beta]

// For any user u, E[u][i] is the predicted score for item i

def E[u in test_users](i, s) =
   evaluation_data(u, i) and
   RecSys[:P3ui](u, i, s)
def recommendations(u, i) = recommend_top_k[E, k](u, i)

def metrics[:precision](m) =
   mean[u in test_users, p: precision[recommendations[u], test_data[u]](p) ](m)
def metrics[:recall](m) =
   mean[u in test_users, r: recall[recommendations[u], test_data[u]](r) ](m)
def output = metrics

Evaluating the Results

In our experiments, for k=10, P³_α and RP³_β achieve precisions of 32.8% and 34.2%, respectively. Furthermore, the recall scores are 20.9% and 21.7%.

The evaluation scores show that RP³_β slightly outperforms P³_α and neighborhood-based methods (the precision was 33.6% and the recall was 21.2%) on the Movielens100K dataset.

As in our previous work, the entire pipeline is implemented in our Relational Knowledge Graph System (RKGS), eliminating the need to transfer data out of the system at any point.

This approach simplifies the recommendation process, as generating personalized recommendations for a particular user simply involves querying the database to retrieve the top K recommendations.

On top of that, implementing graph-based recommender systems in Rel is highly beneficial as the language can efficiently aggregate over paths on large and sparse graphs, while providing a compact and readable implementation that is easy to maintain.

In our final post in this series, we will describe how to use Rel to provide content-based and hybrid recommender systems that leverage additional information signals, such as the genre of a movie.

Read the entire blog series on Recommender Systems in Rel: