Join us at Snowflake Summit June 26-29 in Las Vegas!

Hossein Keshavarz, Yasmeen Hany, Pigi Kouki

11 April 2023

5 min read

In our previous blog post, we explained how to model traditional neighborhood-based recommender systems in Rel. In what follows, we focus on modeling graph-based recommender systems.

These models expand the notion of direct neighborhoods (used in the neighborhood-based models) to longer paths between users and items. This way, the graph-based models provide a broader interpretation of neighborhood.

We consider two simple, yet effective, algorithms: P^{3}_{α} and RP^{3}_{β}. The recommendation in these methods is based on a three hop random walk (RW) from users to items and vice-versa.

In P^{3}_{α} and RP^{3}_{β}, the similarity between distinct items are computed based on the aggregation of a score function over paths of length two between them. In the majority of real-world use cases, the user-item graph is extremely sparse. For example, there are only around 7M edges between 150K items and 1.99M users in the user-reviews dataset released by Yelp.

A framework for fast path aggregation over sparse graphs is therefore critical for the implementation of graph-based models such as P^{3}_{α} and RP^{3}_{β}. RAI's declarative modeling language Rel is a great environment for implementing graph-based recommender systems, because:

- It efficiently aggregates over paths on large and sparse graphs.
- It provides an easily maintainable model due to a compact and easy-to-read implementation.

As we will show, graph-based methods can achieve higher precision and recall in the Movielens 100K dataset, while at the same time being twice as fast compared to the user- and item-based models.

Note that both P^{3}_{α} and RP^{3}_{β} methods fall into the category of collaborative filtering methods. In other words, these algorithms only rely on connections between users and items (i.e., who rates what) and they completely ignore the content of items or user-level information. Therefore, they can be modeled in Rel by slightly revising the neighborhood-based models.

In fact, apart from the subtle differences in the computation of item-item similarities and scoring user-item pairs, the building blocks of the graph-based recommender systems remain intact. The image below exhibits the architecture of the graph-based recommender system.

The pipeline of the graph-based recommender system.

The behavior of a Random Walk (RW) in a graph is described by the probability of moving between nodes, also known as transition probabilities. In the absence of content or user-level information, recommender systems usually deal with undirected bipartite user-item graphs.

There are two possible moves in these bipartite graphs:

- Traveling from a user node to an item node.
- Traveling from an item node to a user node.

In order to avoid unnecessary complexity, we focus on a simple RW model. In this scenario, the RW treats all the users (and items) the same way. So:

- For a user u who rated d items I
_{u }= {i_{1 }, . . . , i_{d}}, moving from node u to i happens with probability 1/d for any item i∈I_{u}. In other words, P_{ui }= 1/d. - Similarly, if item i has been rated by d users U
_{i }= {u_{1}, . . . , u_{d}}, then the RW moves from i to any u∈U_{i}with probability 1/d. In other words, P_{iu }= 1/d.

Probability of jumps for a simple RW in the user-item graph.

Next, let’s implement RW jump probabilities from a user to an item in Rel (we use the same idea to model the probability of moving from an item to a user).

```
def users(u) = Data(u, _)
def items(i) = Data(_, i)
def n_ratings_per_user(u, d) = count[i: Data(u, i)](d)
// Probability of moving from user u to item i
def Pui(u, i, p) =
Data(u, i) and
(p = 1/d) and n_ratings_per_user(u, d)
from d
```

Although here we only focus on binary interactions between users and items, the transition probabilities can be easily generalized to non-binary ratings (such as 1-5 ratings).

Despite some subtle differences, both P^{3}_{α} and RP^{3}_{β} algorithms take a similar approach to defining the item-item similarities. The core idea is that the similarity between two items is determined by the probability of moving from one item to the other in a path of length two.

For a compact mathematical formulation of item-item similarities, we need to define an important term: U_{ij.}

For items i and j, U_{ij} represents all users who rated both items. As shown in the figure, any user in U_{ij}_{ }is the middle node in a path of length two between i and j.

Visual illustration of a user node in U_{ij}.

P^{3}_{α} computes the similarity between two items i and j using the following formula:

where α is a non-negative hyperparameter, tuned on a validation set.

In fact, S_{ij} is a certain type of aggregation over all paths of length two between i and j. Particularly for α = 1, S_{ij} is the same as the probability of a two-hop jump from i to j.

RP^{3}_{β} adjusts P^{3}_{α} similarities by dividing by the item popularities (denoted by d_{j} ) raised to the power of β. Strictly speaking, RP^{3}_{β} item-item similarities are defined by

The popularity of item j is defined as its degree in the user-item graph. Bringing d_{j} into S_{ij} reduces similarity between popular item j (the large value of d_{j}) and other items, which leads to more diverse recommendations in many benchmark datasets.

We can model the complete process of defining S_{ij} terms in Rel.

```
def n_ratings_per_item(i, d) = count[u: Data(u, i)](d)
// P3alpha item-item similarities (S_ij)
def p3alpha_item_item_sim[i, j](sim) =
Alpha(a) and
items(i) and items(j) and
(i != j) and
(sim = w1*w2) and
power[n_ratings_per_item[i], -a](w1) and
sum[(u, s): Data(u, i) and Data(u, j) and
power[n_ratings_per_user[u], -a](s)](w2)
from w1, w2, a
// computing RP3beta similarities if beta != 0
def item_item_sim[i, j](sim) =
if empty[Beta] then
p3alpha_item_item_sim[i, j](sim)
else
Beta(b) and power[n_ratings_per_item[j], -b](f) and
p3alpha_item_item_sim[i, j](intermediate_sim) and
sim = intermediate_sim*f
from b, f, intermediate_sim
end
```

We don't need to model P^{3}_{α} and RP^{3}_{β} separately in Rel, as P^{3}_{α} is an special case of RP^{3}_{β} (for_{ }β ≠ 0). All we need is a conditional statement to check for 0. As β is a relation in Rel, 0 means that β isn’t an empty set. So using `empty[Beta]`

does the trick.

In many practical applications, the user-item graph is sparse. However, the sparsity of the graph doesn’t necessarily translate to a sparse similarity matrix. In fact, it’s possible to have at least one path of length two between the majority of distinct items i.e., S_{ij} ≠ 0 for many (i,j) pairs.

Working with a dense similarity matrix has two possible drawbacks:

- Heavy computation for recommending relevant items.
- Less accurate recommendations (lower scores on the test data).

These possible issues can be addressed by post-processing the item-item similarities. Sparsification and row normalization are two common post-processing steps to mitigate the above challenges in graph-based algorithms (e.g., Dacrema et al., 2021).

- Sparsification: Given a hyperparameter n, known as neighborhood size, we only keep n largest elements per row and column of the similarity matrix. Simply put, zero-out S
_{ij}if it’s too small. - Normalization: We normalize each row of the similarity matrix with its sum.

```
//STEP 1. Sparsifying the rows of S_ij
def item_item_sim_row_sp(i, j, sc) =
Neighborhood_size(n) and
topk[{(l, s): item_item_sim(i, l, s)}, n](_, j, sc)
from n
//STEP 2. Normalizing the rows of S_ij
def row_sum(i, ts) = sum[l,v:item_item_sim_row_sp(i, l, v)](ts)
def item_item_sim_nrm(i, j, ns) =
(ns = s/ts) and
item_item_sim_row_sp(i, j, s) and row_sum(i, ts)
from ts, s
//STEP 3. Sparsifying the columns of S_ij
def Sij(i, j, sc) =
Neighborhood_size(n) and
items(j) and
topk[{(l, s): item_item_sim_nrm(l, j, s)}, n](_, i, sc)
from n
```

Similar to the neighborhood-based methods, the next step after computing item-item similarities is to score all (u, i) pairs using the computed similarity matrix.

The scoring is based on summing the score of all paths of length three between u and i. Given the user-item scores, we follow the same strategy as in the neighborhood-based models for ranking and recommendation steps.

Any path of length three between u and i is comprised of two components:

- An edge between u and an item j (which has been rated by user u).
- A path of length two between j and i.

So, aggregation over paths of length three can be calculated as follows:

Here is our implementation of the scoring step in Rel.

```
def P3ui[u, i](v) =
Alpha(a) and
users(u) and not Data(u, i) and
sum[(j, s): (s = s_uj*s_ji) and
Pui(u, j, p_uj) and power[p_uj, a](s_uj) and
Sij(j, i, s_ji)
from p_uj, s_uj, s_ji](v)
from a
```

With the RP^{3}_{β} module to hand, we can query the database to get the recommended items and evaluation scores on our test data.

```
def alpha = 0.8593
def neighborhood_size = 324
def beta = 0.6574
def k = 10
def test_users(u) = test_data(u, _)
def RecSys =RP3beta[train_data, alpha, neighborhood_size, beta]
// For any user u, E[u][i] is the predicted score for item i
def E[u in test_users](i, s) =
evaluation_data(u, i) and
RecSys[:P3ui](u, i, s)
def recommendations(u, i) = recommend_top_k[E, k](u, i)
def metrics[:precision](m) =
mean[u in test_users, p: precision[recommendations[u], test_data[u]](p) ](m)
def metrics[:recall](m) =
mean[u in test_users, r: recall[recommendations[u], test_data[u]](r) ](m)
def output = metrics
```

In our experiments, for k=10, P^{3}_{α} and RP^{3}_{β} achieve precisions of 32.8% and 34.2%, respectively. Furthermore, the recall scores are 20.9% and 21.7%.

The evaluation scores show that RP^{3}_{β }slightly outperforms P^{3}_{α} and neighborhood-based methods (the precision was 33.6% and the recall was 21.2%) on the Movielens100K dataset.

As in our previous work, the entire pipeline is implemented in our Relational Knowledge Graph System (RKGS), eliminating the need to transfer data out of the system at any point.

This approach simplifies the recommendation process, as generating personalized recommendations for a particular user simply involves querying the database to retrieve the top K recommendations.

On top of that, implementing graph-based recommender systems in Rel is highly beneficial as the language can efficiently aggregate over paths on large and sparse graphs, while providing a compact and readable implementation that is easy to maintain.

In our final post in this series, we will describe how to use Rel to provide content-based and hybrid recommender systems that leverage additional information signals, such as the genre of a movie.

RelationalAI's declarative modeling language Rel can be a powerful tool for machine learning data preprocessing. It is concise, readable, and facilitates testing and debugging in development. Rel can significantly simplify your machine learning data pipeline.

Read MoreIn this series of blog posts, we show how to implement neighborhood-based, graph-based, content-based, and hybrid recommender systems using RelationalAI’s declarative modeling language, Rel. The implementations of these algorithms demonstrate the efficiency of our Relational Knowledge Graph System (RKGS) for aggregating over paths on large and sparse graphs, computing similarities using our graph analytics library, and building compact, easy to read models, without needing to transfer data outside of our system.

Read MoreRecommender systems are one of the most successful and widely used applications of machine learning. Their use cases span a range of industry sectors such as e-commerce, entertainment, and social media. In this post, we focus on a fundamental and effective classical approach to recommender systems, which is neighborhood-based.

Read More