Yasmeen Hany, Hossein Keshavarz, Pigi Kouki
19 April 2023
3 min read
In our two previous posts, we looked at graph-based and neighborhood-based recommender systems. We focused on generating recommendations using only user-item interactions, an approach known as collaborative filtering.
However, more information is usually available, which can drive a content-based approach. For example, a movie recommendation is usually based on information about the movie itself, such as the genre and actors.
The content-based approach can be combined with the collaborative filtering approach to create what is known as a hybrid recommender system. The most popular example of a hybrid recommender system is Netflix.
To recommend movies, Netflix uses both the watching habits of similar users (collaborative filtering) and movie characteristics, such as genres the user previously showed interest in (content-based filtering).
Example of the hybrid approach.
Previously, we modeled the user-item interactions as a bipartite graph consisting of two node types: User and Movie.
In this example, we are using the content-based approach, where the graph is augmented by a new node type: Genre.
In other words, we now deal with a tripartite graph, shown in the figure below. Note that the tripartite graph model can easily be extended into richer graphs by adding more node types, for example, actor, director, and so on.
The tripartite graph representation of the MovieLens dataset.
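To make the tripartite structure concrete, here is a minimal sketch in plain Python rather than Rel. The ids and edge lists are made-up toy data, and `build_adjacency` is a hypothetical helper, not part of the post's model. It only illustrates that a movie node ends up connected to both user nodes and genre nodes.

```python
from collections import defaultdict

# Hypothetical toy data; these ids are invented for illustration.
watched = [("u1", "m1"), ("u1", "m2"), ("u2", "m2")]  # (user, movie) edges
has_genre = [("m1", "g_comedy"), ("m2", "g_comedy"), ("m2", "g_drama")]  # (movie, genre) edges


def build_adjacency(edges):
    """Return an undirected adjacency map for a list of (source, target) edges."""
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency


# The tripartite graph is the union of the two edge sets.
graph = build_adjacency(watched + has_genre)

# A movie node now neighbors both users and genres.
print(sorted(graph["m2"]))  # ['g_comedy', 'g_drama', 'u1', 'u2']
```

Adding another node type, such as actor or director, would amount to appending one more edge list before building the adjacency map.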
This additional node type is modeled in Rel as a value type as follows:
    entity type Movie = Int
    entity type User = Int
    value type Id = Int
    value type Name = String
    // add genre value type definition (a genre is identified by its id)
    value type Genre = Int
After defining our additional node type, we need to modify the Movie entity by adding the edge has_genre, which connects a Movie to its Genre. The has_genre relation assumes there is another relation, movie_info(movie, movie_name, genre_id), that contains movie names and genres (a movie can be assigned multiple genres). The data in the movie_info relation is provided by MovieLens.
    // update
    module movie_info
        // entity node: Movie
        def Movie = ^Movie[m : watched_train(_, m)]

        // edge: has_id
        def has_id(movie, id) =
            movie = ^Movie[m] and id = ^Id[m] and watched_train(_, m)
            from m

        // edge: has_name
        def has_name(movie, name) =
            movie = ^Movie[m] and name = ^Name[n] and
            watched_train(_, m) and movie_info(m, n, _)
            from m, n

        // edge: has_genre
        def has_genre(movie, genre) =
            movie = ^Movie[movie_id] and genre = ^Genre[genre_id] and
            movie_info[movie_id, _, genre_id]
            from movie_id, genre_id
    end

    // store the data in the `MovieGraph` base relation
    def insert:MovieGraph = movie_info
We can now use the relation has_genre to compute similarities using similarity metrics such as cosine, Jaccard, and Dice.
There is one core difference between the item- and user-based approaches and the content-based approach. In the content-based approach, we calculate similarities using the relation has_genre, which contains (movie, genre) pairs, instead of the relation rated_t, which contains (movie, user) interaction pairs.
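The sketch below illustrates this difference in plain Python (not Rel), using the standard set-based definitions of the three metrics. The toy pairs are invented, and the function names are ours rather than those of the Rel graph library. The same similarity function is applied to both relations. Only the feature attached to each movie changes.

```python
import math
from collections import defaultdict


def feature_sets(pairs):
    """Group (movie, feature) pairs into one feature set per movie."""
    features = defaultdict(set)
    for movie, feature in pairs:
        features[movie].add(feature)
    return features


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a, b):
    """Twice the intersection size divided by the sum of both set sizes."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def cosine(a, b):
    """Cosine similarity of the binary indicator vectors of the two sets."""
    denominator = math.sqrt(len(a) * len(b))
    return len(a & b) / denominator if denominator else 0.0


# Hypothetical toy relations with the same shape as has_genre and rated_t.
has_genre = [("m1", "comedy"), ("m1", "romance"), ("m2", "comedy"), ("m2", "drama")]
rated_t = [("m1", "u1"), ("m1", "u2"), ("m2", "u2"), ("m2", "u3"), ("m2", "u4")]

genres = feature_sets(has_genre)
users = feature_sets(rated_t)

content_sim = jaccard(genres["m1"], genres["m2"])  # content-based: shared genres
collab_sim = jaccard(users["m1"], users["m2"])     # collaborative: shared users
print(round(content_sim, 3), collab_sim)  # 0.333 0.25
```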
For the rest of this implementation, we follow the same pipeline as in our neighborhood-based approach. Below is a sample run that generates recommendations and evaluates them using the content-based method:
    with MovieGraph use has_genre, rated_t, Movie
    with MovieGraph_test use evaluation_sample, User

    def k_neighbors = 20
    def rating_type = :binary
    def k_recommendations = 10

    // (movie, user)
    def ratings = rated_t
    // (movie, genre)
    def content_information = has_genre
    def approach = :item_based

    // define the (movie, genre) undirected graph
    def G = undirected_graph[content_information]
    with rel:graphlib[G] use jaccard_similarity

    def sim[m1, score, m2] =
        score = -jaccard_similarity[m1, m2] and
        Movie[m1] and Movie[m2] and
        m1 != m2 and
        abs[score] != 0

    def get_item_scores = pred[approach, rating_type, k_neighbors, rated_t, sim, top_k]

    def top_k_recommendations(user, rank, movie) =
        top[k_recommendations, get_item_scores[user]](rank, score, movie)
        from score

    def predicted_ranking = top_k_recommendations
    def target_ranking = MovieGraph_test:rated

    def output = average[
        precision_at_k[k_recommendations, predicted_ranking[u], target_ranking[u]] <++ 0.0
        for u where User[u]
    ]
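The final `output` definition averages precision at k over all test users. The plain-Python sketch below shows one way to read it. It rests on two assumptions: precision at k is hits in the top k divided by k, and `<++ 0.0` supplies 0.0 for users who have no precision value. The function names and toy rankings are hypothetical.

```python
def precision_at_k(k, predicted, relevant):
    """Fraction of the top-k predicted items that appear in the relevant set."""
    top_k = predicted[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


def mean_precision_at_k(k, predicted_ranking, target_ranking, users):
    """Average precision@k over users, scoring users without data as 0.0."""
    scores = []
    for user in users:
        predicted = predicted_ranking.get(user)
        relevant = target_ranking.get(user)
        if not predicted or not relevant:
            scores.append(0.0)  # assumed to mirror the `<++ 0.0` default
        else:
            scores.append(precision_at_k(k, predicted, relevant))
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy rankings: u3 receives no recommendations at all.
predicted_ranking = {"u1": ["m1", "m2"], "u2": ["m3", "m4"]}
target_ranking = {"u1": {"m1"}, "u2": {"m3", "m4"}, "u3": {"m5"}}

result = mean_precision_at_k(2, predicted_ranking, target_ranking, ["u1", "u2", "u3"])
print(result)  # 0.5
```

Counting users without recommendations as 0.0, instead of dropping them, keeps the average comparable across methods that cover different numbers of users.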
The Netflix model uses a hybrid approach, which computes the similarity between distinct items by combining user- and item-based approaches and content-based filtering.
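This idea is implemented in Rel by defining a module named hybrid_feature. This module takes three parameters as input: M, the (movie, user) interactions; C, the (movie, genre) content information; and W, the weight assigned to each content feature.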
    @outline
    module hybrid_feature[M, C, W]
        def rated_value_binary(movie, user, 1.0) = M(movie, user)

        def weighted_content_value(movie, genre, content_value) =
            C(movie, genre) and content_value = W

        // movie, user, 1 OR
        // movie, genre, content_weight
        def hybrid_feature_value = rated_value_binary ; weighted_content_value

        def hybrid_feature_value_t(feature, movie, v) = hybrid_feature_value(movie, feature, v)
    end
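As a rough illustration of what the module above computes, here is a plain-Python sketch, not Rel, that builds the same union of triples from toy inputs. The input data is invented, and the function names simply mirror the relation names in the module.

```python
def hybrid_feature_value(M, C, W):
    """Union of (movie, user, 1.0) triples and (movie, genre, W) triples."""
    rated_value_binary = {(movie, user, 1.0) for movie, user in M}
    weighted_content_value = {(movie, genre, W) for movie, genre in C}
    return rated_value_binary | weighted_content_value


def hybrid_feature_value_t(triples):
    """Transposed view: (feature, movie, value) instead of (movie, feature, value)."""
    return {(feature, movie, value) for movie, feature, value in triples}


# Hypothetical toy inputs shaped like rated_t and has_genre.
M = {("m1", "u1"), ("m2", "u1")}
C = {("m1", "comedy")}

features = hybrid_feature_value(M, C, 0.2)
for triple in sorted(features):
    print(triple)
# ('m1', 'comedy', 0.2)
# ('m1', 'u1', 1.0)
# ('m2', 'u1', 1.0)
```

Each movie is thus described by one feature set in which users and genres sit side by side, so any item-item similarity computed on it mixes collaborative and content signals.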
This hybrid feature is then used to calculate item-item similarities. Below is a sample run that generates recommendations and evaluates them using the hybrid method.
    with MovieGraph use has_genre, rated_t, Movie
    with MovieGraph_test use evaluation_sample, User

    def shrink = 0
    def k_neighbors = 20
    def rating_type = :binary
    def k_recommendations = 10
    def ratings = rated_t
    def approach = :item_based
    def cw = 0.2

    with hybrid_feature[rated_t, has_genre, cw] use hybrid_feature_value

    def hybrid_information(movie, feature) = hybrid_feature_value(movie, feature, _)

    // define the (movie, feature) undirected graph where the feature could be a user or a genre
    def G = undirected_graph[hybrid_information]
    with rel:graphlib[G] use jaccard_similarity

    def sim[m1, score, m2] =
        score = -jaccard_similarity[m1, m2] and
        Movie[m1] and Movie[m2] and
        m1 != m2 and
        abs[score] != 0

    def get_item_scores = pred[approach, rating_type, k_neighbors, ratings, sim, top_k]

    def test_items_scores[user, score, movie] =
        get_item_scores[user, score, movie] and
        evaluation_sample(user, movie)

    def top_k_recommendations[user, rank, movie] =
        top[k_recommendations, test_items_scores[user]](rank, score, movie)
        from score

    def predicted_ranking = top_k_recommendations
    def target_ranking = MovieGraph_test:rated

    def output = average[
        precision_at_k[k_recommendations, predicted_ranking[u], target_ranking[u]] <++ 0.0
        for u where User[u]
    ]
In our experiments on the MovieLens100K dataset, the content-based and hybrid approaches achieve a precision of 7.4% and 33.2%, respectively, for k = 10. The results are in accordance with the literature for the content-based and hybrid classes of algorithms.
The evaluation scores show that the collaborative filtering approach (which has a maximum precision of 33.6% for k = 10) outperforms both the content-based and hybrid approaches. This can be attributed to the high density of the user-movie interactions, as the MovieLens100K dataset provides a minimum of 20 ratings for each user. Real-world datasets are significantly sparser, and thus are expected to significantly benefit from the hybrid approach.
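To put the density argument in numbers, the short Python sketch below computes the fraction of user-movie pairs that carry a rating. The counts (100,000 ratings, 943 users, 1,682 movies) come from the MovieLens 100K dataset description, not from this post, and `interaction_density` is a hypothetical helper.

```python
def interaction_density(num_interactions, num_users, num_items):
    """Share of all possible user-item pairs that have an observed interaction."""
    return num_interactions / (num_users * num_items)


# MovieLens 100K: 100,000 ratings by 943 users on 1,682 movies.
density = interaction_density(100_000, 943, 1_682)
print(f"{density:.1%}")  # 6.3%
```

Running the same calculation on your own interaction data gives a quick indication of how much sparser it is, and therefore how much room there may be for content features to help.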
We focused on a baseline content-based approach using only the genre information provided by the MovieLens dataset. However, the problem scope can easily be extended in Rel by providing additional content information (director, actor, and so on) from existing knowledge bases such as IMDb and DBpedia.
In this series of blog posts, we have shown how to implement neighborhood-based, graph-based, content-based, and hybrid recommender systems using RelationalAI’s declarative modeling language, Rel.
The implementations of these algorithms demonstrate the efficiency of our Relational Knowledge Graph System (RKGS) for aggregating over paths on large and sparse graphs, computing similarities using our graph analytics library, and building compact, easy to read models, without needing to transfer data outside of our system.