Recommender Engine
What it is and how to go about it.
Overview
A recommender system is a subclass of information filtering system that seeks to predict the “rating” or “preference” that a user would give to an item.
The recommendation system here is a user-user collaborative filtering system. This means that the system finds a group of similar users based on their ratings of similar movies and recommends movies that those users liked.
Dataset
The dataset used here is the MovieLens dataset by GroupLens.
The dataset consists of 100,000 ratings on 9,000 movies by almost 700 users.
Each movie has a MovieID and a list of the genres it falls under (Comedy, Adventure, etc.). Each user has a UserID and has rated at least one movie. The ratings are whole numbers in the range of 0 to 5.
Pre-run Calculations
1. Ratings Normalization
Each user's ratings are normalized so that tough raters and lenient raters are treated equally. The sum of each user's ratings is calculated and divided by the number of ratings to get the user's average rating. That average is then subtracted from each of the user's ratings.
As a result, a movie that the user rated higher than their average has a positive value, and a movie rated lower than their average has a negative value.
This is all stored in the database so that it can be accessed during runtime.
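The normalization step can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `normalize_ratings` and the in-memory `(user_id, movie_id, rating)` tuples are assumptions standing in for whatever the database layer provides.

```python
from collections import defaultdict


def normalize_ratings(ratings):
    """Mean-center each user's ratings.

    ratings: iterable of (user_id, movie_id, rating) tuples (assumed shape).
    Returns {user_id: {movie_id: rating - user_average}}.
    """
    # Group the raw ratings by user.
    by_user = defaultdict(dict)
    for user_id, movie_id, rating in ratings:
        by_user[user_id][movie_id] = rating

    normalized = {}
    for user_id, movie_ratings in by_user.items():
        # Average rating = sum of the user's ratings / number of ratings.
        average = sum(movie_ratings.values()) / len(movie_ratings)
        # Subtract the average from every rating of this user.
        normalized[user_id] = {
            movie_id: rating - average
            for movie_id, rating in movie_ratings.items()
        }
    return normalized


# Made-up sample data: user 1 spreads ratings widely, user 2 always gives 4.
sample = [(1, 10, 5), (1, 20, 3), (1, 30, 1), (2, 10, 4), (2, 20, 4)]
normalized = normalize_ratings(sample)
```

With this sample, user 1's ratings become 2, 0 and -2, while user 2's identical ratings all become 0, so a lenient rater's "4" no longer counts as a strong preference.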
2. Calculating Genre Count
The genre count of every user is calculated and stored in the database.
We take a look at every movie rated by a user. For each genre that the movie belongs to, we add the normalized rating to get an overview of which genre the user generally prefers.
If a user prefers action movies, they will generally rate action movies higher than movies of other genres, so the value of ‘Action’ in that user’s genre count will be high.
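A minimal sketch of the genre count calculation for a single user is shown below. The function name `genre_count` and the two dictionaries (normalized ratings keyed by MovieID, and genres keyed by MovieID) are assumed shapes for illustration only.

```python
from collections import defaultdict


def genre_count(normalized_ratings, movie_genres):
    """Sum a user's normalized ratings per genre.

    normalized_ratings: {movie_id: normalized_rating} for one user (assumed).
    movie_genres: {movie_id: [genre, ...]} (assumed).
    """
    counts = defaultdict(float)
    for movie_id, value in normalized_ratings.items():
        # A movie contributes its normalized rating to every genre it has.
        for genre in movie_genres.get(movie_id, []):
            counts[genre] += value
    return dict(counts)


# Made-up sample data.
movie_genres = {
    10: ["Action", "Adventure"],
    20: ["Comedy"],
    30: ["Comedy", "Romance"],
}
user_normalized = {10: 2.0, 20: 0.0, 30: -2.0}
counts = genre_count(user_normalized, movie_genres)
```

Here the user ends up with positive values for Action and Adventure and negative values for Comedy and Romance, which matches the intuition described above.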
Finding Similar Users
After the new user rates the movies they have watched, their ratings are normalized and their genre count is calculated in the same way as described above.
A group of users similar to the new user can then be found by comparing the new user’s genre count with the genre count of every previous user.
The angle is calculated between the new user’s genre count vector and the genre count vector of all the previous users. The smaller the angle between the two vectors, the more similar the two users are.
The cosine of this angle is used as the similarity score: a very similar user will have a value close to 1, and a user whose tastes differ a lot will have a value close to -1.
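The similarity measure can be sketched as below, assuming each genre count is a dictionary mapping genre name to value. The function name `cosine_similarity` and the choice to return 0 for an all-zero vector are assumptions of this sketch, not something the project specifies.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two genre count vectors.

    a, b: {genre: value} dictionaries; a missing genre counts as 0.
    """
    genres = set(a) | set(b)
    dot = sum(a.get(g, 0.0) * b.get(g, 0.0) for g in genres)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        # Assumption: a user with no preference signal is treated as neutral.
        return 0.0
    return dot / (norm_a * norm_b)


same_taste = cosine_similarity(
    {"Action": 2.0, "Comedy": -1.0}, {"Action": 4.0, "Comedy": -2.0}
)
opposite_taste = cosine_similarity(
    {"Action": 2.0, "Comedy": -1.0}, {"Action": -2.0, "Comedy": 1.0}
)
```

Two users whose genre counts point in the same direction score close to 1 regardless of how many movies each has rated, while users with opposite preferences score close to -1.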
We go through the genre count of every user and calculate its cosine similarity with the new user’s genre count. We find the top 5 most similar users. We then take their top-rated movies and check, for each movie, whether the new user has already watched it.
If it hasn’t been watched, the recommender system recommends the movie.
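The selection step above can be sketched as follows. The names `recommend`, `k` and `per_user`, the shape of the `users` dictionary, and the interpretation of "top rated" as each similar user's few highest-rated movies are all assumptions made for this illustration.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two {genre: value} dictionaries."""
    dot = sum(a.get(g, 0.0) * b.get(g, 0.0) for g in set(a) | set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(new_counts, new_watched, users, k=5, per_user=3):
    """Recommend unwatched movies from the k most similar users.

    users: {user_id: {"genre_count": {...}, "ratings": {movie_id: rating}}}
    new_watched: set of movie ids the new user has already rated.
    """
    # Score every existing user against the new user and keep the top k.
    scored = ((cosine(new_counts, u["genre_count"]), uid) for uid, u in users.items())
    most_similar = heapq.nlargest(k, scored)

    recommendations = []
    for _, uid in most_similar:
        ratings = users[uid]["ratings"]
        # This user's highest-rated movies (assumed meaning of "top rated").
        for movie_id in heapq.nlargest(per_user, ratings, key=ratings.get):
            if movie_id not in new_watched and movie_id not in recommendations:
                recommendations.append(movie_id)
    return recommendations


# Made-up sample data.
users = {
    1: {"genre_count": {"Action": 3.0, "Comedy": -1.0}, "ratings": {10: 5, 11: 4, 12: 2}},
    2: {"genre_count": {"Action": -2.0, "Comedy": 3.0}, "ratings": {20: 5, 21: 1}},
}
result = recommend({"Action": 2.0, "Comedy": -0.5}, {10}, users, k=1, per_user=2)
```

In this sample, user 1 is the closest match; movie 10 is skipped because the new user has already watched it, so only movie 11 is recommended.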
Working
This is an API that accepts a list of the user’s watched movies, with ratings, in JSON format. It returns a list of recommended movies and their IMDb URLs, also in JSON format, which can then be parsed at the receiver end.
A POST request is made to pass the rated movies JSON to the API. The JSON is parsed and all individual movies and their ratings are then processed.
The ratings are normalized and genre count is updated.
Each previous user is compared to the new user and the similarity is stored in a Python dictionary.
The top rated movies of the top 5 most similar users are then added to a dictionary.
This is returned to the caller in JSON format.
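The JSON handling at both ends of the request can be sketched with the standard library alone. The field names (`ratings`, `movie_id`, `rating`, `recommendations`, `title`, `imdb_url`) and the helper names are hypothetical; the actual request and response schema of this API is not described here, and the web framework wiring is left out.

```python
import json


def parse_request(body):
    """Parse the POSTed JSON body into {movie_id: rating}.

    Assumed request shape:
    {"ratings": [{"movie_id": 1, "rating": 4}, ...]}
    """
    payload = json.loads(body)
    return {int(item["movie_id"]): float(item["rating"]) for item in payload["ratings"]}


def build_response(recommended):
    """Serialize (title, imdb_url) pairs into the JSON response.

    Assumed response shape:
    {"recommendations": [{"title": ..., "imdb_url": ...}, ...]}
    """
    return json.dumps(
        {"recommendations": [{"title": t, "imdb_url": u} for t, u in recommended]}
    )


# Example round trip with made-up input.
rated = parse_request('{"ratings": [{"movie_id": 1, "rating": 4}, {"movie_id": 2, "rating": 2}]}')
response = build_response([("Toy Story (1995)", "https://www.imdb.com/title/tt0114709/")])
```

The normalization, genre count, and similarity steps would run between `parse_request` and `build_response`.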