
What is a Content-based Recommendation System in Machine Learning?

  • Utsav Mishra
  • May 29, 2021

Ever wondered how a simple-looking computer or laptop manages to do so many complex things? Ever wondered how it is trained to behave the way it does?


With every passing day these machines are becoming smarter, and in some narrow tasks they even outperform humans. How is this possible? Largely thanks to a technique called machine learning.



Machine Learning


Machine learning is a branch of artificial intelligence (AI) in which systems learn from data and improve their performance and accuracy over time without being explicitly programmed.


In this context, an algorithm is a set of statistical processing procedures used in data science. In machine learning, algorithms are 'trained' to detect patterns and features in large volumes of data so that they can make judgments and predictions on new data. Generally, the more data an algorithm processes, the more accurate its decisions and forecasts become.


In simpler words, machine learning is the science of getting computers to learn from examples, much as humans learn from experience.


Some common applications of machine learning are image recognition software, speech recognition, medical diagnosis, and many more.


Let us move a bit further and shed some light on one important application of machine learning: the recommender system.



What is a Recommender System?


Recommender systems are a class of machine learning algorithms that provide users with "relevant" recommendations. Whenever we browse or search for something, be it in an app, on a shopping site, or in a search engine, a recommender system is often at work in the background, selecting the results most relevant to us.


For example, if a user listens to rock music every day, their YouTube recommendation feed will soon fill up with rock music and music of related genres.


Items are ranked according to their relevance, and the most relevant ones are recommended to the user. The system assesses relevance primarily from past data, just as in the rock music example above.


Recommender systems fall mainly into two categories: collaborative filtering and content-based filtering.


Collaborative filtering


Methods for recommender systems that are primarily based on previous interactions between users and the target items are known as collaborative filtering methods. 


A collaborative filtering system is therefore fed all the past data about users' interactions with the target items. This information is usually recorded as a matrix, with rows representing users and columns representing items.
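As a minimal sketch in plain Python, with made-up users, items, and ratings, the user-item matrix can be built from a log of interactions. Here `None` marks a user who has not interacted with an item. The names `interactions` and `build_matrix` are hypothetical and chosen only for illustration.

```python
# Hypothetical interaction log: (user, item, rating) triples.
interactions = [
    ("alice", "Dune", 5),
    ("alice", "Emma", 2),
    ("bob", "Dune", 4),
    ("bob", "Ulysses", 3),
    ("carol", "Emma", 5),
]

def build_matrix(log):
    """Return (users, items, matrix) where matrix[u][i] is the rating,
    or None if user u has not interacted with item i."""
    users = sorted({u for u, _, _ in log})
    items = sorted({i for _, i, _ in log})
    row = {u: k for k, u in enumerate(users)}   # user -> row index
    col = {i: k for k, i in enumerate(items)}   # item -> column index
    matrix = [[None] * len(items) for _ in users]
    for u, i, r in log:
        matrix[row[u]][col[i]] = r
    return users, items, matrix

users, items, matrix = build_matrix(interactions)
print(items)  # ['Dune', 'Emma', 'Ulysses']
for user, ratings_row in zip(users, matrix):
    print(user, ratings_row)
```

Most cells are empty even in this toy log. Real interaction matrices are typically very sparse, which is why production systems usually store them in sparse formats.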


The basic premise of such systems is that users' past data is sufficient to generate a prediction. In other words, nothing beyond historical data is required: no further user input, no current trending data, and so on.


Furthermore, collaborative filtering methods are divided into two sub-groups: memory-based methods and model-based methods.




  • Memory Based


Memory-based methods are the most basic because they use no model at all. They assume that predictions can be made solely from the "memory" of past data, and they typically rely on a simple similarity or distance measure, as in nearest-neighbor search.
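Below is a minimal sketch of a memory-based method. It assumes user-based nearest neighbors with cosine similarity computed over co-rated items, which is one common choice among several. The ratings and the `cosine` and `predict` helpers are hypothetical.

```python
import math

# Hypothetical ratings: user -> {item: rating}.
ratings = {
    "alice": {"Dune": 5, "Emma": 1, "Ulysses": 4},
    "bob":   {"Dune": 4, "Emma": 2, "Ulysses": 5, "Hamlet": 4},
    "carol": {"Dune": 1, "Emma": 5, "Hamlet": 2},
}

def cosine(a, b):
    """Cosine similarity over the items both users have rated."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def predict(user, item, k=2):
    """Similarity-weighted average of the ratings given to `item`
    by the k users most similar to `user` (no model is trained)."""
    neighbours = [
        (cosine(ratings[user], ratings[other]), ratings[other][item])
        for other in ratings
        if other != user and item in ratings[other]
    ]
    neighbours.sort(reverse=True)                  # most similar first
    top = [(s, r) for s, r in neighbours[:k] if s > 0]
    if not top:
        return None                                # nobody to learn from
    return sum(s * r for s, r in top) / sum(s for s, _ in top)

print(round(predict("alice", "Hamlet"), 2))
```

Alice's ratings resemble Bob's much more than Carol's. As a result, her predicted rating for Hamlet falls between their ratings (4 and 2) and sits closer to Bob's.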



  • Model Based


Model-based approaches, on the other hand, assume an underlying model of how users rate items (for example, matrix factorization). They fit that model to the past interaction data and then make predictions from the fitted model.
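Matrix factorization is one well-known model-based approach. The sketch below assumes a plain stochastic gradient descent variant with L2 regularization, written in standard-library Python. The number of latent factors, the learning rate, the number of epochs, and the tiny rating set are all illustrative assumptions, not tuned values.

```python
import math
import random

def predict(P, Q, u, i):
    """Predicted rating: dot product of user u's and item i's latent factors."""
    return sum(p * q for p, q in zip(P[u], Q[i]))

def factorize(triples, n_users, n_items, k=2, lr=0.02, reg=0.01, epochs=5000, seed=0):
    """Learn user factors P and item factors Q with SGD so that P[u] . Q[i] ~ rating."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in triples:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on the squared error plus an L2 penalty.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def rmse(P, Q, triples):
    """Root-mean-square error over the observed ratings."""
    return math.sqrt(sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in triples) / len(triples))

# Hypothetical observed ratings as (user index, item index, rating).
triples = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 1), (2, 2, 5)]
P, Q = factorize(triples, n_users=3, n_items=3)
print("training RMSE:", round(rmse(P, Q, triples), 3))
print("user 0, unseen item 2:", round(predict(P, Q, 0, 2), 2))
```

Once the model is trained, it can fill in cells that were never observed, such as user 0's rating of item 2. A memory-based method instead has to search the raw data again for every prediction.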


Now let us jump to the main course of our discussion: the second category of recommender systems, the content-based recommendation system.


Content-based Recommender System


Content-based filtering is a popular recommendation technique. Here, "content" refers to the attributes of the items you like.


The system uses your attributes and your likes to recommend things you might also like. It draws on the information you provide online, along with any other data it is able to gather, and curates recommendations from that.


The goal behind content-based filtering is to classify products with specific keywords, learn what the customer likes, look up those terms in the database, and then recommend similar things.


This type of recommender system depends heavily on the input provided by users. Keyword search in Google or Wikipedia works in a similar way: when a user searches for a group of keywords, Google displays the items that contain those keywords.



Suppose I am a fan of the Harry Potter series and watch only such movies online. When my data is gathered from Google or Wikipedia, it will show that I am a fan of fantasy movies, so my recommendations will be filled with fantasy movies. From all the available movies, the ones best suited to me will be curated and recommended.


Suppose there are two movies, Fantastic Beasts and The Shawshank Redemption. Given my preference for fantasy movies, Fantastic Beasts will be recommended to me.
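The keyword idea from this section can be sketched with the Harry Potter example. This sketch assumes that each movie is tagged with a hand-picked set of keywords and that similarity is measured with Jaccard overlap, which is only one simple option. The tags and the `recommend` helper are hypothetical.

```python
# Hypothetical catalogue: each movie is tagged with a set of keywords.
catalogue = {
    "Fantastic Beasts": {"fantasy", "magic", "wizards"},
    "The Hobbit": {"fantasy", "adventure", "dragons"},
    "The Shawshank Redemption": {"drama", "prison", "friendship"},
    "Harry Potter": {"fantasy", "magic", "wizards", "school"},
}

def jaccard(a, b):
    """Share of keywords two sets have in common (0 = none, 1 = identical)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend(liked, k=2):
    """Recommend the k unseen movies whose keywords best match the liked ones."""
    profile = set().union(*(catalogue[t] for t in liked))  # the user's keyword profile
    candidates = [t for t in catalogue if t not in liked]
    return sorted(candidates, key=lambda t: jaccard(profile, catalogue[t]), reverse=True)[:k]

print(recommend(["Harry Potter"]))  # ['Fantastic Beasts', 'The Hobbit']
```

Fantastic Beasts shares three of its keywords with the profile, while The Shawshank Redemption shares none. It therefore ranks last, matching the example above.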


How does it work?


A content-based recommendation system can work in two ways, each using different models and algorithms. Method 1 uses the vector space model, while method 2 uses a classification model.


  • Method 1: The vector space method


Suppose you read a crime thriller by Agatha Christie and review it online. You also review a comedy novel, rating the crime thriller as good and the comedy as bad.


From this information, a rating scheme is built. On a scale that tops out at 9, crime thriller and detective genres might score 9, other serious genres somewhere between 9 and 0, and comedy the lowest, possibly below zero.


With this information, your next book recommendation will most probably be a crime thriller, since that is your highest-rated genre.


For this ranking system, a user vector is created that scores each genre according to the information you provided. Next, an item vector is created for each book, recording which genres it belongs to.
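One common way to build the user vector is to add up the genre vectors of the books you reviewed, each weighted by your rating (+1 for good, -1 for bad). This approach is an assumption made for the sketch, not something the article specifies. The genres and reviews are hypothetical.

```python
GENRES = ["crime thriller", "detective", "comedy"]

# Hypothetical reviews: (item vector over GENRES, rating), +1 = "good", -1 = "bad".
reviews = [
    ([1, 1, 0], +1),  # an Agatha Christie crime thriller, reviewed as good
    ([0, 0, 1], -1),  # a comedy novel, reviewed as bad
]

def build_user_vector(reviews, n_genres):
    """Sum the genre vectors of reviewed items, each weighted by the user's rating."""
    profile = [0.0] * n_genres
    for item_vector, rating in reviews:
        for g, present in enumerate(item_vector):
            profile[g] += rating * present
    return profile

user_vector = build_user_vector(reviews, len(GENRES))
print(dict(zip(GENRES, user_vector)))
# {'crime thriller': 1.0, 'detective': 1.0, 'comedy': -1.0}
```

The sign pattern matches the rating scheme described above. Liked genres get positive weights and the disliked comedy genre goes negative. A finer rating scale, such as the 0 to 9 one, would simply produce larger magnitudes.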


Each book is then assigned a score by taking the dot product of the user vector and its item vector, and this score is used for recommendation.
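Here is a worked example with hypothetical numbers. Take three genres (crime thriller, detective, comedy) and a user vector that scores them 9, 9 and -2. Each book's item vector marks its genres with 1 or 0. One book is a crime thriller that is also a detective story, and the other is a comedy:

```latex
\begin{aligned}
\operatorname{score}(\mathbf{u}, \mathbf{v}) &= \mathbf{u} \cdot \mathbf{v} = \sum_{g} u_g \, v_g \\
\mathbf{u} &= (9,\; 9,\; -2) \\
\mathbf{v}_{\text{crime}} &= (1,\; 1,\; 0), \qquad \mathbf{v}_{\text{comedy}} = (0,\; 0,\; 1) \\
\mathbf{u} \cdot \mathbf{v}_{\text{crime}} &= 9 \cdot 1 + 9 \cdot 1 + (-2) \cdot 0 = 18 \\
\mathbf{u} \cdot \mathbf{v}_{\text{comedy}} &= 9 \cdot 0 + 9 \cdot 0 + (-2) \cdot 1 = -2
\end{aligned}
```

The crime thriller scores far higher, so it would be recommended ahead of the comedy.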


In this way, the dot products of all the available books are computed and ranked, and the top 5 or top 10 books are recommended.
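The scoring and top-k step can be sketched in standard-library Python. The genre list, user scores, and book vectors are hypothetical. `heapq.nlargest` picks the k highest-scoring titles without sorting the whole catalogue.

```python
import heapq

GENRES = ["crime thriller", "detective", "romance", "comedy"]

# Hypothetical user vector: one preference score per genre (comedy disliked).
user_vector = [9, 9, 4, -2]

# Hypothetical item vectors: 1 if the book belongs to the genre, else 0.
books = {
    "Murder on the Orient Express": [1, 1, 0, 0],
    "The Big Sleep": [0, 1, 0, 0],
    "Pride and Prejudice": [0, 0, 1, 0],
    "Three Men in a Boat": [0, 0, 0, 1],
}

def score(user, item):
    """Dot product of the user vector and an item vector."""
    return sum(u * i for u, i in zip(user, item))

def top_k(user, items, k):
    """Return the k titles with the highest dot-product scores."""
    return heapq.nlargest(k, items, key=lambda title: score(user, items[title]))

print(top_k(user_vector, books, k=2))
# ['Murder on the Orient Express', 'The Big Sleep']
```

The scores here are 18, 9, 4 and -2. Raising `k` to 5 or 10 gives the "top 5" or "top 10" lists described above, once the catalogue is large enough.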


This vector space approach is one of the earliest and simplest techniques used by content-based recommendation systems.


  • Method 2: Classification method


The second method is the classification method. Here, we can build a decision tree that predicts whether or not the user wants to read a given book.


For example, consider the book The Alchemist.


Based on the user data, we first check the author: it is not Agatha Christie. Next, the genre is not crime thriller, nor is it a type of book you have ever reviewed. From these checks, we conclude that this book shouldn't be recommended to you.
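The tree described above can be sketched as a chain of if-statements, one split per level. This is a hand-written tree with hypothetical rules and genre labels. In practice, the splits would usually be learned from labelled examples by an algorithm such as scikit-learn's `DecisionTreeClassifier`, which this sketch does not use.

```python
from dataclasses import dataclass

@dataclass
class Book:
    title: str
    author: str
    genre: str

# Hypothetical facts extracted from the user's past reviews.
FAVOURITE_AUTHORS = {"Agatha Christie"}
LIKED_GENRES = {"crime thriller", "detective"}
REVIEWED_GENRES = {"crime thriller", "detective", "comedy"}

def classify(book):
    """Walk a hand-written decision tree; return (recommend?, reason for the leaf)."""
    if book.author in FAVOURITE_AUTHORS:       # split 1: author
        return True, "favourite author"
    if book.genre in LIKED_GENRES:             # split 2: liked genre
        return True, "liked genre"
    if book.genre in REVIEWED_GENRES:          # split 3: reviewed before?
        return False, "genre reviewed but not liked"
    return False, "genre never reviewed"

alchemist = Book("The Alchemist", "Paulo Coelho", "philosophical fiction")
print(classify(alchemist))  # (False, 'genre never reviewed')
```

Returning the reason alongside the decision makes the path through the tree visible. For The Alchemist, all three checks fail, which reproduces the conclusion above.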



Advantages and Disadvantages of Content-based Recommendation Systems


The advantages of a content-based recommender system are as follows:


  • Because the recommendations are tailored to one person, the model does not need any information about other users. This makes it easier to scale to a large number of users.

  • The model can recognize a user's individual preferences and make recommendations for niche things that only a few other users are interested in.

  • New items can be recommended before many users have rated them, unlike in collaborative filtering.


The disadvantages are as follows:


  • This methodology necessitates a great deal of domain knowledge because the feature representation of the items is hand-engineered to some extent. As a result, the model can only be as good as the characteristics that were hand-engineered.


  • The model can only make suggestions based on the user's existing interests. In other words, it has limited ability to expand beyond them.


  • Since it must align the features of a user's profile with available products, content-based filtering offers only a small amount of novelty. 


In content-based filtering, only item profiles are built. Users are recommended items close to what they have rated or searched for, rather than anything outside their past preferences. Even a perfect content-based filtering system would rarely reveal anything surprising or unexpected.





We have now seen how machine learning helps recommend items to a user. Having looked at the two types of filtering, and at content-based filtering and its methods in particular, we now know how recommendations reach us.


The methods of content-based recommendation show that a computer uses many processes to make our lives easier, and recommendation is one of them.




From now on, whenever we open a website and it recommends something to us, we can tell how that happened.
