Review-based Recommendations System

  • Tanesh Balodi
  • Dec 15, 2020
  • Deep Learning
  • Machine Learning

In this blog, we discuss what a recommendation system is, how a review-based recommendation system works, and how to implement one in Python.


What is a Recommendation System?


A recommendation system is a machine learning model that suggests movies, clothes, blogs, and other items based on your previous selections, which makes choosing easier. For example, Netflix shows “Top Picks for You” after you have watched a few movies or series, and an online shopping platform like Amazon suggests items while you search for products or clothes.


You have probably browsed such recommendations, and you may also have seen the automated playlists that an audio streaming platform like Spotify creates for you. All of these are the result of a recommendation system. (Must read: How Spotify uses Machine Learning models?)


According to a McKinsey report, about 75% of Netflix views and around 35% of Amazon purchases are driven by recommendation systems.


Stats Showing Growth of Netflix and Amazon with the help of Recommendation System

Working Approach of Review-Based Recommendation System


There are many types of recommendation systems, such as popularity-based, classification-based, and content-based systems. This blog focuses on the review-based recommendation system and how to implement it in Python.


Early recommendations were based on product trends: the most-used products were recommended to almost everyone. Other approaches used rating histories. Later, researchers dug a little deeper and found that users' textual reviews could serve as an important input to the recommendation system. A review-based recommendation system can therefore take both textual reviews and ratings or trends as input.
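To make the trend-based starting point concrete, here is a minimal sketch of a popularity baseline. The toy ratings are hypothetical, and the column names are assumed to match the Amazon dataset used later in this post.

```python
import pandas as pd

# Hypothetical toy ratings; column names mirror the Amazon dataset used later.
ratings = pd.DataFrame({
    "asins": ["A1", "A1", "A1", "B2", "B2", "C3"],
    "reviews.rating": [5, 4, 5, 3, 4, 5],
})

# Popularity baseline: rank products by how many ratings they received,
# breaking ties by mean rating. Every user gets the same list.
popularity = (
    ratings.groupby("asins")["reviews.rating"]
    .agg(n_ratings="count", mean_rating="mean")
    .sort_values(["n_ratings", "mean_rating"], ascending=False)
)
top_products = popularity.index.tolist()
print(top_products)
```

Because the ranking ignores who is asking, it cannot personalize anything, which is the gap that review-based approaches try to close.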


The main purpose of the review-based recommendation system is to extract relevant information from a user’s textual review of a product, movie, or song. This is where machine learning meets natural language processing. (Related blog: Top 10 Natural Processing Languages (NLP) Libraries with Python)


The reviews form the dataset. As a first step, analysis methods such as text analysis and opinion mining are applied to them, and a user profile is then built from the results. That profile is combined with recommender approaches to produce precise recommendations for the individual user. (Read also: 6 Dynamic Challenges in Recommendation System).


Working of Review-Based Recommendation System, Source: ResearchGate

Textual reviews can be turned into model input with word embeddings or with the TF-IDF approach; the implementation below uses TF-IDF. We will see how the model works on a real-world dataset of customer reviews of Amazon products.
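As a small illustration of turning review text into vectors, the sketch below applies TF-IDF (the representation used in Step 5) to three made-up review snippets. The snippets and variable names are assumptions for illustration only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical review snippets, one string per product.
reviews = [
    "great tablet with a bright screen and long battery life",
    "the tablet screen is bright and the battery lasts long",
    "these batteries leak and die quickly",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(reviews)   # sparse matrix, one row per review
sim = cosine_similarity(tfidf)              # dense 3 x 3 similarity matrix

# The first two reviews share several terms, so they score higher with
# each other than either does with the third review.
print(sim.round(2))
```

Note that "battery" and "batteries" count as different terms here, because plain TF-IDF does no stemming.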


Python Implementation of Review-based Recommendation system


We use the consumer reviews of Amazon products as the dataset. You can download it from Kaggle.


Step 1: Importing libraries and reading the dataset with pandas

# Jupyter magic: render plots inside the notebook
%matplotlib inline

import csv
import re
import string

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.spatial.distance import cosine
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.metrics import mean_squared_error
from sklearn.neighbors import NearestNeighbors
from wordcloud import WordCloud, STOPWORDS

# Load the Kaggle CSV (it must be in the working directory)
df = pd.read_csv(r'Datafiniti_Amazon_Consumer_Reviews_of_Amazon_Products.csv')


Step 2:  Viewing the index and the dataset
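The code for viewing the index and the dataset is not shown here. A minimal sketch of how this inspection is commonly done in pandas follows; the small DataFrame is a hypothetical stand-in for the `df` loaded in Step 1, with only the three columns used later in this post.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame loaded in Step 1.
df = pd.DataFrame({
    "asins": ["A1", "A1", "B2"],
    "reviews.rating": [5.0, 3.0, 4.0],
    "reviews.text": ["Works well", "Stopped working", None],
})

print(df.index)    # row labels, a RangeIndex for a freshly read CSV
print(df.columns)  # column names
print(df.head())   # first rows of the dataset
df.info()          # dtypes and non-null counts per column
```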


Now count the non-null values for each ‘asins’ (the Amazon product ID) and compute the mean of the numeric columns per ‘asins’.

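# Number of non-null values in every column, per product (asins)
count = df.groupby("asins", as_index=False).count()
# Mean of the numeric columns per product; numeric_only=True skips text
# columns, which recent pandas versions refuse to average
mean = df.groupby("asins", as_index=False).mean(numeric_only=True)
# Attach the per-product counts to every review row
dfMerged = pd.merge(df, count, how='right', on=['asins'])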
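The difference between counting and averaging per product is easiest to see on a tiny example. The sketch below uses a hypothetical three-row frame with one missing rating; only the column names are taken from the dataset.

```python
import pandas as pd

# Hypothetical toy data with one missing rating.
df_toy = pd.DataFrame({
    "asins": ["A1", "A1", "B2"],
    "reviews.rating": [5.0, None, 4.0],
    "reviews.text": ["good", "fine", "ok"],
})

# count() reports non-null values per column, so the missing rating is skipped.
counts = df_toy.groupby("asins", as_index=False).count()
# mean() averages numeric columns only and also ignores missing values.
means = df_toy.groupby("asins", as_index=False).mean(numeric_only=True)

print(counts)
print(means)
```

Product A1 has two reviews but only one rating, so its rating count is 1 while its text count is 2.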


Step 3:  Taking non-null values of reviews.text, reviews.rating, and asins.

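# Keep only the three columns we need and drop rows with missing values
df1 = df[['reviews.text','reviews.rating','asins']]
df1 = df1.dropna()

# Mean of the numeric columns (including reviews.rating) for each product
dfProductReview = df.groupby("asins", as_index=False).mean(numeric_only=True)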


Step 4:  Grouping Reviews of Individual Products

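# Collect each product's reviews into one string. apply(str) stores the printed
# form of the group, so row numbers and line breaks are cleaned up below.
ProductReviewSummary = df1.groupby("asins")["reviews.text"].apply(str)
p = ProductReviewSummary.to_frame()
# regex=True is required for the \d+ pattern in recent pandas versions
p['reviews.text'] = p['reviews.text'].str.replace(r'\d+', " ", regex=True)
p['reviews.text'] = p['reviews.text'].str.replace('\n', " ")
p['reviews.text'] = p['reviews.text'].str.strip(" ")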


The grouped frame p has one row per product: -> 24
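Because `apply(str)` stores the printed representation of each group, pandas may abbreviate long groups and long review strings. One hedged alternative, which is not the author's code, is to join the full review texts explicitly. The toy frame below is hypothetical, and applying this to the real dataset would change the vocabulary and therefore the matrix shape reported in Step 5.

```python
import pandas as pd

# Hypothetical reviews; column names mirror df1 from Step 3.
df1_toy = pd.DataFrame({
    "asins": ["A1", "A1", "B2"],
    "reviews.text": ["Great screen 10/10", "Battery lasts\nall day", "Too slow"],
})

# Concatenate every review of a product into one document.
joined = df1_toy.groupby("asins")["reviews.text"].apply(" ".join).to_frame()

# Same clean-up as above: drop digits and line breaks, trim whitespace.
joined["reviews.text"] = (
    joined["reviews.text"]
    .str.replace(r"\d+", " ", regex=True)
    .str.replace("\n", " ", regex=False)
    .str.strip()
)
print(joined)
```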


Step 5:  Tfidf Matrix and Cosine Similarity

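from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Unigrams to trigrams with English stop words removed. min_df=1 keeps every
# term (recent scikit-learn versions reject min_df=0).
tf = TfidfVectorizer(analyzer='word', ngram_range=(1, 3), min_df=1, stop_words='english')
# One row per product, one column per term
tfidf_matrix = tf.fit_transform(p['reviews.text'])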


The matrix has one row per product and one column per term: -> (24, 9968)

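# Pairwise cosine similarity between all products, kept as a sparse matrix
cosine_similarities = cosine_similarity(tfidf_matrix, Y=None, dense_output=False)
# Dense copy of the same matrix for inspection
cnum = cosine_similarities.toarray()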
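Cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below checks a hand computation against scikit-learn on two hypothetical vectors; the numbers are made up and are not taken from the review data.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Two hypothetical TF-IDF-like row vectors.
X = np.array([[1.0, 2.0, 0.0],
              [2.0, 1.0, 1.0]])

# Cosine similarity by hand: dot product divided by the product of the norms.
manual = X[0] @ X[1] / (np.linalg.norm(X[0]) * np.linalg.norm(X[1]))

# scikit-learn computes the same value for every pair of rows.
sim = cosine_similarity(X)
print(manual, sim[0, 1])
```

The resulting matrix is symmetric with ones on the diagonal, since every vector is perfectly similar to itself.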




Step 6:  Recommendations

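def get_recommendations(id):
    # id is the row position of the product in p
    print("the product selected is {}".format(p.index[id]))
    # Similarity of every product to the selected one (sparse column)
    a = cosine_similarities.getcol(id)
    # Assumed completion of a truncated line: pair each row position with its score
    val = list(enumerate(a.toarray().ravel()))
    b = dict(val)
    # Sort by score, highest first; the top entry is the product itself, so skip it
    c = sorted(b.items(), key=lambda x: x[1], reverse=True)[1:4]
    k = 1
    for idx in c:
        print("The {} Recommendation is {}".format(k, p.index[idx[0]]))
        k += 1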
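To try the idea end to end without the Kaggle file, here is a self-contained sketch on four hypothetical products. The helper `top_n_similar`, the product ids, and the review texts are all illustrative assumptions, not part of the code above.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical grouped reviews, indexed by product id like the frame p above.
p_toy = pd.DataFrame(
    {"reviews.text": [
        "bright screen tablet great for reading",
        "tablet with bright screen good for reading books",
        "loud speaker with deep bass",
        "speaker bass is loud and clear",
    ]},
    index=["TABLET_A", "TABLET_B", "SPEAKER_A", "SPEAKER_B"],
)

tfidf_toy = TfidfVectorizer(stop_words="english").fit_transform(p_toy["reviews.text"])
sims = cosine_similarity(tfidf_toy)   # dense 4 x 4 matrix

def top_n_similar(position, n=3):
    """Return the n product ids most similar to the product at `position`."""
    scores = pd.Series(sims[position], index=p_toy.index)
    scores = scores.drop(p_toy.index[position])   # exclude the product itself by label
    return scores.sort_values(ascending=False).head(n).index.tolist()

print(top_n_similar(0, n=1))
print(top_n_similar(2, n=1))
```

Dropping the selected product by label, rather than skipping the first sorted entry, avoids relying on the product always ranking first against itself. Returning a list instead of printing also makes the result easier to reuse.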



We could also have used count vectors (CountVectorizer) with the k-nearest neighbors (KNN) algorithm to get recommendations.
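A minimal sketch of that alternative, assuming raw term counts as features and cosine distance for the neighbor search, could look like this. The toy products are hypothetical, and the parameter choices are one reasonable option rather than a tuned setup.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import NearestNeighbors

# Hypothetical grouped reviews, one document per product.
p_knn = pd.DataFrame(
    {"reviews.text": [
        "bright screen tablet great for reading",
        "tablet with bright screen good for reading books",
        "loud speaker with deep bass",
        "speaker bass is loud and clear",
    ]},
    index=["TABLET_A", "TABLET_B", "SPEAKER_A", "SPEAKER_B"],
)

# Raw term counts instead of TF-IDF weights.
term_counts = CountVectorizer(stop_words="english").fit_transform(p_knn["reviews.text"])

# Brute-force KNN with cosine distance works directly on sparse input.
knn = NearestNeighbors(n_neighbors=2, metric="cosine", algorithm="brute").fit(term_counts)

# Query with the first product; the result includes the product itself,
# so it is filtered out by position.
distances, indices = knn.kneighbors(term_counts[0])
neighbour_ids = [p_knn.index[i] for i in indices[0] if i != 0]
print(neighbour_ids)
```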




The recommendation system is another wonder of machine learning that eases the selection process. Over time, implementation methods change, and new kinds of data become key resources for machine learning and natural language processing models. (Check also: Machine Learning vs Deep Learning). While everything in computer science moves fast, we at Analytics Steps are committed to providing information as quickly as we can.