Consider a case where you have already created features, you know how important each of them is, and you have to build a classification model in a very short period of time. What will you do? You have a very large volume of data points but very few features in your dataset. In that situation, I would use Naive Bayes, which is considered a really fast algorithm when it comes to classification tasks.
In this blog, I will explain how the algorithm works and how it can be used in these kinds of scenarios. If you want to know what classification is and learn about other such algorithms, you can refer here.
Naive Bayes is a machine learning model suited to large volumes of data; even if you are working with millions of records, Naive Bayes is a recommended approach. It gives very good results on NLP tasks such as sentiment analysis, and it is a fast and uncomplicated classification algorithm.
To understand the Naive Bayes classifier, we need to understand Bayes' theorem, so let's discuss that first.
Bayes' theorem is built on conditional probability: the probability that something will happen, given that something else has already occurred. Conditional probability lets us compute the probability of an event using prior knowledge:

P(H|E) = (P(E|H) * P(H)) / P(E)

where:
P(H): The probability of hypothesis H being true. This is known as prior probability.
P(E): The probability of the evidence.
P(E|H): The probability of the evidence given that the hypothesis is true. This is known as the likelihood.
P(H|E): The probability of the hypothesis given that the evidence is true.
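To make the theorem concrete, here is a minimal worked example in Python. The numbers (spam rate, word frequencies) are purely illustrative assumptions, not taken from any dataset.

```python
# Worked Bayes-theorem example with assumed, illustrative numbers:
# H = "email is spam", E = "email contains the word 'offer'".
p_h = 0.01          # P(H): prior probability that an email is spam
p_e_given_h = 0.80  # P(E|H): probability a spam email contains "offer"
p_e = 0.10          # P(E): probability that any email contains "offer"

# Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E)
p_h_given_e = p_e_given_h * p_h / p_e
print(f"P(spam | 'offer') = {p_h_given_e:.2f}")  # 0.08
```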
The Naive Bayes classifier is a classifier that works on Bayes' theorem.
Membership probabilities are predicted for every class, i.e., the probability that a data point belongs to a particular class.
The class with the maximum probability is chosen as the most suitable class.
The maximum a posteriori (MAP) estimate for a hypothesis H is:

MAP(H) = max(P(H|E))
MAP(H) = max((P(E|H) * P(H)) / P(E))
MAP(H) = max(P(E|H) * P(H))
P(E) is the evidence probability, and it only normalizes the result; since it is the same for every class, the result is not affected by removing P(E).
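The following sketch applies the MAP decision rule with made-up priors and likelihoods for two hypothetical classes; the numbers are assumptions chosen only to illustrate the argmax.

```python
# MAP decision rule: pick the class maximizing P(E|H) * P(H);
# P(E) is dropped because it is identical for every class.
priors = {"spam": 0.3, "ham": 0.7}          # P(H) per class (assumed)
likelihoods = {"spam": 0.05, "ham": 0.001}  # P(E|H) for the observed evidence (assumed)

scores = {h: likelihoods[h] * priors[h] for h in priors}
map_class = max(scores, key=scores.get)
print(scores)     # {'spam': 0.015, 'ham': 0.0007}
print(map_class)  # 'spam'
```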
Naive Bayes classifiers assume that all the variables or features are unrelated to each other: the presence or absence of one variable does not impact the presence or absence of any other variable.
For example, a fruit may be classified as an apple if it is red, round, and about 4″ in diameter. Even if these features are in fact interrelated, a Naive Bayes classifier treats each of them as contributing independently to the probability that the fruit is an apple.
In real datasets we test the hypothesis against multiple features, so computing the joint likelihood directly becomes complex. The independence assumption lets us factor it into a simple product:

P(E1, E2, …, En | H) = P(E1|H) * P(E2|H) * … * P(En|H)
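A one-line sketch of this factorization, with assumed per-feature likelihoods:

```python
import math

# P(E1, E2, E3 | H) under the naive independence assumption is just the
# product of the individual likelihoods (values below are illustrative).
feature_likelihoods = [0.8, 0.5, 0.9]  # P(Ei | H) for three features (assumed)
joint_likelihood = math.prod(feature_likelihoods)
print(joint_likelihood)  # ≈ 0.36
```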
1. Gaussian Naïve Bayes: When feature values are continuous in nature, the values linked with each class are assumed to be distributed according to a Gaussian, i.e., normal, distribution.
2. Multinomial Naïve Bayes: Multinomial Naive Bayes is preferred for data that is multinomially distributed. It is widely used for text classification in NLP, where each event represents the occurrence of a word in a document.
3. Bernoulli Naïve Bayes: When data is distributed according to multivariate Bernoulli distributions, Bernoulli Naive Bayes is used. That is, there may be multiple features, but each one is assumed to take a binary value, so it requires features to be binary-valued.
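As a quick sketch of all three variants, here is how they might be fitted with scikit-learn on synthetic data; the data shapes and values are assumptions chosen only to match each distribution.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=100)  # binary class labels

# GaussianNB: continuous features, assumed normally distributed per class
X_cont = rng.normal(size=(100, 4))
print(GaussianNB().fit(X_cont, y).score(X_cont, y))

# MultinomialNB: count features, e.g. word counts in documents
X_counts = rng.integers(0, 10, size=(100, 4))
print(MultinomialNB().fit(X_counts, y).score(X_counts, y))

# BernoulliNB: binary features, e.g. word presence/absence
X_bin = rng.integers(0, 2, size=(100, 4))
print(BernoulliNB().fit(X_bin, y).score(X_bin, y))
```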
It is a highly scalable algorithm and very fast.
It can be used for both binary and multiclass classification.
It comes in three main variants: GaussianNB, MultinomialNB, and BernoulliNB.
It is a famous algorithm for spam email classification.
It can be easily trained on small datasets and can be used for large volumes of data as well.
The main disadvantage of Naive Bayes is that it treats all the variables contributing to the probability as independent, which rarely holds in real data.
Real-time Prediction: Being a fast learning algorithm, it can be used to make predictions in real time.
Multiclass Classification: It can also be used for multiclass classification problems.
The problem statement is to classify patients as diabetic or non-diabetic. The dataset, the 'Pima Indians Diabetes Database', can be downloaded from the Kaggle website. It has several medical predictor features and a target, 'Outcome'. Predictor variables include the number of pregnancies the patient has had, their BMI, insulin level, age, and so on.
Code implementation of importing and splitting the data
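Here is a minimal sketch of this step; the CSV file name ('diabetes.csv') and the 80/20 split ratio are assumptions, while the 'Outcome' target column comes from the dataset description above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the Pima Indians Diabetes dataset (adjust the path/file name to
# wherever you saved the Kaggle CSV; 'diabetes.csv' is an assumption).
df = pd.read_csv("diabetes.csv")

# 'Outcome' is the target: 1 = diabetic, 0 = non-diabetic.
X = df.drop(columns=["Outcome"])
y = df["Outcome"]

# Hold out a test set for evaluation (split ratio assumed).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```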
For the exploratory data analysis of the dataset, you can look at the techniques here.
Confusion matrix and model score on the test data
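A minimal sketch of fitting and scoring the model on the split from the previous snippet; using GaussianNB here is an assumption, chosen because the predictor features are continuous medical measurements.

```python
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix, accuracy_score

# Fit the classifier on the training split.
model = GaussianNB()
model.fit(X_train, y_train)

# Confusion matrix and accuracy on the held-out test data.
y_pred = model.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print("Test accuracy:", accuracy_score(y_test, y_pred))
```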
Evaluation of the model
The receiver operating characteristic curve, also known as the ROC curve, is a plot that shows the discriminative ability of a binary classifier system. It is plotted between the true positive rate and the false positive rate at different thresholds. For this model, the area under the ROC curve was found to be 0.80.
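A sketch of producing the ROC curve with scikit-learn and matplotlib, continuing from the fitted model above (the exact plotting choices are assumptions):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

# Predicted probability of the positive (diabetic) class.
y_prob = model.predict_proba(X_test)[:, 1]

fpr, tpr, thresholds = roc_curve(y_test, y_prob)
auc = roc_auc_score(y_test, y_prob)

plt.plot(fpr, tpr, label=f"AUC = {auc:.2f}")
plt.plot([0, 1], [0, 1], linestyle="--")  # chance line
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```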
For the Python file and the dataset used in the above problem, you can refer to the GitHub link here, which contains both.
In this blog, I discussed the Naive Bayes algorithm for classification tasks in different contexts: the role of Bayes' theorem in the Naive Bayes classifier, the characteristics of Naive Bayes, its advantages and disadvantages, its applications, and finally a problem statement from Kaggle about classifying patients as diabetic or not.