• Category
  • >Machine Learning

An Introduction to K-nearest Neighbor (KNN) Algorithm

  • Utsav Mishra
  • Jan 03, 2022
An Introduction to K-nearest Neighbor (KNN) Algorithm title banner



Machine learning has evolved a lot in the past few years. The machine learning industry has seen all kinds of development now. From being used as an experimental thing, to becoming an industry of its own. The reason for the development of machine learning can be seen clearly now.


How many of us would have imagined that someone can train a machine and make them do stuff according to them? Very few. But see what happened. A technique came and took the world by storm. 


When Machine Learning was introduced, there were just a few ways to make the machine learn what we wanted them to learn. But as we started to get an idea about what a machine can do for us, we started off with new algorithms of machine learning.


In this process, we got one more ML algorithm, that was designed just for the sake of its application, i.e, the “K-Nearest Neighbour” algorithm. In this blog, we will learn about what KNN is and why it is used. As we dive in, let us try to know what KNN is.


What is KNN?


Suppose, there are two forests on two sides of a road. One day a wolf forgets his way back home and keeps sitting on the road. Seeing the wolf on the road people come and try to find which forest he belongs to. There someone thinks of an idea and starts to calculate the distance of the wolf from both sides of the forest. After calculating the same, he sends the wolf to the side of the forest which he finds closest from the wolf.


Now let us think of it from a different perspective. Imagine that the man who used the method is a computer that is trained to classify objects in the same way. 


In simple words, the supervised learning technique, K-nearest neighbors (KNN) is used for both regression and classification. By computing the distance between the test data and all of the training points, KNN tries to predict the proper class for the test data. Then choose the K number of points that are the most similar to the test data. 


The KNN algorithm analyses the likelihood of test data belonging to each of the 'K' training data classes, and the class with the highest probability is chosen. In regression, the value is the average of the 'K' training points chosen.


Because it does not create a model of the data set beforehand, the k-nearest-neighbor technique is an example of a "lazy learner." It only performs calculations when prompted to poll the data point's neighbors. This makes KNN a breeze to use in data mining.


To know more about the KNN and its working, watch this:


Why do we need KNN?


Assume there are two categories, Category A and Category B, and we receive a new data point x1. Which of these categories will this data point fall into? A K-NN algorithm is required to address this type of problem. We can simply determine the category or class of a dataset with the help of KNN.


Because it delivers highly precise predictions, the KNN algorithm can compete with the most accurate models. As a result, the KNN algorithm can be used in applications that require high accuracy but don't require a human-readable model.


The distance measure affects the accuracy of the predictions. As a result, the KNN method is appropriate for applications with significant domain knowledge. This understanding aids in the selection of an acceptable metric.



How does KNN work?


The KNN method is mostly employed as a classifier, as previously stated. Let's have a look at how KNN classifies data points that aren't visible.


Unlike artificial neural network classification, k-nearest neighbors classification is straightforward to understand and implement. It's suitable for scenarios with well-defined or non-linear data points.


To put it another way, KNN uses a voting method to decide the class of an unobserved observation. This signifies that the data point's class will be determined by the class with the most votes.


If the value of K is one, we'll only use the nearest neighbor to identify the data point's class. If K equals ten, we'll use the ten closest neighbors, and so on.


Consider the following example: X is an unclassified data point. In a scatter plot, there are multiple data points with known categories, A and B. Now, assume that data point X is close to group A.


We categorize a data point by looking at the nearest annotated points, as you may know. If K is equal to one, we'll just utilize one nearest neighbor to figure out which group the data point belongs to.


Because its nearest neighbor is in the same group, the data point X belongs to group A in this situation. If group A has more than ten data points and K equals 10, data point X will still belong to group A because all of its closest neighbors are in the same group.

Before and After K-NN

Before and After K-NN, Source

Assume a third unclassified data point, Y, is sandwiched between groups A and B. If K is equal to 10, we choose the group with the most votes, which means that we assign Y to the group with the most neighbors. For example, Y belongs to group B if it has seven neighbors in group B and three neighbors in group A.


Regardless of the number of categories present, the classifier assigns the largest number of votes to the category with the most votes.


(Similar reading: K-means Clustering in Machine Learning)



Advantages of KNN


The advantages of KNN are:


  • KNN is known as the “Lazy Learner” since there is no training period (Instance-based learning). During the training phase, it does not learn anything. The training data isn't used to derive any discriminative functions.


In other words, it does not require any training. It saves the training dataset and uses it only when making real-time predictions to learn from it. This makes the KNN method much faster than other training-based algorithms like SVM and Linear Regression.


  • Because the KNN algorithm does not require any training before making predictions, new data can be supplied without affecting the system's accuracy.


  • KNN is a simple algorithm to use. KNN can be implemented with only two parameters: the value of K and the distance function.


On an Endnote, let us have a look at some of the real-world applications of KNN.


7 Real-world applications of KNN


The k-nearest neighbor algorithm can be applied in the following areas:


  1. Credit score


The KNN algorithm compares an individual's credit rating to others with comparable characteristics to help calculate their credit rating.


  1. Approval of the loan


The k-nearest neighbor technique, similar to credit scoring, is useful in detecting people who are more likely to default on loans by comparing their attributes to those of similar people.


  1. Preprocessing of data


Many missing values can be found in datasets. Missing data imputation is a procedure that uses the KNN algorithm to estimate missing values.


  1. Recognizing patterns


The KNN algorithm's capacity to recognize patterns opens up a vast range of possibilities. It can assist detect and spotting suspicious patterns in credit card usage, for example. Pattern detection can also be used to spot patterns in client purchasing habits.


  1. Prediction of stock prices


The KNN algorithm is useful in estimating the future value of stocks based on previous data since it has a knack for anticipating the prices of unknown entities.


  1. Recommendation systems


Recommendation systems are a type of system that is used to make recommendations


KNN can be used in recommendation systems since it can help locate people with comparable traits. It can be used in an online video streaming platform, for example, to propose content that a user is more likely to view based on what other users watch.


  1. Computer Vision


For picture classification, the KNN algorithm is used. It's important in a variety of computer vision applications since it can group comparable data points together, such as cats and dogs in separate classes.

Latest Comments