How Does the Support Vector Machine Algorithm Work in Machine Learning?

  • Rohit Dwivedi
  • May 04, 2020
  • Machine Learning

I assume that by now you are familiar with the linear regression and logistic regression algorithms. If you have not covered them yet, I would recommend going through them first before moving on to support vector machines. The support vector machine, also known as SVM, is another algorithm widely used by machine learning practitioners for both classification and regression problems, although it is most commonly applied to classification tasks. It is often preferred over other classification algorithms because it can reach notable accuracy with comparatively little computation on small datasets, and it gives reliable results even when the amount of training data is limited.

 

What Is a Support Vector Machine (SVM)?

 

A support vector machine is a machine learning model that learns to separate two classes from a training set of labelled data. The main job of the SVM is to find a hyperplane that distinguishes between the two classes.

 

Many hyperplanes can do this, but the objective is to find the one with the largest margin, that is, the maximum distance between the hyperplane and the nearest points of each class, so that a new data point arriving in the future can be classified with more confidence.

 

How Does SVM Work?

 

1. Linearly Separable Data

 

Let us understand how SVM works with an example in which we have two classes, shown in the image below: class A (circles) and class B (triangles). We want to apply the SVM algorithm and find the best hyperplane that divides the two classes.

Two classes circle and triangle.

Class A and B

 

Two classes and hyperplane that divides both.

Labelled Data

 

SVM takes all the data points into consideration and outputs a line, called the ‘hyperplane’, that divides the two classes. This line is also termed the ‘decision boundary’. Anything that falls on the circle side of the boundary is assigned to class A, and anything on the triangle side to class B.

 

Best hyperplane with maximum distance.

Not all hyperplanes are equally good at classification

 

As you can see, many hyperplanes are possible, but the best hyperplane for dividing the two classes is the one that lies at the largest distance from the nearest points of both classes. Finding this hyperplane is the main goal of SVM.
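
To make the idea of the "largest distance" concrete, here is a minimal sketch in plain Python. The four data points and the two candidate lines are made up for illustration and are not taken from the figures above. The margin of a candidate is measured as the smallest distance from any point to the line.

```python
import math

# Toy 2-D data (hypothetical): label +1 stands for class A, -1 for class B.
points = [((1.0, 1.0), -1), ((2.0, 1.0), -1), ((4.0, 4.0), 1), ((5.0, 4.0), 1)]


def margin(w, b, data):
    """Smallest signed distance from any point to the line w.x + b = 0.

    A negative result means at least one point lies on the wrong side.
    """
    norm = math.hypot(*w)
    return min(y * (w[0] * x[0] + w[1] * x[1] + b) / norm for x, y in data)


# Two hypothetical lines that both separate the classes.
candidates = {
    "x1 + x2 = 5": ((1.0, 1.0), -5.0),
    "x1 = 2.5": ((1.0, 0.0), -2.5),
}

margins = {name: margin(w, b, points) for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)

for name, m in margins.items():
    print(f"{name}: margin = {m:.3f}")
print("widest margin:", best)
```

Both lines classify every point correctly, but the first one keeps the nearest point farther away, so it is the better hyperplane in the SVM sense. A real SVM searches over all possible lines rather than comparing a fixed list of candidates.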

 

The dimension of the hyperplane depends on the number of features we have: with two features it is a line, and with three features it is a plane. It becomes hard to visualize when there are more than three features.

 

Data points.

Class A: Red & Class B: Yellow

 

Consider two classes, A and B, shown in red and yellow respectively. We need to find the best hyperplane that divides the two classes.

 

Class A and Class B.

Soft margin and hyperplane

 

A soft margin permits a few of the above data points to be misclassified. It tries to strike a balance between finding a hyperplane that makes fewer misclassifications and maximizing the margin.
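
This trade-off is commonly written as an objective with two parts: a term that rewards a wide margin and a hinge-loss term that charges for points on the wrong side of the margin. The sketch below evaluates that objective for one hypothetical line on made-up data. The weight `C` on the second term and all the numbers are assumptions chosen for illustration.

```python
def hinge_losses(w, b, data):
    """Per-point hinge loss: zero when a point is outside the margin on its own side."""
    return [max(0.0, 1.0 - y * (w[0] * x[0] + w[1] * x[1] + b)) for x, y in data]


def soft_margin_objective(w, b, data, C):
    """0.5 * ||w||^2 (favours a wide margin) + C * total hinge loss (penalises violations)."""
    return 0.5 * (w[0] ** 2 + w[1] ** 2) + C * sum(hinge_losses(w, b, data))


# Hypothetical data: the last +1 point sits inside the -1 region.
data = [
    ((1.0, 1.0), -1),
    ((2.0, 2.0), -1),
    ((4.0, 4.0), 1),
    ((5.0, 5.0), 1),
    ((2.5, 2.5), 1),
]

w, b = (0.5, 0.5), -3.0  # a hypothetical candidate line
losses = hinge_losses(w, b, data)
print("hinge losses:", losses)
print("objective with C=1: ", soft_margin_objective(w, b, data, C=1.0))
print("objective with C=10:", soft_margin_objective(w, b, data, C=10.0))
```

Only the misplaced point contributes a loss, and increasing `C` makes that single violation dominate the objective.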

 

2. Linearly Non-separable Data

 

Linearly Non-separable Data

 

If the data is not linearly separable, as shown in the above figure, then SVM makes use of kernel tricks to make it linearly separable. The idea of transforming non-linearly separable data into linearly separable data rests on Cover’s theorem: “given a set of training data that is not linearly separable, with high probability it can be transformed into a linearly separable training set by projecting it into a higher-dimensional space via some non-linear transformation”. Kernel tricks help to project the data points into a higher-dimensional space, where they become relatively easier to separate.

 

Kernel Tricks: 

 

A kernel is also known as a generalized dot product. The kernel trick is a way of calculating the dot product of two vectors in the higher-dimensional space, which measures how similar they are, without explicitly computing the transformation. According to Cover’s theorem, the chances of a linearly non-separable data set becoming linearly separable increase in higher dimensions. Kernel functions supply the dot products needed to solve the SVM constrained optimization problem.
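
A small sketch can show why a kernel counts as a "generalized dot product". For 2-D points, the degree-2 polynomial kernel (x·z)² gives the same number as an ordinary dot product taken after mapping each point into a 3-D feature space. The feature map `phi` and the two sample points are illustrative choices, not something taken from the article's figures.

```python
import math


def dot(u, v):
    """Ordinary dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def phi(x):
    """Explicit degree-2 feature map for a 2-D point: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly2_kernel(x, z):
    """Degree-2 polynomial kernel, computed entirely in the original 2-D space."""
    return dot(x, z) ** 2


x, z = (1.0, 2.0), (3.0, 4.0)
explicit = dot(phi(x), phi(z))   # map to 3-D first, then take the dot product
via_kernel = poly2_kernel(x, z)  # never leaves 2-D

print("explicit mapping:", explicit)
print("kernel shortcut: ", via_kernel)
```

The two values agree up to floating-point rounding, which is the point of the trick: the SVM gets the higher-dimensional dot product without ever building the higher-dimensional vectors.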

 

SVM Kernel Functions:

 

Different SVM kernel functions that are used

Kernel Functions | Credit: Source

 

When using the SVM classifier in sklearn, we can set the kernel to ‘linear’, ‘poly’, ‘rbf’ or ‘sigmoid’. Let us look at the two most used kernels, polynomial and RBF (Radial Basis Function). You can refer to the sklearn documentation for details.

 

  • Polynomial Kernel: Generates new features by using polynomial combinations of all the existing features.

  • Radial Basis Function (RBF) Kernel: Generates new features by calculating the distance between a specific point and all the other points. The most widely used RBF kernel is the Gaussian radial basis function.
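
The two kernels can be sketched as plain Python functions. The formulas follow the common definitions, (gamma·x·z + coef0)^degree for the polynomial kernel and exp(-gamma·‖x − z‖²) for the Gaussian RBF kernel. The default parameter values below are assumptions for illustration and are not necessarily sklearn's defaults.

```python
import math


def polynomial_kernel(x, z, degree=3, gamma=1.0, coef0=0.0):
    """Polynomial kernel: (gamma * <x, z> + coef0) ** degree."""
    return (gamma * sum(a * b for a, b in zip(x, z)) + coef0) ** degree


def rbf_kernel(x, z, gamma=1.0):
    """Gaussian RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


origin = (0.0, 0.0)
near = (1.0, 0.0)
far = (3.0, 0.0)

# The RBF value is 1 for identical points and shrinks as points move apart.
print("rbf(origin, origin):", rbf_kernel(origin, origin))
print("rbf(origin, near):  ", rbf_kernel(origin, near))
print("rbf(origin, far):   ", rbf_kernel(origin, far))
print("poly degree 2:      ", polynomial_kernel((1.0, 2.0), (3.0, 4.0), degree=2))
```

Because the RBF value depends only on distance, nearby points look similar to the model and distant points look unrelated, which is what lets it draw curved decision boundaries.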

 

Degree of tolerance in SVM 

 

The penalty term passed as a hyperparameter to SVM, for both linearly separable and non-linearly separable data, is denoted ‘C’ and is called the degree of tolerance. The larger the value of C, the heavier the penalty the SVM incurs for each misclassification. With a large C, the decision boundary therefore tends to have a narrower margin and to depend on fewer support vectors.
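
The effect of C can be checked with a short experiment. The sketch below assumes scikit-learn and NumPy are installed, and it uses synthetic overlapping blobs rather than the article's dataset. For a linear kernel the margin width is 2/‖w‖, so it can be read off the fitted coefficients.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic, overlapping two-class data (an assumption made for this illustration).
X, y = make_blobs(
    n_samples=200,
    centers=[[0.0, 0.0], [3.0, 3.0]],
    cluster_std=1.5,
    random_state=0,
)

summary = {}
for C in (0.01, 1.0, 100.0):
    model = SVC(kernel="linear", C=C).fit(X, y)
    margin_width = 2.0 / np.linalg.norm(model.coef_)  # valid for the linear kernel
    n_support = int(model.n_support_.sum())           # support vectors over both classes
    summary[C] = (margin_width, n_support)
    print(f"C={C:>6}: margin width = {margin_width:.3f}, support vectors = {n_support}")
```

On data like this, a small C should give a wide margin supported by many points, while a large C should give a narrower margin with fewer support vectors.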

 

Pros of SVM

  • High stability, because the decision boundary depends only on the support vectors and not on all the data points.

  • Little influenced by outliers that lie far from the decision boundary.

  • Makes no assumptions about the distribution of the dataset.

  • Can also handle numeric prediction (regression) problems.

 

Cons of SVM

  • Black-box method.

  • Prone to overfitting.

  • Computationally intensive.

 

 

Hands On Problem Statement

 

The problem is to classify patients who have tumors. The dataset is available in the UCI Machine Learning Repository. We have taken a small dataset that is available in the GitHub repository. Here, the SVM classifier from sklearn is used to do the classification.

 

Code to import the necessary libraries and data.

 

STEPS

  • Imported the necessary libraries.
  • Imported the dataset, “tumor.csv”.
  • Performed EDA on the dataset.

 

Code for splitting the data and calculating accuracy.

 

  • Defined the target and the independent features.

  • Split the dataset using train_test_split from sklearn.

  • Imported accuracy_score and confusion_matrix from sklearn.metrics to evaluate the model.

  • Created an SVC object, svc_model, and fitted it to the training data. Used ‘linear’ as the kernel with a gamma of ‘1’ (gamma only applies to the ‘rbf’, ‘poly’ and ‘sigmoid’ kernels, so it has no effect with a linear kernel).

  • Made predictions on X_test and calculated the accuracy score on the training and test data, which came out to be 97% and 96% respectively.
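
The steps above can be sketched roughly as follows. This is not the original notebook. Since “tumor.csv” is not reproduced here, the sketch uses sklearn's built-in breast cancer dataset as a stand-in, and it adds a StandardScaler step that the original steps do not mention. The split ratio and random seed are also assumptions, so the accuracies will not necessarily match the figures quoted above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data (assumption): the article's "tumor.csv" is replaced by a built-in dataset.
X, y = load_breast_cancer(return_X_y=True)

# Hold out part of the data for testing; ratio and seed are illustrative choices.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Linear-kernel SVC; feature scaling is an addition that usually helps SVMs.
svc_model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
svc_model.fit(X_train, y_train)

train_acc = accuracy_score(y_train, svc_model.predict(X_train))
test_acc = accuracy_score(y_test, svc_model.predict(X_test))
cm = confusion_matrix(y_test, svc_model.predict(X_test))

print(f"training accuracy: {train_acc:.3f}")
print(f"test accuracy:     {test_acc:.3f}")
print("confusion matrix:")
print(cm)
```

Comparing the training and test accuracy gives a quick check for overfitting, and the confusion matrix shows which class the errors fall in.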

 

Code for different kernel accuracy and predictions.

 

  • Used different kernels, ‘rbf’, ‘poly’ and ‘sigmoid’, and calculated the training and testing accuracy for each of them.
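
A kernel comparison along these lines might look like the sketch below. As before, it is an approximation and not the original code: it uses sklearn's built-in breast cancer dataset in place of “tumor.csv”, adds feature scaling, and keeps every kernel at its default settings.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data and split (assumptions, not the article's "tumor.csv").
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

results = {}
for kernel in ("linear", "rbf", "poly", "sigmoid"):
    # Same pipeline for every kernel so that only the kernel changes.
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    model.fit(X_train, y_train)
    results[kernel] = (
        accuracy_score(y_train, model.predict(X_train)),
        accuracy_score(y_test, model.predict(X_test)),
    )

for kernel, (train_acc, test_acc) in results.items():
    print(f"{kernel:>8}: train = {train_acc:.3f}, test = {test_acc:.3f}")
```

Holding the split and preprocessing fixed means any difference in accuracy comes from the kernel alone. In practice each kernel's parameters (such as C, gamma and degree) would also be tuned before drawing conclusions.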

 

Accuracies of the different kernels.

Accuracy on training and testing data

 

Conclusion

 

In this blog, I have tried to explain the support vector machine and how it works. I covered linearly and non-linearly separable data, and also discussed kernel tricks, kernel functions and the degree of tolerance in SVM. Finally, I covered the pros and cons of the support vector machine, followed by a hands-on problem statement on a tumor dataset.
