Have you ever faced a situation with so many variables that you could not understand the relationships between them? With that many features, you also run a real risk of overfitting your model to the data.
In these situations you need to shrink the feature space, both to understand the relationships between the variables and to reduce the chances of overfitting. Reducing the dimension of the feature space is called “Dimensionality Reduction”. It can be achieved either by “Feature Exclusion” or by “Feature Extraction”.
Feature exclusion means dropping variables and keeping only those features that help predict the target, whereas feature extraction means deriving new features from the existing ones. Suppose we have 5 independent features and we create 5 new features on the basis of the original 5; that is how feature extraction works.
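To make the distinction concrete, here is a small sketch contrasting the two approaches on toy data; the feature names and the particular combinations are purely illustrative, not part of any standard recipe.

```python
import numpy as np
import pandas as pd

# Toy data: 5 hypothetical independent features (names are illustrative only)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 5)),
                  columns=["f1", "f2", "f3", "f4", "f5"])

# Feature exclusion: simply drop the variables deemed unhelpful
excluded = df.drop(columns=["f4", "f5"])

# Feature extraction: build new features as combinations of the old ones
extracted = pd.DataFrame({
    "g1": 0.7 * df["f1"] + 0.3 * df["f2"],   # weighted blend of f1 and f2
    "g2": df["f3"] - df["f4"],               # contrast between f3 and f4
})
print(excluded.shape, extracted.shape)  # (100, 3) (100, 2)
```

PCA, discussed next, automates the extraction step: instead of hand-picking the combinations, it finds them from the data itself.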
Principal Component Analysis, also known as PCA, is such a feature extraction method: it creates new independent features as combinations of the old ones and keeps only those that matter most for predicting the target. Any new feature judged to carry little information about the target variable can then be dropped.
PCA is a technique that combines the original variables in such a way that the least important resulting features can be dropped; all of the features it creates are independent of each other.
The concept behind PCA is to represent the data as accurately as possible in a lower-dimensional space.
In both pictures above, the data points (black dots) are projected onto a line, but the second line sits closer to the actual points (smaller projection errors) than the first.
The good line to use for projection lies in the direction of the largest variance.
After the data is projected onto this best line, the coordinate system needs to be changed so that the projected vector y has a 1-D representation.
Along the direction of the green line, the new data y has the same variance as the old data x.
PCA preserves the maximum variance in the data.
Doing PCA on n dimensions generates a new set of n dimensions. The first principal component captures the maximum variance in the underlying data, and the second principal component is orthogonal to the first.
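The variance-maximizing property can be checked numerically; a minimal sketch on synthetic correlated 2-D data (the data itself is made up for illustration), showing that the projection onto the first principal direction has at least as much variance as the projection onto either original axis:

```python
import numpy as np

rng = np.random.default_rng(42)
# Correlated 2-D data: the direction of largest variance is roughly y = x
x = rng.normal(size=500)
X = np.column_stack([x, x + 0.3 * rng.normal(size=500)])
X = X - X.mean(axis=0)  # center the data

# The first principal direction is the eigenvector of the covariance
# matrix with the largest eigenvalue
eig_vals, eig_vecs = np.linalg.eigh(np.cov(X.T))
pc1 = eig_vecs[:, np.argmax(eig_vals)]

# Variance of the projections onto PC1 vs onto the original axes
var_pc1 = np.var(X @ pc1)
print(var_pc1 >= np.var(X[:, 0]), var_pc1 >= np.var(X[:, 1]))  # True True
```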
Case 1: You want to reduce the number of variables, but you cannot identify which variables to leave out of consideration.
Case 2: You want to check whether the variables are independent of each other.
Case 3: You are prepared to make the independent features less interpretable.
In all three cases above you can use PCA.
Start by standardizing the data.
Create a correlation matrix or covariance matrix for all the desired dimensions.
Calculate the eigenvectors, which are the principal components, and their respective eigenvalues, which capture the magnitude of the variance.
Arrange the eigenpairs in decreasing order of eigenvalue and pick the pair with the largest eigenvalue; this is the first principal component, and it preserves the maximum information from the original data.
PCA can also be used to reduce the number of dimensions:
Arrange the eigenvectors in descending order of their respective eigenvalues.
Plot the graph of cumulative eigenvalues.
Eigenvectors that contribute negligibly to the total of the eigenvalues can be removed from the analysis.
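The steps above can be sketched end to end, including the final projection onto the retained components (which the plotting code later in this post does not show); k = 2 is an arbitrary choice for illustration:

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

# Standardize the iris data
X = datasets.load_iris().data
X_std = StandardScaler().fit_transform(X)

# Eigendecomposition of the covariance matrix
eig_vals, eig_vecs = np.linalg.eig(np.cov(X_std.T))

# Sort the eigenpairs by eigenvalue, descending
order = np.argsort(eig_vals)[::-1]
eig_vals, eig_vecs = eig_vals[order], eig_vecs[:, order]

# Keep the top k eigenvectors and project the data onto them
k = 2
W = eig_vecs[:, :k]   # 4 x 2 projection matrix
X_pca = X_std @ W     # 150 x 2 reduced representation
print(X_pca.shape)    # (150, 2)
```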
Plot of PCA and Variance Ratio
The dataset on which we will apply PCA is the iris dataset, which can be downloaded from the UCI Machine Learning Repository.
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
# importing plotting libraries
import matplotlib.pyplot as plt
from scipy.stats import zscore
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data
X_std = StandardScaler().fit_transform(X)
cov_matrix = np.cov(X_std.T)
print('Covariance Matrix \n%s' % cov_matrix)
X_std_df = pd.DataFrame(X_std)
axes = pd.plotting.scatter_matrix(X_std_df)
plt.tight_layout()
Scatter matrix of scaled data
eig_vals, eig_vecs = np.linalg.eig(cov_matrix)
# eigenvectors are the columns of eig_vecs
eigen_pairs = [(np.abs(eig_vals[i]), eig_vecs[:, i]) for i in range(len(eig_vals))]
tot = sum(eig_vals)
var_exp = [(ev / tot) * 100 for ev in sorted(eig_vals, reverse=True)]
cum_var_exp = np.cumsum(var_exp)
print("Cumulative Variance Explained", cum_var_exp)
plt.figure(figsize=(6, 4))
plt.bar(range(4), var_exp, alpha=0.5, align='center', label='Individual explained variance')
plt.step(range(4), cum_var_exp, where='mid', label='Cumulative explained variance')
plt.ylabel('Explained Variance Ratio')
plt.xlabel('Principal Components')
plt.legend(loc='best')
plt.tight_layout()
plt.show()
Principal components VS variance ratio
The first three principal components explain 99% of the variance in the data.
The three principal components will have to be named, because each one is a composite of the original dimensions.
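Scikit-learn's PCA class (imported earlier but not used above) performs the whole computation directly; a minimal sketch reducing the standardized iris data to the three components discussed:

```python
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = datasets.load_iris().data
X_std = StandardScaler().fit_transform(X)

# Fit PCA and project onto the top 3 principal components
pca = PCA(n_components=3)
X_reduced = pca.fit_transform(X_std)

print(X_reduced.shape)                      # (150, 3)
print(pca.explained_variance_ratio_.sum())  # about 0.99
```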
The Jupyter notebook containing the code for applying PCA to the iris dataset can be found here.
I will conclude the blog by restating the importance of PCA: it plays a very unique and important role. In this blog I have introduced PCA, described the scenarios in which to use it, laid out the steps for performing it, and shown how it can be applied for dimensionality reduction, along with its advantages and disadvantages.