Introduction to Linear Discriminant Analysis in Supervised Learning

  • Neelam Tyagi
  • Nov 28, 2019
  • Machine Learning

With advances in technology and the trend toward connected devices, huge amounts of data must be taken into account, and their storage and privacy are serious concerns. Hackers design algorithms to steal confidential information from massive datasets, so data must be handled precisely, which is also a time-consuming task. Moreover, as we have seen, not all of the data is required for inference; reducing the dimensionality of the data makes datasets easier to govern, which indirectly aids the security and privacy of the data.


In this blog, we will dwell on dimensionality reduction techniques: the concept of Linear Discriminant Analysis (LDA), how LDA differs from another dimensionality reduction technique (PCA), and related applications.


Machine learning is divided into three vast areas: supervised learning, unsupervised learning, and reinforcement learning. In 1936, Ronald A. Fisher formulated the linear discriminant for the first time and showed some practical uses as a classifier; it was described for a two-class problem, and later generalized as 'multi-class linear discriminant analysis' or 'multiple discriminant analysis' by C. R. Rao in 1948.


Linear Discriminant Analysis is the most commonly used dimensionality reduction technique in supervised learning. Basically, it is a preprocessing step for pattern classification and machine learning applications. It projects the dataset onto a lower-dimensional space while keeping the classes well separated, which reduces overfitting and computational cost.
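As a minimal sketch of LDA as a preprocessing step, scikit-learn's LinearDiscriminantAnalysis can project a labeled dataset down to at most (number of classes − 1) dimensions; the iris dataset below is used purely for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features, 3 classes

# LDA can project onto at most (n_classes - 1) dimensions: here 3 - 1 = 2.
lda = LinearDiscriminantAnalysis(n_components=2)
X_reduced = lda.fit_transform(X, y)  # note: the class labels y are used

print(X.shape)          # (150, 4)
print(X_reduced.shape)  # (150, 2)
```

The reduced features can then be fed to any downstream classifier, which is exactly the "preprocessing step" role described above.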


LDA aims to classify objects into one of two or more groups based on a set of parameters that describes those objects. It comes with specific functions and applications, which we will learn about in detail in the coming sections.


Under Linear Discriminant Analysis, we are basically looking for 


  1. Which set of parameters best describes an object's membership in a group?

  2. What is the best classification predictor model that separates those groups?


It is widely used for modeling differences between groups, i.e. distributing variables into two or more classes. Suppose we have two classes and we need to classify them efficiently.


A view of the classification of various objects before and after implementing LDA: observe how LDA places similar objects in one group and the remaining objects in another.


Classes can have multiple features. Using only a single feature to classify them may result in some overlap between classes, so the number of features often needs to be increased to avoid that overlap and, in return, achieve proper classification.
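The effect of a single feature versus many can be sketched with scikit-learn on the wine dataset (chosen here only as an illustrative multi-feature example): an LDA classifier trained on one feature typically scores worse than the same classifier given all features, because one feature alone leaves the classes overlapping.

```python
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

X, y = load_wine(return_X_y=True)  # 178 samples, 13 features, 3 classes

# Cross-validated accuracy using only the first feature vs. all features.
one_feature = cross_val_score(LinearDiscriminantAnalysis(), X[:, :1], y, cv=5).mean()
all_features = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5).mean()

print(f"one feature:  {one_feature:.2f}")
print(f"all features: {all_features:.2f}")
```

On this dataset the gap is substantial, which mirrors the overlap argument made above.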


Consider another simple example of dimensionality reduction and feature extraction: you want to check the quality of a soap based on the information provided about it, including features such as the soap's weight and volume, people's preference scores, odor, color, contrast, etc. A small scenario to understand the problem more clearly:


  1. Object to be tested: soap;

  2. Quality of the product to be checked: class category 'good' or 'bad' (dependent variable, categorical, measured on a nominal scale);

  3. Features describing the product: the various parameters that describe the soap (independent variables, measured on nominal, ordinal, or interval scales).


LDA is widely used for classification tasks and feature extraction; here an object is tested and its quality is checked based on some features.

Pictorial view of an object, class category, and feature extraction


Once the target (dependent) variable is decided, the other related information can be pulled from the existing dataset to check the effect of each feature on the target variable. The dimension of the data is thereby reduced, and only the important related features remain in the new dataset.
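The soap scenario above can be sketched in code. All of the data here is synthetic and made up purely for illustration (the feature names and class means are assumptions, not real measurements):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Hypothetical features: weight, volume, preference score,
# for 100 'good' and 100 'bad' soaps.
good = rng.normal(loc=[100, 50, 8], scale=2, size=(100, 3))
bad = rng.normal(loc=[90, 45, 4], scale=2, size=(100, 3))
X = np.vstack([good, bad])
y = np.array([1] * 100 + [0] * 100)  # 1 = 'good', 0 = 'bad'

clf = LinearDiscriminantAnalysis().fit(X, y)
print(clf.score(X, y))  # training accuracy on the synthetic data
```

Because the two synthetic classes are well separated in feature space, LDA separates them almost perfectly; with real soap data the separation would of course depend on how informative the features are.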



Difference between LDA and PCA


From the above discussion, we can see that the LDA approach is, in general, very similar to Principal Component Analysis (see the previous blog for more information on PCA). Both are linear transformation techniques for dimensionality reduction, but they differ in some ways:


  • The first difference between LDA and PCA is that PCA is oriented more toward feature extraction, while LDA performs data classification.


  • Under PCA, the shape and location of the original dataset change when it is transformed into another space, whereas LDA does not change the shape and location of the data on transformation to a different space; it only provides greater class separability.


PCA and LDA are widely used dimensionality reduction techniques in machine learning; their fundamental difference is that PCA is deployed in unsupervised learning, whereas LDA is deployed in supervised learning.

Flow chart showing the difference between LDA and PCA


  • PCA can be described as an unsupervised algorithm, since it ignores the class labels and focuses on finding the directions (principal components) that maximize the variance in the dataset. In contrast, LDA is a supervised algorithm: it computes the directions (new axes) that maximize the separation between multiple classes.
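The contrast in the bullets above can be seen directly in the scikit-learn APIs: PCA fits without labels, LDA requires them. A minimal sketch on the iris data:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# PCA: unsupervised, finds directions of maximum variance (no y needed).
X_pca = PCA(n_components=2).fit_transform(X)

# LDA: supervised, finds directions of maximum class separation (y required).
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_lda.shape)  # both (150, 2), but found very differently
```

Both outputs are two-dimensional, but PCA's axes ignore the classes entirely, while LDA's axes are chosen specifically to pull the classes apart.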



Application of Linear Discriminant Analysis


There are various techniques for classifying data and reducing its dimension, among which Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) are commonly used. Even when the within-class frequencies are unequal, Linear Discriminant Analysis handles the data easily, and its performance can be checked on randomly distributed test data. The method maximizes the ratio of between-class variance to within-class variance in any dataset, and thereby maximizes separability.
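The quantity being maximized, the ratio of between-class to within-class scatter, can be sketched in plain NumPy. LDA's projection directions are the leading eigenvectors of S_w⁻¹ S_b, where S_w is the within-class scatter matrix and S_b the between-class scatter matrix (iris is used here only as a convenient labeled dataset):

```python
import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
overall_mean = X.mean(axis=0)
n_features = X.shape[1]

S_w = np.zeros((n_features, n_features))  # within-class scatter
S_b = np.zeros((n_features, n_features))  # between-class scatter
for c in np.unique(y):
    X_c = X[y == c]
    mean_c = X_c.mean(axis=0)
    S_w += (X_c - mean_c).T @ (X_c - mean_c)
    diff = (mean_c - overall_mean).reshape(-1, 1)
    S_b += len(X_c) * diff @ diff.T

# Directions maximizing between/within scatter: eigenvectors of S_w^{-1} S_b.
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_w) @ S_b)
print(sorted(eigvals.real, reverse=True)[:2])  # the two useful directions
```

The size of each eigenvalue tells you how much class separation the corresponding direction captures, which is why LDA keeps only the leading ones.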




LDA has been successfully used in various applications; as long as a problem can be transformed into a classification problem, this technique can be applied. For example, LDA is used as a classification step in speech recognition, microarray data classification, face recognition, image retrieval, bioinformatics, biometrics, chemistry, etc. Below are some other applications of LDA:


  • For customer recognition: LDA helps identify and choose the parameters that describe a group of customers who are highly likely to buy similar products.


  • For face recognition: this is the most famous application in the field of computer vision. Every face is represented by a large number of pixel values; LDA first reduces the number of features to a more manageable number before the classification task, and a template is built from the newly produced dimensions, which are linear combinations of pixel values.


  • In medicine: LDA is used to classify the state of a patient's disease as mild, moderate, or severe based on various parameters and on the medical treatment the patient is undergoing, which helps guide the course of treatment.


  • For predictions: LDA is widely used for prediction, and hence in decision making. For example, "will you read a book?" yields a predicted result in one of two possible classes: reading the book or not.


  • In learning: nowadays, robots are trained to learn and talk like human beings, which can be treated as a classification problem. LDA forms similar groups based on various parameters such as frequency, pitch, sound, tune, etc.





In this contribution, we have introduced the Linear Discriminant Analysis technique used for dimensionality reduction in multivariate datasets. Recent technologies have led to the prevalence of datasets with large dimensions, huge sizes, and intricate structures.


Such datasets stimulate the generalization of LDA into deeper fields of research and development. In a nutshell, LDA offers schemes for feature extraction and dimensionality reduction. For more blogs on analytics and new technologies, do read Analytics Steps.


