
Analysis of Variance (ANOVA): Types and Limitations

  • Yashoda Gandhi
  • Mar 09, 2022

Do you want to try out a new technique or buy a new product but aren't sure how it stacks up against the competition? This is an all-too-familiar scenario for most of us. The majority of the options sound similar to one another, making it difficult to choose the best option.


Consider the following scenario: we have three medical treatments to use on patients who have similar diseases. Once the test results are in, one strategy is to assume that the treatment that cured patients in the least amount of time is the best.


What if some of these patients were already partially cured or receiving treatment from somewhere else? In order to make a confident and reliable decision, we will need evidence to back up our position. The ANOVA concept comes in handy in this situation.





What is Analysis of Variance (ANOVA)?


The term "ANOVA" refers to a technique that compares samples based on their means. ANOVA compares the means of different samples to see how one or more factors influence the outcome.


The analysis of variance (ANOVA) is a statistical test for determining whether the means of two or more groups differ. It does so by comparing the variability between the group means with the variability within the groups. To accomplish this, an ANOVA test divides the variability found within a data set into two parts:


1.   Factors with a statistical impact on the data set are known as systematic factors.


2.   Factors that have no statistical significance are known as random factors.


When you perform a variance analysis, you can see how much the independent variables influence the dependent variable. To put it another way, an analysis of variance is used to determine the significance of an experiment's results. 
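The split into a systematic part and a random part can be sketched in a few lines of Python. This is a minimal illustration rather than a full implementation: the function name `one_way_anova` and the recovery times are hypothetical, and it only computes the between-group sum of squares (systematic), the within-group sum of squares (random), and their ratio, the F statistic.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (ss_between, ss_within, f_stat) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)        # number of groups
    n = len(all_values)    # total number of observations

    # Systematic part: spread of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Random part: spread of the observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ss_between, ss_within, ms_between / ms_within

# Hypothetical recovery times (in days) for three treatments
treatments = [
    [4.0, 5.0, 6.0],
    [6.0, 7.0, 8.0],
    [8.0, 9.0, 10.0],
]
ss_b, ss_w, f_stat = one_way_anova(treatments)
print(ss_b, ss_w, f_stat)
```

A large F statistic means the group means vary much more than the noise within groups would suggest. To turn it into a p-value, it is compared against an F distribution with (k - 1, n - k) degrees of freedom.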


Types of ANOVA


If only one factor (different categories of a single factor) affects the values of the response variable, there is only one assignable cause by which to subdivide the data, and the corresponding analysis is known as One-Way Analysis of Variance. The Ventura Sales example, with geographical region as the single factor, falls into this category.


Other examples include examining the differences in analytical ability among students from various subject streams (such as engineering graduates, management graduates, and statistics graduates); the impact of various advertising modes on consumer durables brand acceptance, and so on.


If we consider the effect of more than one assignable cause (different categories of multiple factors) on the response variable, the corresponding analysis is known as N-Way ANOVA (N>=2). When the impact of two factors (with multiple categories) on the dependent (response) variable is considered, a two-way ANOVA is used.


If one more factor, 'type of outlet' (Rural and Urban), is added to the geographical regions (Northern, Eastern, Western, and Southern) in the Ventura Sales example, the corresponding analysis will be a Two-Way ANOVA.


More examples include examining the differences in analytical ability among students from various subject streams and geographical locations; the impact of various modes of advertisements and occupations on consumer durables brand acceptance, and so on.




Further classification of two way ANOVA


  1. Two-way ANOVA with one observation per cell: each cell (combination) has only one observation. Assume we have two factors, A (with m categories) and B (with n categories).


As a result, there will be N = m*n total observations, with one observation (data point) in each (Ai Bj) cell (combination), where i = 1, 2, ..., m and j = 1, 2, ..., n. The separate impact of each of the two factors can be investigated here.


  2. Two-way ANOVA with multiple observations per cell: each cell (combination) has more than one observation. In addition to the effect of each factor, the interaction effect of the two factors can be investigated.


An interaction effect occurs when the impact of one factor (assignable cause) depends on the category of the other factor.


To examine interaction effects, each cell (combination) must have more than one observation. This is not the case in the Two-Way ANOVA with one observation per cell described earlier, where the interaction cannot be separated from random error.
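The difference between the two designs shows up in the sums of squares. The sketch below assumes a balanced design (every cell holds the same number r of observations). The function name `two_way_anova` and the sales figures are hypothetical, chosen only to illustrate how the variability is split into factor A, factor B, the A-by-B interaction, and error.

```python
from statistics import mean

def two_way_anova(cells):
    """cells[i][j] lists the observations for level i of A and level j of B.

    Assumes a balanced design. Returns {source: (sum_of_squares, df)}.
    """
    m, n, r = len(cells), len(cells[0]), len(cells[0][0])
    if any(len(row) != n for row in cells) or any(
        len(c) != r for row in cells for c in row
    ):
        raise ValueError("balanced design required")

    grand = mean([x for row in cells for c in row for x in c])
    a_means = [mean([x for c in row for x in c]) for row in cells]
    b_means = [mean([x for row in cells for x in row[j]]) for j in range(n)]
    cell_means = [[mean(c) for c in row] for row in cells]

    ss_a = n * r * sum((a - grand) ** 2 for a in a_means)
    ss_b = m * r * sum((b - grand) ** 2 for b in b_means)
    # Interaction: what is left of each cell mean after removing both main effects
    ss_ab = r * sum(
        (cell_means[i][j] - a_means[i] - b_means[j] + grand) ** 2
        for i in range(m) for j in range(n)
    )
    # Error: spread of the observations around their own cell mean
    ss_error = sum(
        (x - cell_means[i][j]) ** 2
        for i in range(m) for j in range(n) for x in cells[i][j]
    )
    return {
        "A": (ss_a, m - 1),
        "B": (ss_b, n - 1),
        "AxB": (ss_ab, (m - 1) * (n - 1)),
        "error": (ss_error, m * n * (r - 1)),
    }

# Hypothetical sales: rows are two regions, columns are (Rural, Urban) outlets,
# with r = 2 observations per cell
sales = [
    [[10.0, 12.0], [20.0, 22.0]],
    [[14.0, 16.0], [18.0, 20.0]],
]
table = two_way_anova(sales)
ms_error = table["error"][0] / table["error"][1]
f_values = {
    name: (ss / df) / ms_error
    for name, (ss, df) in table.items()
    if name != "error"
}
print(f_values)
```

With one observation per cell (r = 1), the error degrees of freedom m*n*(r - 1) become zero, so `ms_error` cannot be computed this way. That is why the interaction cannot be tested in that design.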





One way ANOVA vs Two way ANOVA


One-way (or unidirectional) and two-way ANOVA are the two main types of ANOVA. There are also other variants. For example, MANOVA (multivariate ANOVA) differs from ANOVA in that the former assesses multiple dependent variables simultaneously while the latter assesses only one.


The number of independent variables in your analysis of the variance test determines whether it is one-way or two-way. The impact of a single factor on a single response variable is assessed using a one-way ANOVA. 


It tests whether all of the groups have the same population mean. The one-way ANOVA is typically used to see if there are any statistically significant differences in the means of three or more unrelated groups.
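Written out, the hypotheses of a one-way ANOVA look as follows. The notation is an assumption introduced here for illustration: k is the number of groups and each \mu_i is the population mean of group i.

```latex
H_0:\ \mu_1 = \mu_2 = \cdots = \mu_k
\qquad \text{versus} \qquad
H_1:\ \mu_i \neq \mu_j \ \text{for at least one pair } (i, j)
```

Rejecting the null hypothesis says only that some difference exists, not which groups differ.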


The two-way ANOVA is an extension of the one-way ANOVA. In a one-way analysis, one independent variable influences a dependent variable. In a two-way ANOVA, there are two independent variables.


A two-way ANOVA, for example, allows a business to compare worker productivity based on two independent variables like salary and skill set. It's used to look at how the two factors interact and to test the effect of two factors at the same time.





Limitations of ANOVA


  1. Because it is designed to test against all alternatives to the null hypothesis, it may be ineffective when used to test a single specific alternative.


  2. ANOVA is optimal when losses are proportional to the square of the differences among the unknown population means; under other loss functions, it may not be optimal.


When losses are proportional to the absolute values of differences among unknown population means, for example, expected losses are minimized using a test that uses the absolute values of differences among sample means.


  3. It is designed to be used with data from a normal distribution. Although it is remarkably robust, it may not produce exact p-values when the data come from distributions with heavier tails than the normal.


When the observations are drawn from non-normal distributions, even when the analysis of variance yields almost exact p-values, it may be less powerful than the corresponding permutation test.
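For readers who have not met the permutation test mentioned above, here is a minimal sketch of one common version. It is an assumption-laden illustration, not a reference implementation: group labels are reshuffled at random, the F statistic is recomputed each time, and the p-value is the share of shuffles that give an F at least as large as the observed one. The function names, the number of permutations, the seed, and the data are all hypothetical.

```python
import random
from statistics import mean

def f_statistic(groups):
    """One-way ANOVA F statistic for a list of samples."""
    values = [x for g in groups for x in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        return float("inf")  # no within-group spread at all
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def permutation_p_value(groups, n_permutations=2000, seed=0):
    """Estimate a p-value by randomly reassigning observations to groups."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        # Cut the shuffled pool back into groups of the original sizes
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed:
            extreme += 1
    # Adding one to both counts keeps the estimate away from exactly zero
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical measurements for three groups
groups = [
    [12.1, 14.3, 11.8, 13.5, 12.9],
    [15.2, 16.8, 14.9, 17.1, 15.5],
    [19.4, 18.7, 20.3, 21.0, 19.9],
]
p_value = permutation_p_value(groups)
print(p_value)
```

The appeal of this approach is that it does not rely on the normality assumption. The reference distribution comes from the data themselves rather than from the F distribution.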





Applications of ANOVA in Real-Life

ANOVA has many applications in real life. We've listed some of them below:


  1. For Fertilizers


A large-scale farm is trying to figure out which of three fertilizers produces the highest crop yield. They apply each fertilizer to ten different fields and then calculate the total yield at the end of the season. 


To see if there is a statistically significant difference in the mean yield that results from these three fertilizers, researchers can use a one-way ANOVA with "type of fertilizer" as the factor and "crop yield" as the response. If the result is significant, they can then run post hoc tests to see which fertilizer resulted in the highest mean yield.


  2. For Medications


Researchers want to know whether four different medications lead to different mean reductions in patients' blood pressure. They randomly assign 20 patients to each medication for one month, then measure blood pressure before and after the treatment period to determine the average blood pressure reduction for each medication.


 To see if there is a statistically significant difference in the mean blood pressure reduction that these medications cause, researchers can use a one-way ANOVA with "type of medication" as the factor and "blood pressure reduction" as the response.


  3. For Advertising


A supermarket is curious about the effects of three different types of advertisements on average sales. They use each type of advertisement in ten different stores for a month, and at the end of the month, they track total sales for each store. 


To see if there is a statistically significant difference in mean sales between these three types of advertisements, researchers can use a one-way ANOVA with "type of advertisement" as the factor and "sales" as the response variable.


 If the overall p-value of the ANOVA is less than our significance level, we can conclude that there is a statistically significant difference in mean sales between the three types of advertisements.


  4. For Biologists


Biologists want to know how different levels of sunlight exposure (no sun, low sun, medium sun, high sun) and watering frequency (daily, weekly) affect plant growth. 


Because there are two factors (level of sunlight exposure and watering frequency), they will use a two-way ANOVA to see whether either factor has a significant impact on plant growth and whether the two factors interact. Using this information, biologists can better understand which level of sunlight exposure and/or watering frequency leads to optimal growth.





Assumptions of ANOVA


For the results of an ANOVA to be valid, the following assumptions must be met:


  1. The populations from which the samples were taken should have a normal distribution.


  2. The samples should be drawn randomly and independently of one another.


  3. Each group should have the same variance, i.e., the variability in the dependent variable values within different groups should be equal.


  4. It's worth noting that minor deviations from these assumptions have little effect on the linear model used in ANOVA, especially if the sample size is large.


  5. The Shapiro-Wilk Test and the Kolmogorov-Smirnov Test with Lilliefors Significance Correction can be used to check the Normality Assumption.
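The equal-variance assumption can be checked in a similar spirit. One widely used option, not mentioned in the list above and named here only as an example, is Levene's test. It runs a one-way ANOVA on the absolute deviations of each observation from its own group mean. The sketch below computes just the test statistic; the function name and data are hypothetical.

```python
from statistics import mean

def levene_statistic(groups):
    """Levene's W: the one-way ANOVA F statistic of the absolute
    deviations of each observation from its own group mean."""
    deviations = [[abs(x - mean(g)) for x in g] for g in groups]
    pooled = [z for d in deviations for z in d]
    grand = mean(pooled)
    k, n = len(groups), len(pooled)
    ss_between = sum(len(d) * (mean(d) - grand) ** 2 for d in deviations)
    ss_within = sum((z - mean(d)) ** 2 for d in deviations for z in d)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical groups whose spread grows from one group to the next
samples = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],
    [3.0, 6.0, 9.0],
]
w = levene_statistic(samples)
print(w)
```

A large W relative to an F distribution with (k - 1, n - k) degrees of freedom suggests that the group variances differ. In practice, statistical libraries provide ready-made versions of such checks, for example `scipy.stats.levene` and `scipy.stats.shapiro` in SciPy.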




You'd use ANOVA to figure out whether your various groups respond differently, with the null hypothesis being that the means of the various groups are equal. If the result is statistically significant, we conclude that at least one group mean differs from the others; post hoc tests are then needed to identify which groups differ.