You are probably familiar with several terms in statistics, and simple linear regression is one of them. This blog explains the concept. Simple linear regression is a statistical method that allows us to summarize and study the relationship between two continuous (quantitative) variables. One variable, denoted x, is regarded as the predictor, explanatory, or independent variable. The other variable, denoted y, is regarded as the response, outcome, or dependent variable.
In statistics, a simple linear regression model uses a single variable to predict the value of another variable. It is easy to confuse it with linear regression in general, so this blog makes the distinction clear: simple linear regression is the special case of linear regression that has exactly one predictor.
This blog covers the basic concepts of simple linear regression: the definition of linear regression, the types of linear regression, the definition of simple linear regression, its assumptions, its applications and limitations, and a worked example. Let’s begin with the basics of linear regression.
Definition of Linear Regression
In layman's terms, linear regression is a method for learning the linear relationship between a target and one or more predictors, and it is probably one of the most popular and well-understood algorithms in statistics. Linear regression attempts to model the relationship between two variables by fitting a linear equation to observed data. One variable is regarded as the explanatory variable, and the other as the dependent variable.
For instance, a modeller might want to relate the weights of individuals to their heights using a linear regression model. It is a fundamental and widely used type of predictive analysis.
Types of Linear Regression
Linear regression is usually divided into two types: multiple linear regression and simple linear regression. For better clarity, we discuss both types in detail.
Multiple Linear Regression
In this type of linear regression, we try to discover the relationship between two or more independent variables (inputs) and the corresponding dependent variable (output). The independent variables can be either continuous or categorical.
This kind of analysis is helpful in several ways: it helps in forecasting trends and future values, and in predicting the impact of changes.
Simple Linear Regression
In simple linear regression, we aim to reveal the relationship between a single independent variable (input) and a corresponding dependent variable (output). We can express this in a single line as y = β0 + β1x + ε
Here, y represents the output or dependent variable, x is the input or independent variable, β0 and β1 are two unknown constants that represent the intercept and the slope coefficient respectively, and ε (epsilon) is the error term.
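To make the roles of β0, β1, and ε concrete, here is a minimal Python sketch that generates data from the model above. The function name simulate_linear_data, the parameter values, and the choice of normally distributed errors are assumptions made purely for illustration.

```python
import random


def simulate_linear_data(beta0, beta1, xs, sigma, seed=0):
    """Generate y = beta0 + beta1 * x + eps for each x.

    eps is drawn from a normal distribution with mean 0 and standard
    deviation sigma (an assumption made for this illustration).
    """
    rng = random.Random(seed)
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]


# Hypothetical values: intercept 2, slope 3, small random error.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = simulate_linear_data(beta0=2.0, beta1=3.0, xs=xs, sigma=0.5)
for x, y in zip(xs, ys):
    print(f"x = {x:.1f}  y = {y:.3f}")
```

With sigma set to 0 the points fall exactly on the line; a larger sigma scatters them further around it.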
We can also show this in the form of a graph; here is a sample graph of a simple linear regression model.
Simple linear regression graph
What Actually is Simple Linear Regression?
It can be described as a method of statistical analysis used to study the relationship between two quantitative variables.
Primarily, there are two things that simple linear regression can tell us:
The strength of the relationship between the two variables. (For example, the relationship between global warming and the melting of glaciers)
The value of the dependent variable at a given value of the independent variable. (For example, the amount of glacier melt at a certain level of global warming or temperature)
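Both quantities can be computed with a few lines of code. The sketch below fits a least-squares line and computes the Pearson correlation coefficient in plain Python. The function names fit_line and pearson_r and the temperature and melt figures are hypothetical, chosen only to mirror the glacier example.

```python
import math


def fit_line(xs, ys):
    """Return (b0, b1), the least-squares intercept and slope."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def pearson_r(xs, ys):
    """Pearson correlation coefficient: strength of the linear relationship."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: temperature anomaly (x) and glacier melt (y).
temps = [0.2, 0.4, 0.6, 0.8, 1.0]
melt = [1.1, 1.9, 3.2, 3.8, 5.1]

b0, b1 = fit_line(temps, melt)
r = pearson_r(temps, melt)
print(f"strength of relationship: r = {r:.3f}")
print(f"predicted melt at x = 0.7: {b0 + b1 * 0.7:.3f}")
```

A value of r close to 1 or -1 indicates a strong linear relationship, while the fitted line gives the expected value of the dependent variable at any chosen x.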
Regression models are used to describe the relationship between variables in detail. There are several types of regression models, such as logistic regression models, nonlinear regression models, and linear regression models. A linear regression model fits a straight line to the observed data to establish the relationship between two variables.
Assumptions of Linear Regression
To conduct a simple linear regression, one has to make certain assumptions about the data. This is because it is a parametric test. The assumptions used while performing a simple linear regression are as follows:
Homogeneity of variance (homoscedasticity)- One of the main assumptions in simple linear regression is that the size of the error stays constant. This simply means that the size of the error does not change significantly across the values of the independent variable.
Independence of observations- There are no hidden relationships among the observations, and the data were collected using statistically valid sampling methods.
Normality- The data follow a normal distribution.
These three assumptions are common to parametric tests.
However, linear regression makes one additional assumption.
The relationship is linear- The line of best fit through the data is a straight line, rather than a curve or some sort of grouping factor. In other words, there is a linear relationship between the dependent variable and the independent variable. If the data fail the assumptions of homoscedasticity or normality, a nonparametric test might be used instead. (For example, the Spearman rank test)
Example of data that fails to meet the assumptions: One may think that cured meat consumption and the incidence of colorectal cancer in the U.S. have a linear relationship. On closer inspection, however, it turns out that the spread of the data differs greatly across the range of the two variables. Because this violates the homoscedasticity assumption, a linear regression test is not appropriate here. However, a Spearman rank test can still be performed to study the relationship between the two variables.
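As a rough illustration of the Spearman rank approach mentioned above, the following sketch computes the rank correlation in plain Python by ranking both variables and then taking the Pearson correlation of the ranks. The helper names average_ranks, pearson_r, and spearman_rho and the data values are hypothetical; this is not the meat consumption data.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical data: y rises with x, but not along a straight line.
xs = [1, 2, 3, 4, 5]
ys = [1, 8, 27, 64, 125]
print(f"Pearson r    = {pearson_r(xs, ys):.3f}")
print(f"Spearman rho = {spearman_rho(xs, ys):.3f}")
```

Because the Spearman coefficient depends only on the ordering of the values, it does not rely on the normality or constant-variance assumptions in the way a linear regression does.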
Applications of Simple Linear Regression
Marks scored by students based on number of hours studied (ideally)- Here the marks scored in exams are the dependent variable and the number of hours studied is the independent variable.
Predicting crop yields based on the amount of rainfall- Yield is the dependent variable while the amount of rainfall is the independent variable.
Predicting the salary of a person based on years of experience- Experience is the independent variable while salary is the dependent variable.
Limitations of Simple Linear Regression
Even the best data does not tell a complete story. Regression analysis is commonly used in research to establish that a relationship exists between variables. However, correlation is not the same as causation: a relationship between two variables does not mean that one causes the other. Even a line in a simple linear regression that fits the data points well may not guarantee a cause-and-effect relationship.
Using a linear regression model will allow you to discover whether a relationship between variables exists at all. To understand exactly what that relationship is, and whether one variable causes another, you will need additional research and statistical analysis.
Examples of Simple Linear Regression
Now, let’s understand simple linear regression with the help of an example. We will use data on teen birth rate and poverty level.
This dataset of size n = 51 is for the 50 states and the District of Columbia in the United States (poverty.txt). The variables are y = year 2002 birth rate per 1000 females 15 to 17 years of age and x = poverty rate, which is the percent of the state's population living in households with incomes below the federally defined poverty level. (Data source: Mind On Statistics, 3rd edition, Utts and Heckard).
The graph below (right image), with birth rate on the vertical axis, shows a generally linear relationship, on average, with a positive slope. As the poverty level increases, the birth rate for 15 to 17-year-old females tends to increase as well.
Example graph of simple linear regression
Here is another graph (left image), which shows a regression line superimposed on the data.
The equation of the fitted regression line is given near the top of the plot. The equation should really state that it is for the "average" birth rate (or "predicted" birth rate would be fine too), because a regression equation describes the average value of y as a function of one or more x-variables. In statistical notation, the equation could be written ŷ = 4.267 + 1.373x.
The interpretation of the slope (value = 1.373) is that the 15 to 17-year-old birth rate increases 1.373 units, on average, for each one unit (one per cent) increase in the poverty rate.
The interpretation of the intercept (value = 4.267) is that if there were states with a poverty rate = 0, the predicted average for the 15 to 17-year-old birth rate would be 4.267 for those states. Since there are no states with a poverty rate = 0, this interpretation of the intercept is not practically meaningful for this model.
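The fitted equation can be used directly to compute predicted average birth rates. The short sketch below hard-codes the intercept and slope reported above. The function name predict_birth_rate and the poverty rates of 10, 15, and 20 percent are hypothetical inputs chosen for illustration.

```python
# Coefficients taken from the fitted equation in the text.
INTERCEPT = 4.267
SLOPE = 1.373


def predict_birth_rate(poverty_rate):
    """Predicted average birth rate (per 1000 females aged 15 to 17)."""
    return INTERCEPT + SLOPE * poverty_rate


# Hypothetical poverty rates, in percent.
for rate in (10.0, 15.0, 20.0):
    predicted = predict_birth_rate(rate)
    print(f"poverty rate {rate:.0f}%: predicted birth rate {predicted:.3f}")
```

Each additional point of poverty rate raises the prediction by the slope, 1.373, and a poverty rate of 0 returns the intercept, 4.267.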
In the graph with the regression line present, we also see the information that s = 5.55057 and r² = 53.3%.
The value of s tells us roughly the standard deviation of the differences between the y-values of individual observations and the predictions of y based on the regression line. The value of r² can be interpreted to mean that poverty rates "explain" 53.3% of the observed variation in the 15 to 17-year-old average birth rates of the states.
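To show where these two numbers come from, here is a minimal sketch that fits a least-squares line and then computes s and r-squared from the residuals. It assumes the usual definitions: s is the square root of the residual sum of squares divided by n - 2, and r-squared is 1 minus the ratio of residual to total sum of squares. The function name regression_summary and the four data points are hypothetical; they are not the poverty dataset.

```python
import math


def regression_summary(xs, ys):
    """Fit y = b0 + b1 * x by least squares and return (b0, b1, s, r2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    # Sum of squared residuals (observed y minus predicted y).
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation of y around its mean.
    sst = sum((y - mean_y) ** 2 for y in ys)
    s = math.sqrt(sse / (n - 2))  # n - 2: two coefficients are estimated
    r2 = 1.0 - sse / sst
    return b0, b1, s, r2


# Hypothetical data, not the poverty dataset.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]
b0, b1, s, r2 = regression_summary(xs, ys)
print(f"s = {s:.4f}, r2 = {r2:.1%}")
```

A smaller s means the observations sit closer to the fitted line, and an r-squared nearer to 100% means the x-variable accounts for more of the variation in y.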
The R² (adj) value (52.4%) is an adjustment to R² based on the number of x-variables in the model (only one here) and the sample size. With only one x-variable, the adjusted R² is not important.
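As a sketch of how this adjustment is commonly computed, the standard formula is 1 - (1 - R-squared)(n - 1)/(n - p - 1), where p is the number of x-variables. Assuming this is the formula behind the reported value, plugging in the rounded R-squared of 0.533 with n = 51 and p = 1 gives a result of about 0.523, which is in line with the reported 52.4% once rounding of the inputs is taken into account. The function name adjusted_r2 is hypothetical.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n observations and p predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# R-squared and sample size from the example; p = 1 predictor.
print(f"{adjusted_r2(0.533, n=51, p=1):.3f}")
```

The correction factor (n - 1)/(n - p - 1) is close to 1 when there is a single predictor and a reasonable sample size, which is why the adjusted value barely differs from R-squared here.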
Simple linear regression is a regression model that figures out the relationship between one independent variable and one dependent variable using a straight line.
Hence, the simple linear regression model is represented by: y = β0 + β1x + ε. This blog covers the basic facts about simple linear regression, and we hope it helps answer your questions. Stay tuned with us to learn more about statistics and many other topics.