What is Precision, Recall & F1 Score in Statistics?

Utsav Mishra
May 28, 2021

Some topics are rarely talked about, some are talked about all the time, and a few in between are talked about a lot yet remain hard to understand. Those critical topics are the focus of this blog.

Introduction

Let us start with a simple case. You receive an email saying that you have got a job at XYZ company with a package of 30 LPA, and that the company needs your bank account details to credit your salary. You must first decide whether the mail is genuine. You have two choices: share your bank details or withhold them.

Both choices carry their own risk of error, depending on the decision made. We will talk about these errors in a while. But first, does this kind of decision rely only on our mental strength and decision-making ability, or on something else too?

Obviously, it relies on something else too: the study of the data available to you, on which you base your decision. Be it data science or statistics, choices like these are made by extracting information from the data provided, which leads to a better decision-making process.

The decision here can still be wrong, and a wrong one can be extremely costly. Before taking any decision, one needs to weigh the risk involved, i.e., the risk of either losing all of your money or losing the job. This is where the concept of errors kicks in. Let us try to understand what errors are.

Errors in Statistical Decision-Making

You can use hypothesis testing to see whether your data supports or contradicts your study predictions.

In hypothesis testing, the null hypothesis assumes that there is no difference between groups or no association between variables in the population. It is always accompanied by an alternative hypothesis, which is your study's prediction of a real difference between groups or a real association between variables.

For example, if we throw a tennis ball at a glass window, the null hypothesis would be that the throw has no effect and the window stays intact. The alternative hypothesis would be that the ball breaks the window.

With these hypotheses in mind, let us look at the two kinds of errors.

Type 1 and Type 2 Errors

Using your data and the findings of a statistical test, you decide whether the null hypothesis can be rejected. Because these decisions are dependent on probabilities, there is always the possibility of reaching an incorrect conclusion.

If your results are statistically significant, it means they would be very unlikely to occur if the null hypothesis were true. In this scenario, you would reject your null hypothesis.

But if the null hypothesis is actually true, rejecting it is a Type I error. A Type I error is also called a false positive.

If your results are not statistically significant, it means they have a high chance of occurring if the null hypothesis is true. As a result, you do not reject your null hypothesis.

However, if the null hypothesis is actually false, failing to reject it is a Type II error. This error is also known as a false negative.
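To make the idea of a false positive concrete, here is a minimal simulation sketch. It is only an illustration under stated assumptions: the significance level of 0.05, the sample size of 30, the z-test with a known standard deviation, and the helper name `two_sided_p_value` are all choices made for this example and do not come from the discussion above. Every simulated dataset is drawn with the null hypothesis true, so each rejection is a Type I error.

```python
import math
import random
from statistics import NormalDist, mean


def two_sided_p_value(sample, null_mean=0.0, sigma=1.0):
    """Two-sided p-value of a z-test for the mean, assuming sigma is known."""
    z = (mean(sample) - null_mean) / (sigma / math.sqrt(len(sample)))
    return 2 * (1 - NormalDist().cdf(abs(z)))


random.seed(0)   # fixed seed so the run is repeatable
alpha = 0.05     # assumed significance level for this sketch
trials = 2000
false_positives = 0

for _ in range(trials):
    # The null hypothesis is TRUE here: the data really has mean 0.
    sample = [random.gauss(0.0, 1.0) for _ in range(30)]
    if two_sided_p_value(sample) < alpha:
        false_positives += 1  # rejected a true null: a Type I error

type_1_rate = false_positives / trials
print(f"Observed Type I error rate: {type_1_rate:.3f}")
```

In a run like this, the observed rate should land close to the chosen significance level, which is the share of true null hypotheses we accept rejecting by mistake.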

Now, coming back to the case we considered:

If you believe the job offer and send your account details, your decision rests on the assumption that the mail is genuine. If you are right, you might get the job, but if the assumption is wrong, you will have fallen into the trap of online phishing.

The null hypothesis here is that the mail is a hoax. If you believe the mail and its sender but the null hypothesis turns out to be true, you will lose a lot of money. This falls under the category of Type I error, or false positive.

If, on the other hand, the mail was not a hoax and you did not send the details, you would lose the job. That would be a Type II error, or false negative.

In the same way, there are true positives and true negatives. If the class is negative and the outcome is also negative, it is a true negative. Similarly, if the class is positive and the outcome is also positive, it is a true positive.
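The four outcomes can be counted with a few lines of Python. This is a minimal sketch, assuming labels are encoded as 1 for positive (the mail is genuine) and 0 for negative (the mail is a hoax). The function name `confusion_counts` and the sample data are hypothetical and used only for illustration.

```python
def confusion_counts(actual, predicted):
    """Count TP, FP, TN and FN for binary labels (1 = positive, 0 = negative)."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            tp += 1  # predicted positive, actually positive
        elif p == 1 and a == 0:
            fp += 1  # predicted positive, actually negative (Type I error)
        elif p == 0 and a == 0:
            tn += 1  # predicted negative, actually negative
        else:
            fn += 1  # predicted negative, actually positive (Type II error)
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}


# Hypothetical data: what each mail really was vs. what we decided it was.
actual = [1, 0, 0, 1, 0, 1, 0, 0]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]

counts = confusion_counts(actual, predicted)
print(counts)  # {'TP': 2, 'FP': 2, 'TN': 3, 'FN': 1}
```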

Now based on the concepts of errors we just talked about, let us dive into the world of two evaluation metrics known as precision and recall.

What is Precision?

In statistical estimation, precision is the degree to which estimates from different samples are similar. The standard error, for example, is a precision metric. When the standard error is small, estimates from different samples will be close in value; conversely, when the standard error is large, estimates from different samples will be far apart in value.

The standard error is inversely related to precision. Sample estimates are more exact when the standard error is small; when the standard error is large, sample estimates are less exact.

In classification, which is the sense used alongside recall in the rest of this blog, precision measures something different: how many of the results labelled positive are actually positive.
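As a small illustration of the standard error, the sketch below computes the standard error of the mean for two samples. It assumes the usual formula (sample standard deviation divided by the square root of the sample size). The function name `standard_error` and both samples are made up for this example.

```python
import math
import statistics


def standard_error(sample):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


# Two hypothetical samples with the same mean but different spread.
tight_sample = [9.8, 10.1, 10.0, 9.9, 10.2]
spread_sample = [4.0, 16.0, 10.0, 7.0, 13.0]

print("Tight sample SE: ", standard_error(tight_sample))
print("Spread sample SE:", standard_error(spread_sample))
```

The tightly clustered sample gives the smaller standard error, so an estimate based on it is the more precise of the two.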

Now the question is, how do we calculate precision for a classifier?

The formula is simple:

Precision = True Positive/Predicted Positive, or,

Precision = True Positive/(True Positive + False Positive)

From the formula, we get an easy definition of precision: it is the percentage of returned results that are relevant.

As another example, suppose we search for a movie on Netflix and start getting irrelevant search results. Because of the low search precision, we drop the idea of watching the movie. Low precision on one OTT platform might make us switch to another, which is why precision is important for any model.
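The precision formula can be sketched in Python as follows. The function name `precision` and the search numbers are hypothetical. Returning 0.0 when nothing is predicted positive is a convention chosen for this sketch, not a universal rule.

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP)."""
    predicted_positive = tp + fp
    if predicted_positive == 0:
        return 0.0  # convention for this sketch: no positive predictions
    return tp / predicted_positive


# Hypothetical search: 8 results returned, of which 6 are relevant.
print(precision(tp=6, fp=2))  # 0.75
```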

Let us now move to another term stated before, Recall.

What is Recall?

Recall measures how many of the actual positives our model captures by labelling them as positive (true positives). It follows that when a false negative carries a high cost, recall is the metric to use when choosing the best model.

The formula for recall is:

Recall = True Positive/(True Positive + False Negative)

From the formula, we get another simple definition of recall: it is the percentage of all relevant results that the algorithm correctly classifies.
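The recall formula can be sketched the same way. The function name `recall` and the numbers are hypothetical, and returning 0.0 when there are no actual positives is again just a convention for this sketch.

```python
def recall(tp, fn):
    """Recall = TP / (TP + FN)."""
    actual_positive = tp + fn
    if actual_positive == 0:
        return 0.0  # convention for this sketch: no actual positives
    return tp / actual_positive


# Hypothetical search: 10 relevant items exist, 6 were found and 4 were missed.
print(recall(tp=6, fn=4))  # 0.6
```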

For example, in the case we considered, suppose that after giving your bank details in the mail, you call the bank and they ask you to state the past instances in which you gave someone your bank account details.

Suppose there are 5 such instances, and you narrate 10 instances, which include all 5 correct ones. Here your precision is only 50 percent (5 of the 10 instances you narrated are correct), but you were able to recall 100 percent of the actual instances.
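The arithmetic of this example can be checked with Python sets. The labels "A" to "J" are placeholder names for the instances, invented purely for illustration.

```python
# The 5 instances in which bank details were really shared.
actual_instances = {"A", "B", "C", "D", "E"}

# The 10 instances narrated to the bank: the 5 real ones plus 5 wrong ones.
narrated_instances = {"A", "B", "C", "D", "E", "F", "G", "H", "I", "J"}

tp = len(actual_instances & narrated_instances)  # real and narrated
fp = len(narrated_instances - actual_instances)  # narrated but not real
fn = len(actual_instances - narrated_instances)  # real but not narrated

precision_value = tp / (tp + fp)
recall_value = tp / (tp + fn)

print(precision_value)  # 0.5
print(recall_value)     # 1.0
```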

Precision and recall are, in turn, used to calculate another simple metric known as the F1 score.

What is F1 Score?

Depending on the problem you're trying to solve, you will in most cases assign a higher priority to maximizing either precision or recall. However, there is a single statistic that takes both precision and recall into account, and you can seek to maximize it to improve your model.

The F1 score is the harmonic mean of precision and recall, so it depends entirely on those two values.

The formula is:

F1 Score = (2 * Precision * Recall)/(Precision + Recall)
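Here is a minimal sketch of the F1 formula in Python, applied to the bank example above (precision 0.5, recall 1.0). The function name `f1_score` is chosen for this example, and returning 0.0 when both inputs are zero is a convention of this sketch.

```python
def f1_score(precision, recall):
    """F1 = (2 * precision * recall) / (precision + recall)."""
    if precision + recall == 0:
        return 0.0  # convention for this sketch: avoid dividing by zero
    return (2 * precision * recall) / (precision + recall)


f1 = f1_score(0.5, 1.0)
arithmetic_mean = (0.5 + 1.0) / 2

print(round(f1, 3))      # 0.667
print(arithmetic_mean)   # 0.75
```

Because it is a harmonic mean, the F1 score sits closer to the lower of the two values than the plain average does, so a model cannot score well by excelling at only one of them.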

Conclusion

Precision and recall are two of the most important model evaluation metrics and are widely used in statistics.

While precision refers to the percentage of returned results that are relevant, recall refers to the percentage of all relevant results that your algorithm successfully classifies. Unfortunately, maximizing both at the same time is usually difficult, as improving one often comes at the expense of the other.
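The trade-off can be seen in a small sketch. It assumes a model that outputs a score for each item, with everything at or above a chosen threshold labelled positive. The scores, labels and the function name `precision_recall_at` are invented for illustration.

```python
# Hypothetical model scores and the true labels (1 = positive, 0 = negative).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20]
labels = [1, 1, 0, 1, 0, 1, 0, 0]


def precision_recall_at(threshold, scores, labels):
    """Label items with score >= threshold as positive, then score the result."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


for threshold in (0.85, 0.50, 0.25):
    p, r = precision_recall_at(threshold, scores, labels)
    print(f"threshold={threshold:.2f}  precision={p:.2f}  recall={r:.2f}")
```

On this toy data, lowering the threshold raises recall from 0.50 to 1.00 while precision drops from 1.00 to about 0.57.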