What is a Probability Distribution Function?

Vineel Chandra
Jan 20, 2022

In the world of Statistics, a probability distribution gives the probability of every possible outcome of a random experiment or event.

A random experiment or event is one whose outcome cannot be predicted with certainty. There are different kinds of events in probability, such as Independent, Mutually Exclusive, Exhaustive, Simple, Compound, Impossible and Sure events.

The Probability distribution function tells us the probabilities associated with different occurrences or outcomes of a random event. For example, if you toss a coin, there are two outcomes possible - Head and Tail. The probability distribution function helps one determine the chances of head and tail separately.

Probability Distribution Function

The distribution of probabilities across all outcomes of an experiment is described by the Probability Distribution Function. The form of this function depends on the type of distribution.

For a continuous random variable, the probability distribution function takes the form of a probability density function; for a discrete random variable, it takes the form of a probability mass function. Both types are explained in detail in the subsequent sections. (Source)

Any random variable can be visualized as a probability distribution. A random variable can be continuous, discrete or a mixture of both. When the outcome values of a random experiment are plotted in a graph against their respective probabilities, the values follow a pattern. This pattern is called the probability distribution, and the function that describes it is the probability distribution function.

It is like any other distribution having various statistics like mean, variance and standard deviation. To better understand the concept, I would like to take an example as follows:

Let’s continue with our previously taken example of tossing a coin. As an extension, I would like to say that the coin has been flipped 6 times. Let me put forth a few questions:

What is the probability of getting exactly 4 tails?
What is the probability of getting less than 5 tails?
What is the probability of getting more than 1 tail?

There is a mathematical way of representing these questions. It is as follows:

P(X = 4), where X is the number of tails when a coin is tossed 6 times
P(X < 5), the probability of getting less than 5 tails
P(X > 1), the probability of getting more than 1 tail
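These three probabilities can be worked out with the binomial formula. The sketch below assumes a fair coin (probability of tails 1/2) and independent tosses; the helper name `binomial_pmf` and the variable names are illustrative and not part of the original article.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    p = Fraction(p)
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n = 6                # number of tosses
p = Fraction(1, 2)   # assumption: the coin is fair

# Exactly 4 tails
p_exactly_4 = binomial_pmf(4, n, p)
# Less than 5 tails: 0, 1, 2, 3 or 4 tails
p_less_than_5 = sum(binomial_pmf(k, n, p) for k in range(0, 5))
# More than 1 tail: 2, 3, 4, 5 or 6 tails
p_more_than_1 = sum(binomial_pmf(k, n, p) for k in range(2, n + 1))

print(p_exactly_4, p_less_than_5, p_more_than_1)  # 15/64 57/64 57/64
```

Using exact fractions keeps the results readable and avoids rounding noise in such a small example.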

There can be various other scenarios. For instance, we might be interested to know the probability of getting two numbers whose sum is 5 when two unbiased dice are rolled.
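The dice question can be answered by listing every equally likely pair of faces and counting the pairs that add up to 5. This is a minimal sketch that assumes two fair six-sided dice; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 ordered outcomes of rolling two six-sided dice
rolls = list(product(range(1, 7), repeat=2))

# Outcomes whose faces add up to 5: (1, 4), (2, 3), (3, 2), (4, 1)
favourable = [roll for roll in rolls if sum(roll) == 5]

p_sum_5 = Fraction(len(favourable), len(rolls))
print(p_sum_5)  # 1/9
```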

Let us simplify this example and discuss it further. Say only two coins are tossed one after the other. The sample space in this experiment can be represented as follows:

S = {HH, TT, HT, TH}

Sample space is nothing but the set of all possible outcomes of a random experiment or an event. A random variable assigns a number to each outcome in the sample space, for example by counting the number of heads or tails in it. Suppose the random variable, denoted as Y, in this experiment is the number of heads that we got. The values of Y for each outcome are given below:

Y(HH) = 2, Y(HT) = 1, Y(TH) = 1, Y(TT) = 0.
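The same mapping from outcomes to values of Y can be sketched in a few lines of code. The function name `Y` mirrors the notation above; everything else is an illustrative choice.

```python
from itertools import product

# Sample space for two coin tosses: HH, HT, TH, TT
sample_space = ["".join(outcome) for outcome in product("HT", repeat=2)]


def Y(outcome):
    """Random variable: the number of heads in an outcome."""
    return outcome.count("H")


values = {outcome: Y(outcome) for outcome in sample_space}
print(values)  # {'HH': 2, 'HT': 1, 'TH': 1, 'TT': 0}
```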

Types of Probability Distribution Function

Continuous Probability Distribution (the Normal distribution is a common example)
Discrete Probability Distribution (the Binomial distribution is a common example)

Continuous Probability Distribution:

A continuous probability distribution describes a random variable that can take any value within an interval of real numbers. The normal distribution is the best-known example, but it is only one of many continuous distributions. A continuous distribution should also not be confused with the cumulative distribution function, which gives the probability that the variable is less than or equal to a given value and exists for discrete variables as well.

The values may be positive or negative, rational or irrational; what matters is that they form an unbroken range of real numbers rather than a set of separate points.

A real-life example of a continuous outcome can be the temperature in Hyderabad on a fine sunny day. Once we get the outcomes from a random experiment, a continuous probability distribution is plotted as a density curve. Probabilities are read off as areas under the curve over a range of values, because the probability of any single exact value is zero.

A continuous probability distribution is described by its probability density function. Its formula varies depending on the nature of the distribution.
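As a rough illustration of how a density function is used, the sketch below models the temperature example with a normal distribution from Python's standard library. The mean of 35 °C and standard deviation of 3 °C are made-up numbers chosen only for the example, not figures from the article or from real weather data.

```python
from statistics import NormalDist

# Assumption: temperature is roughly normal with mean 35 and std dev 3 (hypothetical)
temperature = NormalDist(mu=35.0, sigma=3.0)

# Height of the density curve at 35 degrees; this is a density, not a probability
density_at_35 = temperature.pdf(35.0)

# Probability that the temperature falls between 33 and 38 degrees,
# computed as the area under the density curve via the cumulative function
p_33_to_38 = temperature.cdf(38.0) - temperature.cdf(33.0)

print(round(density_at_35, 4), round(p_33_to_38, 4))
```

For a continuous variable, only ranges of values have non-zero probability, which is why the example asks about an interval rather than one exact temperature.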

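Some examples of quantities that are modelled with continuous probability distributions:

Weights of human beings across the country
Heights of newly born babies
Incomes of working humans in various industries and places
Marks of students in an examination (strictly discrete, but often treated as continuous when the range of marks is large)

Note that the outcomes of rolling an unbiased or a biased dice, the outcomes of tossing a 1-rupee coin, and the sizes of men’s shoes or women’s chappals take only a fixed set of separate values, so they are discrete rather than continuous.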

Discrete Probability Distribution

A discrete probability distribution describes a random variable that takes a countable set of separate values. It is described by a Probability Mass Function, which gives the probability of each individual value. The Binomial Probability Distribution is one of the most common discrete distributions: it counts the number of successes in a fixed number of independent yes/no trials, and its formula is related to the Binomial Theorem.

Some examples of discrete probability distribution:

To determine the number of defective and perfect products in the process of manufacturing
Positive and negative opinions of voters on election process and candidates who are contesting
To check the number of people who got vaccinated and who have not
The number of freshers and laterals working in an organization
The number of males and females in an educational institution
To estimate the number of people who watched a particular movie and who have not yet
To find the number of people who have smoking habits and those who are teetotallers

(Related reading: Types of Statistical Data Distribution Models)

Expected Value and Variance of Random Variable in Probability Distribution

As in the examples discussed in this article, we define a random variable and describe it using a probability distribution. Each time we run the experiment, the random variable may take a different value.

To better serve our purpose, we would be interested to know the average and possible deviation of the random variable so that we can make informed decisions with regard to the business problem.

A simple question can summarize this paragraph - What could be the mean value of my random variable if I perform the random experiment approximately 1000 times?

The Expected Value or Mean of the Probability Distribution of a discrete random variable is calculated by summing the products of each possible outcome and its probability. This gives us the probability-weighted average of the values that come as outcomes in a random experiment or event.

While mean is important to understand the central tendency of a probability distribution, we would also need to measure the dispersion or randomness to learn how the distribution is spread out.

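That is when Variance comes into the picture. It is the expected squared deviation of the random variable from its mean, and its square root is the standard deviation. A single value is not enough to make accurate decisions; we need a range of values to increase decision accuracy, as explained by Towards Data Science.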
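To make the two quantities concrete, here is a small sketch that computes the expected value and variance for Y, the number of heads in two tosses of a fair coin from the earlier example. The function names `expected_value` and `variance` are illustrative helpers, not part of any library.

```python
from fractions import Fraction

# Distribution of Y = number of heads in two fair coin tosses
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}


def expected_value(pmf):
    """Sum of each value multiplied by its probability."""
    return sum(value * prob for value, prob in pmf.items())


def variance(pmf):
    """Probability-weighted average of squared deviations from the mean."""
    mean = expected_value(pmf)
    return sum((value - mean) ** 2 * prob for value, prob in pmf.items())


print(expected_value(pmf), variance(pmf))  # 1 1/2
```

So over many repetitions we would expect about one head per pair of tosses on average, with a variance of 1/2.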

Solved Example of Probability Distribution

Suppose we flip a coin twice in succession. Y is the random variable that counts the number of tails obtained. Present and explain the probability distribution of Y.

Solution:

The first and foremost step is to write the possible outcomes related to the given random variable. The possibilities of the random experiment are as follows:

No tail comes
One tail and one head come in any order
Two tails in two tosses of the coin

Now comes the mathematical part of the solution. We will now write probabilities against each outcome explained above as follows:

P(Y=0) = P(Head+Head) = (½)*(½) = ¼

P(Y=1) = P(Head+Tail) + P(Tail+Head) = (½)*(½) + (½)*(½) = ½

P(Y=2) = P(Tail+Tail) = (½)*(½) = ¼

If we write the above values in the form of a table:

Y	0	1	2
P(Y)	1/4	1/2	1/4
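One way to sanity-check this table is to simulate the experiment many times and compare the observed frequencies with the theoretical probabilities. The sketch below assumes a fair coin and uses a fixed random seed so that the run is repeatable; the function name `simulate_tails` and the choice of 1000 trials are illustrative.

```python
import random
from collections import Counter


def simulate_tails(trials, seed=0):
    """Toss two fair coins `trials` times; return the relative frequency of 0, 1, 2 tails."""
    rng = random.Random(seed)
    counts = Counter(
        sum(rng.choice("HT") == "T" for _ in range(2))  # tails in one pair of tosses
        for _ in range(trials)
    )
    return {y: counts[y] / trials for y in (0, 1, 2)}


theoretical = {0: 0.25, 1: 0.5, 2: 0.25}
empirical = simulate_tails(1000)

for y in (0, 1, 2):
    print(y, theoretical[y], empirical[y])
```

The empirical frequencies will not match the table exactly, but they should move closer to 1/4, 1/2 and 1/4 as the number of trials grows.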

Applications of Probability Distribution Function

In Natural Language Processing (NLP), a branch of Machine Learning, programmers and statisticians assign a probability to the occurrence of a sentence, tagline or word, and probability distributions are built over them.
The likelihood of natural calamities like volcanic eruptions, tsunamis and earthquakes can be estimated from historical data, and once a certain threshold probability is touched or crossed, governments can take the necessary steps to be prepared in advance.
The probability distribution function is used in the field of Mathematics, Physical Sciences and Physical Chemistry

Hospitals might be interested to know whether a medicine was effective or not. Data scientists working for them may apply machine learning algorithms like Logistic Regression, Random Forest and Decision Trees, which rely on probabilities to express their predictions.