Randomness is at the heart of statistics, and statistics plays an important role in data science and machine learning. For example, we generate random samples, assign random initial weights to artificial neural networks, and split data randomly into training and test datasets; many other data science techniques also depend on random numbers and random samples.
In this article, we walk through generating random samples from different probability distributions and working with them. After completing this tutorial, you will understand how to generate random samples from discrete and continuous probability distributions, and you will also learn how to plot the sampled distributions.
As my previous article explained, the random module is useful for generating random numbers and random samples from several probability distributions (mostly continuous ones). You can read the article Working with Random Numbers in Python to connect the dots with this one.
Besides, we introduce the scipy.stats module to generate random samples from discrete distributions such as the Poisson and the binomial. Learn all types of data distribution models by following the link.
Importing these two modules along with pyplot from matplotlib is simple, as shown below:
# Importing the random module in the Python environment
import random as rnd
# Importing the scipy.stats module in the Python environment
import scipy.stats as scpy
# Importing the matplotlib.pyplot module in the Python environment
import matplotlib.pyplot as plt
The matplotlib.pyplot module will help us visualize the distributions of the random samples we draw.
To generate a random sample from a binomial distribution, we can use the binom.rvs() method from the scipy.stats module. This method takes n (number of trials) and p (probability of success) as parameters, along with size.
The size parameter specifies how many random variates to draw.
The syntax for the binom.rvs() method is as shown below:
binom.rvs(n, p, size=1)
Where,
n - specifies the number of trials,
p - specifies the probability of success on each trial,
size - specifies the sample size; the default value is 1. Pass it by keyword, because a third positional argument is interpreted as loc rather than size.
Now, let us take a simple example where we generate a random binomial sample of size 5, with parameters n = 12 and p = 0.6. The code is shown below:
# Importing the binom distribution from scipy.stats
from scipy.stats import binom
# Generating five random binomial numbers, one per call
for i in range(5):
    rnd_binom = binom.rvs(n = 12, p = 0.6)
    print(rnd_binom)
Now, if we run the code above, we see output like that shown below:
A random sample of five numbers from the binomial distribution
Note that we could have passed the size = 5 argument to generate all five values in a single call. In that case, binom.rvs() returns a NumPy array of five values instead of one number per call.
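As a minimal sketch of the size argument mentioned above, the call below draws all five values at once. The random_state value of 42 is an assumption added here only to make the run repeatable; it is not part of the original example.

```python
from scipy.stats import binom

# Draw five binomial variates in a single call.
# random_state=42 is an arbitrary seed chosen for this sketch.
sample = binom.rvs(n=12, p=0.6, size=5, random_state=42)

# rvs returns a NumPy array when size is greater than 1.
print(type(sample))

# Convert to a plain Python list if one is needed.
print(sample.tolist())
```

Drawing the whole sample in one call is usually more convenient than a loop, because the result can be passed straight to plotting or summary functions.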
Now let us try to generate a random sample of 10,000 items and plot it using the pyplot module to see the distribution of the binomial variate.
# Importing the binom distribution from scipy.stats
from scipy.stats import binom
# Importing the pyplot module as plt from matplotlib
import matplotlib.pyplot as plt
# Generating a random sample of size 10000 from the binomial distribution with n = 12 and p = 0.6
binom_rnd_sample = binom.rvs(n = 12, p = 0.6, size = 10000)
# Plotting the distribution using the plt.hist method
plt.hist(binom_rnd_sample, bins = 50)
# Displaying the figure (needed when running as a script rather than in a notebook)
plt.show()
Here, we are generating a random sample of size 10,000 from a binomial distribution with n = 12 and p = 0.6. Then, the plt.hist() method is used to generate a histogram out of the sample created. See the output as shown below:
Plotting a random binomial sample of size 10,000
You can see differently shaped distributions by changing the values of n and p.
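One way to see the effect of n and p without a plot is to compare each sample's mean and variance with the theoretical values n*p and n*p*(1-p). The sketch below does this; the parameter pairs and the seed are illustrative assumptions, not values from the original example.

```python
from scipy.stats import binom

# Hypothetical parameter pairs chosen only for illustration.
params = [(12, 0.6), (12, 0.2), (40, 0.5)]

summary = []
for n, p in params:
    # Fixed seed so that repeated runs give the same sample.
    sample = binom.rvs(n=n, p=p, size=10000, random_state=0)
    summary.append((n, p, sample.mean(), sample.var()))
    print(
        f"n={n}, p={p}: "
        f"sample mean={sample.mean():.2f} (theory {n * p:.2f}), "
        f"sample variance={sample.var():.2f} (theory {n * p * (1 - p):.2f})"
    )
```

With 10,000 points per sample, the sample statistics should land close to the theoretical ones, and the centre and spread move as n and p change.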
The Poisson distribution is one of the important distributions in statistics and is often called the distribution of rare events. It is suited to modeling the number of events that occur in a given time span.
The poisson.rvs() method from the scipy.stats module allows us to generate a Poisson random sample. This method takes the average rate of event occurrence (mu) over the given interval; as before, size specifies how many random variates to draw.
Let us see how to draw and plot a random sample from the Poisson distribution in Python.
# Importing the poisson distribution from scipy.stats
from scipy.stats import poisson
# Importing the pyplot module as plt from matplotlib
import matplotlib.pyplot as plt
# Generating a random sample of size 10000 from the Poisson distribution with mean 4
pois_rnd_sample = poisson.rvs(mu = 4, size = 10000)
# Plotting the distribution using the plt.hist method
plt.hist(pois_rnd_sample, bins = 50)
# Displaying the figure (needed when running as a script rather than in a notebook)
plt.show()
Here, we are generating a sample of 10,000 Poisson random variates with a mean value of 4 and plotting those points to see whether the sample shows the properties of a Poisson distribution. See the graph below:
A plot of 10,000 Poisson random variates with mean value 4
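Besides the plot, a quick numerical check is possible: for a Poisson distribution the mean and the variance are both equal to mu. The sketch below compares the two for a sample like the one above; the seed is an assumption added only for repeatability.

```python
from scipy.stats import poisson

# Seed value is an arbitrary choice to make the sketch repeatable.
pois_sample = poisson.rvs(mu=4, size=10000, random_state=1)

sample_mean = pois_sample.mean()
sample_var = pois_sample.var()

# For a Poisson distribution both values should be close to mu = 4.
print(f"mean = {sample_mean:.3f}, variance = {sample_var:.3f}")
```

If the two values differ noticeably from each other in real data, that is often a hint that a Poisson model may not fit well.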
We can use the standard random module to generate a random sample from the normal distribution through its normalvariate() function. To draw from a specific normal distribution, pass the mean (mu) and the standard deviation (sigma) to normalvariate().
Let us generate a random sample of size 5 with mean zero and standard deviation 5. See the code below:
# Importing the Python module random to generate random numbers
import random as rnd
# Generating a random sample of 5 from the normal distribution
for i in range(5):
    rnd_norm = rnd.normalvariate(mu = 0, sigma = 5)
    print(rnd_norm)
The output is as shown below:
Random sample of 5 from the normal distribution with mean 0 and standard deviation 5
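Because normalvariate() returns one value per call, a list comprehension is a compact alternative to the loop. The sketch below also seeds the generator so the same five numbers come back on every run; the seed value 123 is an assumption for illustration.

```python
import random as rnd

# Seeding is an assumption added here so that the output is repeatable.
rnd.seed(123)

# Build the five-element sample as a list in one expression.
norm_sample = [rnd.normalvariate(0, 5) for _ in range(5)]
print(norm_sample)
```

Calling rnd.seed(123) again before drawing reproduces the same list, which can help when debugging or sharing results.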
Interestingly, we can also draw a normal random sample through the scipy.stats module. The module provides the norm.rvs() method, which generates a random sample from the normal distribution. It has a loc parameter that specifies the mean and a scale parameter that specifies the standard deviation (sigma). Let us generate a random sample of size 10,000 and plot it. The code is below:
# Importing the norm distribution from scipy.stats
from scipy.stats import norm
# Importing the pyplot module as plt from matplotlib
import matplotlib.pyplot as plt
# Generating a random sample of size 10000 from the normal distribution with mean 0 and standard deviation 5
normal_rnd_sample = norm.rvs(loc = 0, scale = 5, size = 10000)
# Plotting the distribution using the plt.hist method
plt.hist(normal_rnd_sample, bins = 50)
# Displaying the figure (needed when running as a script rather than in a notebook)
plt.show()
The output plot of this code is as shown below:
Plotting random normal sample of 10,000 points with mean 0 and sigma 5
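To check numerically that loc and scale behave as the mean and standard deviation, the sample statistics can be compared with the values passed in. This is a minimal sketch; the seed is an arbitrary assumption.

```python
from scipy.stats import norm

# Seed value is an arbitrary choice to make the sketch repeatable.
normal_sample = norm.rvs(loc=0, scale=5, size=10000, random_state=7)

sample_mean = normal_sample.mean()
sample_std = normal_sample.std()

# Share of points within one standard deviation of the mean;
# for a normal distribution this is roughly 68%.
within_one_sigma = ((normal_sample > -5) & (normal_sample < 5)).mean()

print(f"mean = {sample_mean:.3f}, std = {sample_std:.3f}")
print(f"share within one sigma = {within_one_sigma:.3f}")
```

The sample mean should sit near 0 and the sample standard deviation near 5, matching the loc and scale arguments.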
This is all we have for you in this article. If you have not yet read our article on working with Python JSON objects, you can find it here: Working With Python JSON Objects. We close this article with some summary points.
The scipy.stats module in Python provides a rich collection of statistical functions. We can use it to generate random samples from different statistical distributions, both continuous and discrete.
The binom.rvs() method from the scipy.stats module is used to generate a random sample of any size from the binomial distribution.
The poisson.rvs() method from the scipy.stats module is used to generate a random sample of any size from the Poisson distribution.
The normalvariate() function from the random module returns one normally distributed value per call, so calling it repeatedly builds a random sample of any size from the normal distribution.
The norm.rvs() method from the scipy.stats module can be used to generate a random sample of any size from the normal distribution.