Crash Course in Statistics

  • Riya Kumari
  • Dec 19, 2020
  • Statistics

A crash course is a quick recap of a subject: it saves time by covering the whole syllabus in a short span. It is a great support for students who have to cover a lot of material in a short period, as it lets them learn smart and fast. This blog is a guide for students who want to study statistics in a short period.


This crash course in statistics is a beginner's introduction to the primary methods of statistics. These key statistical methods include descriptive statistics, the t-test, and many more. The course will help you understand statistics better and answer your questions.


Here we will discuss the major topics that need to be studied: the basics of probability, an introduction to statistical inference, types of statistics, the t-test, nonparametric statistics, and definitions of some important statistical terms.


At the end of the blog, you will find some questions as a test, so that readers can check their basic knowledge of statistics. For students who are new to statistics and for people who need to be consumers of statistics, this "crash course" can prove very beneficial.


1. Basics of Probability


A probability is a number that indicates the chance or likelihood of a specific event. Probabilities can be expressed as proportions that range from 0 to 1, and they can also be expressed as percentages ranging from 0% to 100%.


A probability of 0 indicates that there is no chance that a specific event will occur, whereas a probability of 1 indicates that an event is certain to occur. A probability of 0.45 or 45% indicates that there are 45 chances out of 100 of the event occurring. (Must read: Introduction to Probability Distributions)


Common Terms Under Probability


  • Trials- Trials are also known as experiments or observations. A trial is a process whose result is unknown in advance, and probability describes the possible results of trials.


  • Sample Space or S- The sample space is the set of all possible elementary results of a trial, and the probability of the sample space is always 1. If the trial consists of flipping a coin two times, the sample space is S = {(h, h), (h, t), (t, h), (t, t)}.


  • Events or E- An event is a specification of the result of a trial, and it consists of a single result or a set of results. The probability of an event is always between 0 and 1, and the probabilities of an event and its complement always sum to 1.
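
To make these terms concrete, here is a minimal Python sketch of the two-coin-flip trial. It assumes a fair coin, so all four outcomes are equally likely; the event "at least one head" and the helper names are illustrative choices, not part of the definitions above.

```python
from fractions import Fraction
from itertools import product

# Sample space S for flipping a coin two times: every ordered pair of h/t.
sample_space = list(product("ht", repeat=2))


def probability(event):
    """Probability of an event, assuming all outcomes are equally likely."""
    favourable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favourable), len(sample_space))


def at_least_one_head(outcome):
    """Illustrative event E: the two flips contain at least one head."""
    return "h" in outcome


p_event = probability(at_least_one_head)
p_complement = probability(lambda outcome: not at_least_one_head(outcome))

print(sample_space)                      # the four elementary results
print(p_event, p_complement)             # 3/4 1/4
print(p_event + p_complement)            # 1
```

The event and its complement together cover the whole sample space, which is why their probabilities add up to 1.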



2. Introduction to Statistical Inference


Statistical inference is the process of using data analysis to draw conclusions about a population or process beyond the existing data. Inferential statistical analysis infers the properties of a population by testing hypotheses and deriving estimates. For instance, you might survey a representative sample of people in a region and, using statistical principles including simulation and probability theory, draw certain inferences based on that sample.


(Also read: Introduction to Statistical Data Analysis)


In simple words, statistical inference is used to make statements about a population based on data from a sample.


Basic Terms Under Statistical Inference


  • Errors


Measurement error is sometimes called observational error. It is the difference between a measured quantity and its true value. It comprises random errors, which occur naturally and are to be expected in any experiment, and systematic errors, which are introduced by a source such as a miscalibrated instrument that affects all measurements.


  • Reliability


Reliability is a measure of the stability or consistency of scores. You can also think of it as the ability of a test or research finding to be repeatable. For instance, a reliable medical thermometer would give the correct temperature each time it is used.


Similarly, a reliable math test will accurately measure mathematical knowledge for every student who takes it, and reliable research findings can be replicated again and again. Of course, it is not as simple as saying you think a test is reliable; there are many statistical tools you can use to measure reliability.


  • Validity


Validity is a crucial consideration when choosing a survey instrument. In simple words, a test or instrument is valid if it accurately measures what it is supposed to measure. In research, there are three ways to approach validity: content validity, construct validity, and criterion-related validity.


  • Types of data


There are two types of data: numerical data and categorical data. If you are wondering where quantitative data fits in, numerical data is also known as quantitative data.


Numerical data represent a measurement, like a person’s height, weight, or IQ. It is divided into two kinds: discrete and continuous.


  1. Discrete data- Data that can be counted. The list of possible values may be fixed (finite), or it may go from 0, 1, 2, on to infinity.

  2. Continuous data- The opposite of discrete data: the possible values cannot be counted. They can only be described using intervals on the real number line.


3. Types of Statistics


There are two types of statistics, descriptive and inferential, and here we will look at both in detail.


  • Descriptive Statistics


Descriptive statistics is a simple way to describe our data. It helps in describing and understanding the features of a specific data set by giving short summaries about the sample and measures of the data. The most recognized types of descriptive statistics are measures of centre: the mean, median, and mode, which are used at almost all levels of math and statistics. Thus, people use descriptive statistics to condense hard-to-understand quantitative insights across a large data set into bite-sized descriptions. (Read also: What is Statistics?)


There are two categories of descriptive statistics. The first is the measure of central tendency, which is used to indicate the centre point or typical value of a sample or data set. The second is the measure of variability, which is used to describe the spread of values in a sample or population.
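
As a rough illustration of both categories, the sketch below uses Python's standard `statistics` module to compute the measures of centre and two common measures of variability. The `summarize` helper and the example scores are made up for this illustration.

```python
import statistics


def summarize(data):
    """Return common descriptive statistics for a list of numbers."""
    return {
        # Measures of central tendency
        "mean": statistics.mean(data),      # sum of values / number of values
        "median": statistics.median(data),  # middle value of the sorted data
        "mode": statistics.mode(data),      # most frequent value
        # Measures of variability
        "range": max(data) - min(data),
        "variance": statistics.variance(data),  # sample variance (n - 1)
        "stdev": statistics.stdev(data),        # sample standard deviation
    }


# Hypothetical scores, chosen only to keep the arithmetic easy to follow.
scores = [2, 4, 4, 5, 7, 8]
summary = summarize(scores)

for name, value in summary.items():
    print(f"{name}: {value}")
```

For these scores the mean is 30 / 6 = 5, the median is the average of the two middle values (4 and 5), and the mode is 4 because it appears twice.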



  • Inferential Statistics


Inferential statistics is a type of statistics used to interpret the meaning of descriptive statistics. That is, once the data has been collected, analyzed, and summarized, we use these results to describe the meaning of the collected data.


There are several types of inferential statistics that are widely used and fairly easy to interpret. They allow you to make predictions from a small sample rather than working with the whole population. Examples include the one-sample hypothesis test (one-sample test of difference), contingency tables and the chi-square statistic, the t-test, and so on.


(Recommended blog: Introduction to Bayesian Statistics)



4. What is a T-Test?


Here, we will look in detail at the t-test, which is a type of inferential statistics. The t-test is used to determine whether there is a significant difference between the means of two groups, which may be related in certain features.


To determine statistical significance, a t-test looks at the t-statistic, the t-distribution values, and the degrees of freedom. To conduct a test with three or more means, one should use an analysis of variance.


It is generally used when a data set, such as one recording the results of flipping a coin 100 times, would follow a normal distribution and may have unknown variance. A t-test is a hypothesis testing tool, which allows testing of an assumption applicable to a population.


Therefore, calculating a t-test requires three key data values. They include the difference between the mean values from each data set (sometimes called the mean difference), the standard deviation of each group, and the number of data values in each group. Various types of t-test can be conducted depending on the data and the type of analysis needed.
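
The three values above can be combined as in the following sketch. It is one possible form only: a two-sample t-statistic for independent groups that assumes both groups share the same variance (the pooled form). The function name and the example groups are hypothetical.

```python
import math
import statistics


def pooled_t_statistic(group_a, group_b):
    """Two-sample t-statistic, assuming equal variances in both groups.

    Returns the t-statistic and the degrees of freedom.
    """
    n_a, n_b = len(group_a), len(group_b)

    # 1. Mean difference between the two groups
    mean_difference = statistics.mean(group_a) - statistics.mean(group_b)

    # 2. Spread of each group (sample variance = standard deviation squared)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)

    # 3. Number of data values in each group determines the degrees of freedom
    degrees_of_freedom = n_a + n_b - 2

    # Pool the two variances, weighting each by its own degrees of freedom
    pooled_variance = ((n_a - 1) * var_a + (n_b - 1) * var_b) / degrees_of_freedom
    standard_error = math.sqrt(pooled_variance * (1 / n_a + 1 / n_b))

    return mean_difference / standard_error, degrees_of_freedom


# Hypothetical groups, kept tiny so the arithmetic can be checked by hand.
t_stat, df = pooled_t_statistic([2, 4, 6], [1, 2, 3])
print(round(t_stat, 3), df)  # 1.549 4
```

The resulting t-statistic is then compared against the t-distribution with the given degrees of freedom to judge significance. If the equal-variance assumption is doubtful, a different form of the test would be needed.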


T-test Assumptions


  • The first assumption made regarding t-tests concerns the scale of measurement. The assumption for a t-test is that the scale of measurement applied to the data collected follows a continuous or ordinal scale, for example, the scores for an IQ test.


  • The second assumption is that of a simple random sample: the data is collected from a representative, randomly selected portion of the total population.


  • The third assumption is that the data, when plotted, results in a normal, bell-shaped distribution curve.


  • The fourth assumption is that a reasonably large sample size is used. A larger sample size means the distribution of results should approach a normal bell-shaped curve.


  • The last assumption is homogeneity of variance. Homogeneous, or equal, variance exists when the standard deviations of the samples are approximately equal.


5. What are Nonparametric Statistics?


Nonparametric statistics refers to a statistical method in which the data are not assumed to come from prescribed models that are determined by a small number of parameters; examples of such models include the normal distribution model and the linear regression model. Nonparametric statistics sometimes uses data that is ordinal, meaning it does not rely on numbers but rather on a ranking or order of sorts. This kind of analysis is often best suited to situations where the order of something matters and where, even if the numerical data changes, the outcomes are likely to stay the same.


(Recommended read: What is Vital Statistics? Types, Uses, and Examples)


Nonparametric descriptive statistics, statistical models, inference, and statistical tests all come under nonparametric statistics. The model structure of nonparametric models is not specified in advance but is instead determined from the data. The term nonparametric is not meant to imply that such models lack parameters, but rather that the number and nature of the parameters are flexible and not fixed ahead of time. Thus, a histogram is an example of a nonparametric estimate of a probability distribution.


(Recommended blog: Top 5 Statistical Data Analysis Techniques)


For more clarity, consider an example: a financial analyst who wishes to estimate the value at risk (VaR) of an investment. The analyst gathers earnings data from hundreds of similar investments over a similar time horizon. Instead of assuming that the earnings follow a normal distribution, the analyst uses the histogram to estimate the distribution nonparametrically. The fifth percentile of this histogram then gives the analyst a nonparametric estimate of VaR.
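
The idea in the VaR example can be sketched in a few lines of Python. This is a simplification under stated assumptions: instead of building a histogram, it reads the fifth percentile directly from the sorted sample using the nearest-rank rule, which is also a nonparametric estimate. The earnings figures and the function name are hypothetical.

```python
import math


def empirical_percentile(values, percent):
    """Nearest-rank percentile: no distribution is assumed for the data."""
    ordered = sorted(values)
    # Rank of the smallest value with at least `percent`% of the data at or below it
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical earnings (in %) from 20 similar investments over the same horizon.
earnings = [
    -9.5, -4.2, -3.1, -1.0, 0.4, 1.2, 1.9, 2.5, 2.8, 3.3,
    3.9, 4.4, 5.0, 5.6, 6.1, 6.7, 7.4, 8.2, 9.0, 11.3,
]

var_estimate = empirical_percentile(earnings, 5)
print(var_estimate)  # -9.5
```

With only 20 observations the fifth percentile is simply the worst outcome. In practice an analyst would use far more data, as the example above describes.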


Questionnaire: Analyze Yourself


  1. What is probability in statistics?

  2. What is reliability?

  3. How do you calculate the mean, median, and mode?

  4. Describe the types of data.

  5. What is descriptive statistics?

  6. Write five assumptions of the t-test.

  7. Explain nonparametric statistics with examples.