Nowadays, Big Data and Data Science have become widely used keywords. They are extensively researched, which means the underlying data has to be processed and studied with scrutiny. One of the techniques used to analyse this data is Descriptive Analysis.
This data needs to be analysed to provide insights and identify influential trends, so that the next batch of content can be made in accordance with the general population’s likes and dislikes.
This means converting raw data into a form that is easy to understand and interpret, i.e., rearranging, ordering, and manipulating it to extract insightful information.
Descriptive Analysis is the type of data analysis that helps describe, show, or summarize data points in a constructive way so that meaningful patterns in the data can emerge.
It is one of the most important steps in conducting statistical data analysis. It gives you a summary of the distribution of your data, helps you detect typos and outliers, and enables you to identify similarities among variables, thus preparing you for further statistical analyses.
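One common way to flag suspicious values is Tukey's rule: anything more than 1.5 interquartile ranges below the first quartile or above the third quartile is treated as a potential outlier. The sketch below is a minimal illustration; the threshold `k=1.5` is a convention rather than something the text prescribes, and the weights (including the mistyped 650) are hypothetical.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical weights in kg; 650 looks like a data-entry typo for 65.0
weights = [58, 62, 70, 65, 650, 80, 55, 62, 74, 68]
print(iqr_outliers(weights))  # [650]
```

A flagged value is only a candidate for review: it may be a typo, or a genuine but unusual observation.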
Data aggregation and data mining are two techniques used in descriptive analysis to summarize historical data. In data aggregation, data is first collected and then sorted to make the datasets more manageable.
Descriptive techniques often include constructing tables of quantiles and means, measures of dispersion such as variance or standard deviation, and cross-tabulations, or "crosstabs", which can be used to explore many disparate hypotheses. These hypotheses often highlight differences among subgroups.
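As a rough illustration of data aggregation, the sketch below collects hypothetical sales records, sorts them by region, and collapses each group into a count and a total. The record layout `(region, amount)` and the values are made up for the example; this is one simple way to aggregate, not a prescribed method.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical raw records: (region, sale amount). Illustrative values only.
records = [
    ("North", 120.0),
    ("South", 80.0),
    ("North", 95.5),
    ("East", 60.0),
    ("South", 110.0),
]

def aggregate(rows):
    """Sort rows by region, then collapse each group into (count, total)."""
    summary = {}
    # groupby only merges *adjacent* equal keys, so sort by region first
    for region, group in groupby(sorted(rows, key=itemgetter(0)), key=itemgetter(0)):
        amounts = [amount for _, amount in group]
        summary[region] = (len(amounts), sum(amounts))
    return summary

for region, (count, total) in aggregate(records).items():
    print(f"{region}: {count} sales, total {total:.2f}")
```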
Outcomes such as segregation, discrimination, and inequality are studied using specialised descriptive techniques. Discrimination is measured with the help of audit studies or decomposition methods. Greater segregation by type, or greater inequality of outcomes, need not be wholly good or bad in itself, but it is often considered a marker of unjust social processes; accurate measurement of their levels across space and time is a prerequisite to understanding these processes.
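The passage does not name a specific inequality measure. One widely used descriptive measure is the Gini coefficient, sketched below under the assumption that values are non-negative with a positive mean; the income figures are hypothetical. It is 0 when everyone has the same income and rises as income concentrates in fewer hands (with n people, its maximum is (n - 1)/n).

```python
def gini(values):
    """Gini coefficient via the mean absolute difference over all ordered pairs.

    Assumes non-negative values with a positive mean.
    """
    n = len(values)
    mean = sum(values) / n
    if mean <= 0:
        raise ValueError("Gini coefficient needs a positive mean")
    # Sum |x_i - x_j| over every ordered pair (i, j); pairs with i == j add 0
    total_diff = sum(abs(a - b) for a in values for b in values)
    return total_diff / (2 * n * n * mean)

equal_incomes = [30_000, 30_000, 30_000, 30_000]
unequal_incomes = [0, 0, 0, 120_000]

print(gini(equal_incomes))    # 0.0
print(gini(unequal_incomes))  # 0.75
```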
A table of means by subgroup is used to show important differences across subgroups, which often invites inferences and conclusions. When we notice a gap in earnings, for example, we naturally tend to speculate about the reasons behind the pattern.
But this enters the province of measuring impacts, which requires different techniques. Differences in means are often caused by random variation, and statistical inference is required to determine whether an observed difference could have occurred merely by chance.
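A table of means by subgroup, plus a check on whether the gap could arise by chance, can be sketched with the standard library. The earnings figures and group labels are hypothetical, and the permutation test shown here is just one simple way to perform the inference step, not the only one.

```python
import random
from statistics import mean

# Hypothetical (group, annual earnings) observations; illustrative only
data = [
    ("A", 41_000), ("A", 38_500), ("A", 45_200), ("A", 39_800),
    ("B", 35_100), ("B", 37_900), ("B", 33_400), ("B", 36_600),
]

def means_by_group(rows):
    """Return {group: mean earnings}, i.e. a table of means by subgroup."""
    groups = {}
    for group, value in rows:
        groups.setdefault(group, []).append(value)
    return {g: mean(vals) for g, vals in groups.items()}

def permutation_p_value(rows, group_a, group_b, n_iter=10_000, seed=0):
    """Share of random relabelings whose |mean gap| is at least the observed one."""
    rng = random.Random(seed)
    a = [v for g, v in rows if g == group_a]
    b = [v for g, v in rows if g == group_b]
    observed = abs(mean(a) - mean(b))
    pooled = a + b
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)  # break any real link between label and value
        gap = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if gap >= observed:
            hits += 1
    return hits / n_iter

print(means_by_group(data))
print(permutation_p_value(data, "A", "B"))
```

A small p-value suggests the gap is unlikely to be pure chance; it says nothing by itself about *why* the gap exists.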
A crosstab, or two-way tabulation, shows the proportion of units with each combination of values of two variables, known as cell proportions. For example, we might tabulate the proportion of the population that has a high school degree and also receives food or cash assistance, which means building a crosstab of education versus receipt of assistance.
We might then also examine row proportions, i.e., the fraction of each education group that receives food or cash assistance, perhaps seeing assistance levels drop sharply at higher education levels.
Column proportions, such as the fraction of assistance recipients at each education level, can also be examined, but this runs backwards from any causal effect. We might find a surprisingly high number or proportion of recipients with a college education, but this may simply reflect the fact that there are more college graduates in the population than people without a high school degree.
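The three kinds of proportion can be computed from raw records with a simple tally. The education categories and the handful of records below are hypothetical placeholders.

```python
from collections import Counter

# Hypothetical records: (education level, receives assistance?)
records = [
    ("Less than HS", True), ("Less than HS", True), ("Less than HS", False),
    ("High school", True), ("High school", False), ("High school", False),
    ("College", False), ("College", False), ("College", True), ("College", False),
]

cells = Counter(records)                         # count of each (row, column) pair
row_totals = Counter(edu for edu, _ in records)  # people per education level
col_totals = Counter(aid for _, aid in records)  # recipients vs non-recipients
n = len(records)

def cell_prop(edu, aid):
    """Share of the whole sample in this cell."""
    return cells[(edu, aid)] / n

def row_prop(edu, aid):
    """Share of an education group with this assistance status (read across)."""
    return cells[(edu, aid)] / row_totals[edu]

def col_prop(edu, aid):
    """Share of an assistance group with this education level (read down)."""
    return cells[(edu, aid)] / col_totals[aid]

for edu in row_totals:
    print(f"{edu:13s} cell={cell_prop(edu, True):.2f} "
          f"row={row_prop(edu, True):.2f} col={col_prop(edu, True):.2f}")
```

In this toy data, college graduates have the lowest row proportion of recipients, yet they make up as large a share of recipients (column proportion) as high school graduates, simply because there are more of them.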
Descriptive analysis can be categorized into four types: measures of frequency, central tendency, dispersion or variation, and position. These methods are best suited to analysing one variable at a time.
Different types of Descriptive Analysis
In descriptive analysis, it’s essential to know how frequently a certain event or response is likely to occur. This is the prime purpose of measures of frequency, such as a count or a percentage.
For example, consider a survey in which 500 participants are asked about their favourite IPL team. A list of 500 responses would be difficult to digest, but the data becomes much more accessible if we count how many times each IPL team was selected.
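A frequency table turns a long list of responses into counts and percentages. The short list of team names below is hypothetical data standing in for the 500 survey responses.

```python
from collections import Counter

# Hypothetical survey responses (a stand-in for the 500 answers)
responses = ["CSK", "MI", "RCB", "CSK", "KKR", "MI", "CSK", "RCB", "MI", "CSK"]

counts = Counter(responses)
total = len(responses)

# Print each team with its count and percentage, most frequent first
for team, count in counts.most_common():
    print(f"{team}: {count} ({100 * count / total:.0f}%)")
```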
In descriptive analysis, it’s also important to find the central (or average) tendency of the responses. Central tendency is measured using three averages: mean, median, and mode. As an example, consider a survey in which the weights of 1,000 people are measured. In this case, the mean would be an excellent descriptive metric for the typical value.
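Python's standard `statistics` module computes all three averages. The weights below are a hypothetical sample of ten people rather than the 1,000 in the example.

```python
from statistics import mean, median, mode

# Hypothetical weights in kg (10 people instead of 1,000)
weights = [58, 62, 70, 65, 62, 80, 55, 62, 74, 68]

print("mean:", mean(weights))      # 65.6
print("median:", median(weights))  # 63.5
print("mode:", mode(weights))      # 62
```

If one very heavy outlier were added, the mean would shift noticeably while the median would barely move, which is why the median is often preferred for skewed data.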
Sometimes, it is important to know how data is spread across a range. To illustrate this, consider the average weight in a sample of two people. If both individuals weigh 60 kg, the average weight is 60 kg. However, if one individual weighs 50 kg and the other 70 kg, the average weight is still 60 kg. Measures of dispersion such as the range or standard deviation can be used to capture this difference in spread.
Descriptive analysis also involves identifying the position of a single value or response in relation to others. Measures such as percentiles and quartiles are very useful here.
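The two-person example can be checked directly: both samples share a mean of 60 kg, but their range and standard deviation differ. `pstdev` treats the two people as the whole population; `statistics.stdev` would instead give the sample estimate (about 14.14 kg for the 50/70 pair).

```python
from statistics import mean, pstdev

same = [60, 60]    # both individuals weigh 60 kg
spread = [50, 70]  # one is 50 kg, the other 70 kg

for label, sample in (("same", same), ("spread", spread)):
    value_range = max(sample) - min(sample)
    print(f"{label}: mean={mean(sample)} range={value_range} sd={pstdev(sample)}")
```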
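Quartiles and a simple percentile rank can be computed with the standard library. `statistics.quantiles` supports more than one interpolation method (`"exclusive"` is the default; `"inclusive"` is used here), so quartiles reported by other tools may differ slightly. The scores are hypothetical, and the percentile-rank definition (share of values strictly below) is one of several in use.

```python
from statistics import quantiles

# Hypothetical test scores
scores = [45, 52, 58, 61, 64, 67, 70, 73, 78, 84, 91]

# Three cut points dividing the sorted data into four groups
q1, q2, q3 = quantiles(scores, n=4, method="inclusive")

def percentile_rank(value, data):
    """Percentage of observations strictly below `value` (one common definition)."""
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)

print("quartiles:", q1, q2, q3)                                     # 59.5 67.0 75.5
print("percentile rank of 78:", round(percentile_rank(78, scores), 1))  # 72.7
```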
Apart from this, if you’ve collected data on multiple variables, you can use bivariate or multivariate descriptive statistics to study whether there are relationships between them.
In bivariate analysis, you simultaneously study the frequency and variability of two different variables to see whether they follow a pattern and vary together. You can also compare the central tendency of the two variables before carrying out further statistical analysis.
Multivariate analysis is the same as bivariate analysis, but it is carried out for more than two variables. The following two methods are used for bivariate analysis.
In a contingency table, each cell represents a combination of values of the two variables. Typically, the independent variable (e.g., gender) is listed along the vertical axis and the dependent variable (e.g., activities) is tallied along the horizontal axis. You read “across” the table to see how the two variables, i.e., the independent and dependent variables, relate to each other.
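One common way to summarize whether two variables vary together is Pearson's correlation coefficient, which the text does not name but which fits the idea. The sketch computes it by hand so it runs on any Python 3 version (Python 3.10+ also provides `statistics.correlation`); the hours-studied and score pairs are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # the 1/(n-1) factors cancel, so they are omitted

# Hypothetical paired observations: hours studied vs. test score
hours = [1, 2, 3, 4, 5, 6]
score = [52, 55, 61, 64, 70, 74]

print(round(pearson_r(hours, score), 3))
```

Values near +1 or -1 indicate a strong linear relationship; values near 0 indicate little linear association (though a non-linear one may still exist).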
| Group | 0–4 | 5–8 | 9–12 | 13–16 | 17+ |
|---|---|---|---|---|---|
| Men | 33 | 68 | 37 | 23 | 22 |
| Women | 36 | 48 | 44 | 83 | 25 |
A table showing the tally of men and women by number of activities
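Reading across the table amounts to converting each row's counts into row percentages, so that men and women can be compared despite different group sizes. The counts below are taken from the table above; only the code structure is illustrative.

```python
# Counts from the table: number of people in each activity bracket
brackets = ["0–4", "5–8", "9–12", "13–16", "17+"]
table = {
    "Men":   [33, 68, 37, 23, 22],
    "Women": [36, 48, 44, 83, 25],
}

def row_percentages(counts):
    """Share (in %) of a row's total falling in each column."""
    total = sum(counts)
    return [100 * c / total for c in counts]

for group, counts in table.items():
    cells = "  ".join(f"{b}: {s:4.1f}%" for b, s in zip(brackets, row_percentages(counts)))
    print(f"{group:5s} (n={sum(counts)}) {cells}")
```

Reading across, the largest share of men falls in the 5–8 bracket, while the largest share of women falls in the 13–16 bracket.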
A scatter plot is a chart that enables you to see the relationship between two or three different variables. It’s a visual rendition of the strength of a relationship.
In a scatter plot, you plot one variable along the x-axis and another along the y-axis. Each observation is represented by a point on the chart.
The scatter plot shows the hours of sleep needed per day by age (Source)
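Without a plotting library, the idea can still be shown as a text scatter plot: one variable maps to the horizontal axis, the other to the vertical axis, and each observation becomes one mark. The function name `text_scatter` is hypothetical, and the age and sleep-hour pairs are made up; they are not the data behind the figure.

```python
def text_scatter(xs, ys, width=40, height=10, mark="*"):
    """Return the rows of a character-grid scatter plot (top row = highest y)."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    x_span = (x_max - x_min) or 1  # avoid division by zero for constant data
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        grid[height - 1 - row][col] = mark  # flip so larger y is drawn higher
    # Prefix each row with a y-axis bar and finish with an x-axis line
    return ["|" + "".join(r) for r in grid] + ["+" + "-" * width]

# Hypothetical observations: age in years vs. hours of sleep per day
ages = [1, 3, 6, 10, 14, 18, 25, 40, 60, 70]
sleep = [14, 12, 10.5, 10, 9, 8.5, 8, 7.5, 7, 7]

for line in text_scatter(ages, sleep):
    print(line)
```

Points that fall in the same grid cell overlap, so a text plot can show fewer marks than there are observations.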
A high degree of objectivity and neutrality on the part of the researchers is one of the main advantages of Descriptive Analysis. Researchers need to be extra vigilant because descriptive analysis reveals the different characteristics of the extracted data, and if the data does not match the trends, large amounts of it may end up being discarded.
Descriptive analysis is considered broader than other quantitative methods and provides a wider picture of an event or phenomenon. It can use any number of variables, or even a single variable, to conduct descriptive research.
This type of analysis is considered a better method for collecting information that describes relationships as they naturally occur and shows the world as it exists. This makes the analysis very realistic and close to human experience, since all the trends are derived from research into the real-life behaviour of the data.
It is considered useful for identifying variables and new hypotheses, which can be further analyzed through experimental and inferential studies. It is also considered reliable because the margin for error is small, as the trends are taken directly from the properties of the data.
This type of study gives the researcher the flexibility to use both quantitative and qualitative data in order to discover the properties of the population.
For example, researchers can use both a case study, which is a qualitative analysis, and correlation analysis to describe a phenomenon, each in its own way. Using case studies to describe people, events, and institutions enables the researcher to understand the behaviour and patterns of the group concerned in depth.
In surveys, one of the main types of descriptive research, the researcher tends to gather data from a relatively large sample, unlike experimental studies, which generally need smaller samples.
A clear advantage of the survey method over other descriptive methods is that it enables researchers to study large groups of individuals with ease. If properly administered, surveys give a broader and clearer description of the unit under research.