
Advanced Data Visualizations in R Programming

  • Lalit Salunkhe
  • Nov 15, 2020
  • R Programming

In the previous article, we looked at the basic visualizations in R programming, namely bar plots, histograms, box plots, and scatter plots, and worked through examples and customizations for each one.

 

In this article, we move on to the advanced visualizations in R programming. They are called advanced simply because they offer features that the visualizations covered in the previous article do not.

 

The advanced visualizations we will cover throughout this article are as follows:

 

  1. Heat Map

  2. Mosaic Plot

  3. Map Visualization

  4. Correlogram Visualization

 

Let us go through these advanced visualizations one by one, with examples.


 

What is a Heat Map?

 

A heat map visualizes data differently from conventional histograms or bar charts. Instead of showing numbers, it uses colours to convey the severity (magnitude) of a phenomenon.

 

For example, the areas of the world that are highly affected by COVID-19 can be shown in red, with the shade fading from dark to faint according to the severity of COVID-19 in each area (this is not the example we will work through hands-on, though).

 

Let us use mtcars (a built-in dataset from base R) to generate a heat map in R. The first, mandatory step is standardizing the data, and the scale() function from base R helps with that. By default it centres each column on zero (by subtracting the column mean from every observation in that column) and then scales it (by dividing by the column's standard deviation).


#standardizing the data using scale() function

my_data <- scale(mtcars)
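To see what scale() actually did, a quick check along the following lines can help. This is only a sketch; the names col_means and col_sds are introduced here for illustration and are not part of the original example.

```r
# Standardize every column of mtcars
# (subtract the column mean, then divide by the column standard deviation)
my_data <- scale(mtcars)

# After scaling, each column should have a mean of (approximately) 0 ...
col_means <- colMeans(my_data)
print(round(col_means, 10))

# ... and a standard deviation of 1
col_sds <- apply(my_data, 2, sd)
print(col_sds)

# The values are z-scores, so they are not confined to a fixed interval
print(range(my_data))
```
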

This step is important because the variables in the data are measured on very different scales. After standardizing, every column has a mean of 0 and a standard deviation of 1, so the variables become comparable. Let us now use the built-in heatmap() function to generate a heatmap based on the data we have.


#Creating a heatmap with scaling around the rows

heatmap(my_data, scale = "row")

 

Here, the function takes a numeric matrix as its primary input (scale() already returns one; a data frame would first have to be converted with as.matrix()), and scale is an argument that specifies whether the values should be centred and scaled along the rows or the columns. It can take any of three values: "row", "column", or "none".



The heatmap for all the variables from mtcars data


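Here, each row of the plot is a car model and each column is a variable, and the colour of a cell shows how high or low that value is relative to the others. With the default palette in recent versions of R, dark colours (deep red or maroon) indicate high values and faint colours (yellow and its shades) indicate low values. The trees drawn along the margins (dendrograms) group similar cars and similar variables together.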
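As a variation on the call above, heatmap() can be left to do the standardizing itself through its scale argument. The sketch below assumes that standardizing by column (that is, by variable) is what we want; the names raw_matrix and hm, and the choice of palette, are illustrative only.

```r
# heatmap() needs a numeric matrix, so convert the data frame first
raw_matrix <- as.matrix(mtcars)

# Let heatmap() standardize each column (variable) instead of calling scale() beforehand
hm <- heatmap(raw_matrix,
              scale = "column",
              col = rev(heat.colors(25)),  # pale colours = low values, red = high values
              margins = c(5, 8))           # extra room for the column and row labels

# heatmap() invisibly returns the row and column order chosen by the clustering
print(rownames(raw_matrix)[hm$rowInd][1:5])
```

Scaling by column keeps each variable comparable across cars, which is usually what is being read off a plot like this one.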


 

What is a Mosaic Plot?

 

Whenever you have two or more categorical variables and wish to see how they relate to each other across categories, a mosaic plot is a good option. The graph consists of rectangles: the width of each rectangle represents the proportion of observations in a category of the variable on the X-axis, and its height represents the proportion in a category of the variable on the Y-axis within that X category.

 

Base R ships with a dataset named HairEyeColor, which fits this plot perfectly because it cross-classifies people by hair colour, eye colour, and sex.


#Creating a mosaic plot for the hair and eye colours

data("HairEyeColor")

mosaicplot(HairEyeColor, shade = TRUE)

The data() function loads the HairEyeColor dataset, and the built-in mosaicplot() function then uses it to generate a mosaic plot. Because shade = TRUE is specified, the tiles are coloured by how far each count deviates from what would be expected if the variables were independent.



Mosaic Plot for different combinations of hair and eye colours


Here, we can see the likely patterns of hair and eye colours among males and females. Blue tiles mark combinations that occur more often than expected if the variables were unrelated, and red tiles mark combinations that occur less often. For instance, blond males and females have blue eyes far more often than expected, and brown eyes far less often.
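The shading in a mosaic plot can be checked against the underlying numbers. The sketch below collapses the table over Sex and looks at the Pearson residuals from a chi-squared test of independence; the names hair_eye and resid_tab are introduced here for illustration.

```r
# Sum over Sex to get a two-way Hair x Eye table of counts
hair_eye <- margin.table(HairEyeColor, c(1, 2))
print(hair_eye)

# Pearson residuals under independence of hair and eye colour:
# positive = more people than expected, negative = fewer than expected
resid_tab <- chisq.test(hair_eye)$residuals
print(round(resid_tab, 1))

# The same information as a shaded two-way mosaic plot
mosaicplot(hair_eye, shade = TRUE, main = "Hair vs eye colour")
```

Cells with large positive residuals are the ones drawn in blue, and cells with large negative residuals are drawn in red.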


 

What is Map Visualization?

 

Creating a map visualization is again a simple but interesting graphical method. We will use functions from the leaflet and magrittr packages to generate a beautiful map of Marine Drive, Mumbai.

 

The leaflet package builds an interactive map from OpenStreetMap tiles around the latitude and longitude that we feed in as input. The result looks much like Google Maps, which is fascinating enough. Let's try one.


library(magrittr)

library(leaflet)

#Creating a map visualization using leaflet

a <- leaflet() %>%

      addTiles() %>%  #Adds map tiles from OpenStreetMap to the map

      addMarkers(lng = 72.823679, lat = 18.941482,

                 popup = "Marine Drive, Mumbai") #Adds a marker at the given longitude and latitude, with a popup label

a    #Prints the map

Here, the pipe operator (%>%) passes the result of each function on as the first argument of the next one, and the finished map is stored in an object named "a". The addTiles() function adds the map tiles from OpenStreetMap to the widget created by leaflet(). The addMarkers() function adds a marker at the given latitude and longitude, with a pop-up at that exact location. See the output as shown below:



Map visualization for Marine Drive, Mumbai using leaflet
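The map can be customized further with other functions from the same package. The sketch below is one possible variation, assuming the leaflet and magrittr packages are installed: it fixes the initial view with setView() and swaps the default pin for a circle marker. The zoom level, radius, colour, and the object name marine_drive_map are arbitrary choices made for illustration.

```r
library(magrittr)
library(leaflet)

# Same location as before, with an explicit starting view and a circle marker
marine_drive_map <- leaflet() %>%
  addTiles() %>%                                            # OpenStreetMap tiles
  setView(lng = 72.823679, lat = 18.941482, zoom = 15) %>%  # centre and zoom of the initial view
  addCircleMarkers(lng = 72.823679, lat = 18.941482,
                   radius = 8, color = "red",
                   popup = "Marine Drive, Mumbai")

marine_drive_map  # prints the map
```
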


What is the Correlogram Visualization?

 

A correlogram visualization plots a correlation matrix, so the pairwise correlations between several variables can be read at a glance. Note that in time-series analysis the word correlogram also refers to the Auto Correlation Function (ACF) plot, which shows the serial correlation of a single variable with its own past values; that is a different chart from the one built here.

 

We will use the mtcars dataset, which is available with base R. See the code below for a basic correlogram.


#Setting up the library

library(corrgram)



#looking at the mtcars dataset

head(mtcars)



#creating correlation matrix for mtcars

head(cor(mtcars))



#creating a correlogram visualization

corrgram(mtcars)

Here, we first look at the first few rows of the mtcars dataset; then the cor() function creates a correlation matrix (head() prints its first few rows); and finally the corrgram() function, from the package of the same name, creates a correlogram as shown below:
 



Basic correlogram visualization
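To read the plot against actual numbers, it can help to inspect the correlation matrix directly. The sketch below uses only base R; the names cor_mat, mpg_cor, upper, and strongest are introduced here for illustration.

```r
# Correlation matrix for all columns of mtcars
cor_mat <- cor(mtcars)

# Correlations of mpg with every other variable, from most negative to most positive
mpg_cor <- sort(cor_mat["mpg", colnames(cor_mat) != "mpg"])
print(round(mpg_cor, 2))

# Keep each pair once (upper triangle) and locate the strongest relationship
upper <- cor_mat
upper[lower.tri(upper, diag = TRUE)] <- NA
strongest <- which(abs(upper) == max(abs(upper), na.rm = TRUE), arr.ind = TRUE)
print(rownames(upper)[strongest[1, "row"]])
print(colnames(upper)[strongest[1, "col"]])
```

Pairs with values close to -1 or 1 are the ones that appear most intensely coloured in the correlogram.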


In the correlogram, the colour density shows the strength of a correlation (strong or weak), and the direction of the lines in each box shows whether the correlation is negative or positive. Now, let us wrap up with a quick summary.


 

Summary

 

  • The advanced visualizations, as the name suggests, offer features and graph types beyond those of the basic visualizations.

  • The heatmap uses colour shades instead of actual numbers to represent the magnitude of the values in the data and its variables.

  • The mosaic plot suits situations where we have two or more categorical variables and want to see how the categories of one relate to the categories of the others.

  • The map visualizations can be created using the well-known leaflet package, which offers many customizations for fine-tuning the map.

  • The correlogram visualization plots a correlation matrix, in which the relationship of each numeric variable with every other one is measured. In time-series work, the same word also refers to the Auto Correlation Function (ACF) plot.


This article ends here. The next article in this series will cover ggplot, a package with more advanced visualization features. We are not limited to R programming either, and cover a range of topics such as SQL, AI, Natural Language Processing, and Machine Learning. You can visit us at Analytics Steps for more.
