
Lattice Package: Visualizing Multivariate Data in R - Continued

  • Lalit Salunkhe
  • Mar 07, 2021
  • R Programming

In the previous article, we saw how to generate some useful multivariate visuals in R using functions from the well-known lattice package.

 

In this article, we continue in the same direction and look at a few more plot types from the same package, with hands-on examples for your reference.

 

Also, if you are new to our website and have not seen the earlier articles in this series, you can find them here: R Programming Articles List

 

My previous article on this package, Lattice Package: Visualizing Multivariate Data in R, introduces the package itself and some of the visualizations that can be made with its functions.

 

For those who are new to lattice and did not get the chance to read that article: lattice is an R package for visualizing multivariate data, and its real strength lies in dividing the main chart into several panels based on a grouping variable.

 

As for the data, we will use several of the datasets built into R itself.


 

The Violin Plots

 

We ended the last article with box plots, where I wrote code for the mtcars data that draws box plots of mpg for each level of factor_gear, with one panel per level of factor_cyl.

 

Here, factor_gear and factor_cyl are two new variables created from the original gear and cyl variables of the mtcars data (we converted the number of gears and the number of cylinders into factors).

 

A box-and-whisker plot can sometimes be hard to read when the goal is to understand the shape of a distribution. In such cases, violin plots can be very useful. They use the same bwplot() function, and the additional panel = panel.violin argument converts the traditional box plot into a violin plot. See the code below for a better understanding of the discussion above.


library(lattice)

attach(mtcars)

#Creating factors for the gear and cyl variables
#Will need those later
factor_gear <- factor(gear, levels = c(3, 4, 5),
                      labels = c("3 gear", "4 gear", "5 gear"))

factor_cyl <- factor(cyl, levels = c(4, 6, 8),
                     labels = c("4 cyl", "6 cyl", "8 cyl"))

#violin plot associated with multiple variables and alternate layout
#Note: the title argument is lowercase main (R is case-sensitive)
bwplot(factor_gear ~ mpg | factor_cyl,
       data = mtcars,
       xlab = "Miles per Gallon (US)",
       ylab = "No of Gears",
       main = "Mileage by no. of gears and cylinders",
       panel = panel.violin)

Here, we first loaded the library, then attached the mtcars dataset and created two new factor variables (factor_gear and factor_cyl). We will not repeat this first piece of code in the next examples, as it only needs to run once.

 

Below that, we used the bwplot() function, which by default draws box plots of mileage by number of gears, in a separate panel for each number of cylinders.

Adding the panel = panel.violin argument is all it takes to convert the standard box plot into a violin chart.

 

See the output visual as shown below:


Figure: The violin plot, produced with the bwplot() function and the additional panel argument.


Here, you can see how the box-and-whisker plots are turned into violin plots by a single argument. For business users, violin plots are often easier to read than box plots, because they show the shape of each distribution directly.
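If you do not want to lose the median and quartiles while switching to violins, one option is to draw both in the same panel. The sketch below follows the pattern shown in the lattice help page for panel.violin: a custom panel function calls panel.violin() first and then a narrow panel.bwplot() on top. The data frame name mt and the object name violin_with_box are my own choices for illustration.

```r
library(lattice)

# Build the two factors inside a copy of mtcars so the block is self-contained
mt <- transform(mtcars,
                factor_gear = factor(gear, levels = c(3, 4, 5),
                                     labels = c("3 gear", "4 gear", "5 gear")),
                factor_cyl  = factor(cyl, levels = c(4, 6, 8),
                                     labels = c("4 cyl", "6 cyl", "8 cyl")))

# Custom panel: violin outline first, then a thin box plot drawn over it
violin_with_box <- bwplot(factor_gear ~ mpg | factor_cyl,
       data = mt,
       xlab = "Miles per Gallon (US)",
       ylab = "No of Gears",
       main = "Mileage by no. of gears and cylinders",
       panel = function(..., box.ratio) {
         panel.violin(..., col = "transparent", box.ratio = box.ratio)
         panel.bwplot(..., fill = NULL, box.ratio = 0.1)
       })

print(violin_with_box)
```

Keep in mind that some gear and cylinder combinations in mtcars contain only one car or none at all, so a few of the violins carry very little information.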

 


 

 

The Dot Plot

 

Dot plots are convenient when we have compact tabular data (usually values of the same type) to work with, and a grouping variable can be used to show a summarized comparison. In lattice, the dotplot() function does the work for us.

 

We will use the VADeaths data to produce the dot plots. You can print the data by simply typing VADeaths in the R console. The result will be as given below:


Figure: The structure of VADeaths, a built-in dataset in R, as printed in the console.


This dataset is stored as a matrix rather than a data frame, because all of its values are of the same type. You can check the class of the dataset as shown below:


Figure: The class of the VADeaths dataset. It is a matrix array, i.e. a table holding a single data type.


As for the data itself, it contains death rates per 1000 people in the state of Virginia, US, in 1940, cross-tabulated by age group and population group.
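If you would rather check the structure yourself than rely on the screenshots, a few base R calls are enough. This is only a quick inspection sketch and nothing in it is specific to lattice.

```r
# VADeaths ships with base R (datasets package), so no loading step is needed
str(VADeaths)               # compact summary of the object

is.matrix(VADeaths)         # TRUE: it is a matrix ...
is.data.frame(VADeaths)     # FALSE: ... and not a data frame

dim(VADeaths)               # number of age groups x number of population groups
rownames(VADeaths)          # the age groups
colnames(VADeaths)          # the population groups
```

The row names become the categories on the vertical axis of the dot plots that follow, and the column names become the panels or groups.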

 

Now, we can use the dotplot() function to visualize this data and see the insights.


#Creating a dotplot visualization for a tabular data structure
dotplot(VADeaths, groups = FALSE)

This is the basic call. With groups = FALSE, the columns are not grouped together, so a separate dot plot is drawn for each column of the matrix against the age groups. See the image below:


Figure: Basic dot plot of the VADeaths data, with a separate panel for each category of the table.


Here, the death rates of males and females are not easy to compare, because they sit in different columns of the panel grid. This is easy to fix: we can use a layout in which all four panels are displayed in a single column.

 

It would also help to add thin lines from the origin (which is zero) to each point, so that the relative magnitudes of the death rates can be compared easily. In effect, this turns the plot into a stick chart. Let’s see how we can achieve this with the following piece of code.


#Customizations in dot plot visualization
#type: "p" = plot points, "h" = drop lines to origin (like histogram)
dotplot(VADeaths, groups = FALSE,
        layout = c(1, 4),
        origin = 0,
        type = c("p", "h"),
        main = "Virginia Death Rates of year 1940",
        xlab = "Death rate per 1000 people")

The code is straightforward: the layout argument gives us the one-column output, origin = 0 sets the origin to zero, and type = c("p", "h") draws the points together with thin, histogram-like lines dropped to the origin. The output looks as below:


Figure: The dot plot customized with a one-column layout, sticks dropped to the origin, a title and axis labels.
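The matrix call above hides the formula that lattice normally works with. As a rough equivalent, the same stick chart can be written with an explicit formula on a long-format data frame. The names va_long, age_group, population and rate are my own choices for this sketch and are not part of the dataset.

```r
library(lattice)

# Reshape the matrix into long format: one row per age group / population pair
va_long <- as.data.frame(as.table(VADeaths))
names(va_long) <- c("age_group", "population", "rate")

# age_group on the vertical axis, rate on the horizontal axis,
# one panel per population group, stacked in a single column
stick_chart <- dotplot(age_group ~ rate | population,
        data = va_long,
        layout = c(1, 4),
        origin = 0,
        type = c("p", "h"),
        main = "Virginia Death Rates of year 1940",
        xlab = "Death rate per 1000 people")

print(stick_chart)
```

The formula version is more typing, but it makes it easier to swap the conditioning variable or to use data that does not start out as a matrix.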


This was one way to compare the death rates of males and females in Virginia. However, it would be nicer to compare these columns directly, all grouped together in a single panel. We can do that by omitting the groups = FALSE argument. See the code given below:


#Grouped comparison within dotplot
dotplot(VADeaths,
        type = "o",   # o = points and lines overlaid
        auto.key = list(lines = TRUE, space = "left"),
        main = "Virginia Death Rates of the year 1940 - All Together",
        xlab = "Death rate per 1000 people")

The type = "o" argument draws a line connecting all the dots of the same category, and the auto.key argument adds a legend to the visual. See the output as shown below:


Figure: Grouped comparison of the death rates for all categories of the VADeaths data (males and females, rural and urban) in a single panel.
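To put numbers behind the visual comparison, the gap between male and female rates can be computed directly from the matrix. This is a small base R sketch, and the object name gap is just an illustrative choice.

```r
# Male minus female death rate within each age group (per 1000 people)
gap <- cbind(
  Rural = VADeaths[, "Rural Male"] - VADeaths[, "Rural Female"],
  Urban = VADeaths[, "Urban Male"] - VADeaths[, "Urban Female"]
)

print(round(gap, 1))
```

In this dataset every difference is positive, meaning the male rate is higher than the female rate in each age group, for both the rural and the urban population.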




 

The Bar Chart

 

Bar charts are one of the most popular visualization methods among analysts for representing data graphically.

However, in my experience this method is not as effective as dot plots, because several of the customizations available for dot plots are missing for bar charts.

The barchart() function from lattice nevertheless does the job well, despite the limitations of the visual itself.

 

Let’s write code that creates a bar chart for the same VADeaths data, as shown below:


#Creating a barchart visual

barchart(VADeaths,
         groups = FALSE,
         main = "Death Rates in Virginia - Year 1940",
         xlab = "Death Rate per 1000 people")

The output of the code will be as shown below:


The barchart visual in the lattice with labels and main title.

Creating a barchart visual for the VADeaths data


The simplest customizations for this chart are the layout and the aspect ratio (the latter lets you shrink the plotting area).


 

#Creating a barchart visual with custom layout and aspect ratio
barchart(VADeaths,
         groups = FALSE,
         layout = c(1, 4),
         aspect = 0.5,
         main = "Death Rates in Virginia - Year 1940",
         xlab = "Death Rate per 1000 people")

See the output of this one as shown below:


Figure: A more customized bar chart, with the layout and aspect ratio altered to shrink the plotting area.
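As with the dot plot, the columns can also be compared in a single panel by letting barchart() group them. The sketch below assumes that side-by-side bars are what you want, so it sets stack = FALSE. The legend placement and the object name grouped_bars are my own choices.

```r
library(lattice)

# One panel; the four columns of VADeaths become groups within each age group.
# stack = FALSE places the bars side by side instead of stacking them.
grouped_bars <- barchart(VADeaths,
         groups = TRUE,
         stack = FALSE,
         auto.key = list(space = "right"),
         main = "Death Rates in Virginia - Year 1940",
         xlab = "Death Rate per 1000 people")

print(grouped_bars)
```

Stacked bars would add up rates from different population groups, which is hard to interpret for this data, so the side-by-side form is usually the easier one to read here.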


Well, we will end this article here until I come up with a new and interesting one from the world of R Programming for you all. Until then, keep safe! Keep learning!
