
Customizing Your Visuals with ggplot2 in R programming: Part 1

  • Lalit Salunkhe
  • Nov 28, 2020

In our previous two articles, we tried to give you the essence of basic and advanced plotting in R programming. However, not much was discussed about how to customize those visuals.

 

In this article, we go a step further and show how you can customize the visuals you make to suit your needs. We will use the ggplot2 package, which is designed for exactly this task.


 

What is ggplot2?

 

The power of R lies in its packages, which make tasks that look difficult at first much easier. Among them, ggplot2 is developed especially for users who work with visualizations. It is part of the tidyverse, but can also be installed independently.

 

It was created in 2005 by none other than Hadley Wickham, one of the best-known names in the R community. Let’s see how this package helps us customize visuals in R Programming. But before that, if you haven’t read my previous articles on data visualization, you can check them out here: Articles on R Programming.


 

What are the Prerequisites?

 

You can install the ggplot2 package by installing the ‘tidyverse’ package in R with install.packages('tidyverse'). This installs ggplot2 along with the other packages that make up the tidyverse.
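If you only need ggplot2 and not the whole tidyverse, a minimal sketch of a standalone install could look like the one below. The mirror URL passed to repos is an assumption on my part; any CRAN mirror should work.

```r
# Install ggplot2 on its own, but only if it is not already available.
# The repos URL is an assumption; replace it with your preferred CRAN mirror.
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2", repos = "https://cloud.r-project.org")
}

# Load the package for the current session
library(ggplot2)
```

The requireNamespace() check avoids reinstalling the package every time the script runs.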

 

We will be using the mpg dataset, which comes with the ggplot2 package. It contains fuel economy data for 38 popular car models, published by the US Environmental Protection Agency, for the model years 1999 and 2008.

 

Let us access this data inside the R environment as below:


library(ggplot2)

#Accessing the "mpg" dataset
data <- ggplot2::mpg
head(data)

This will access the dataset from the ggplot2 package and show the first six rows and headers from the data.



First six rows with a header of the mpg dataset


Here, let us transform the year, cyl, drv, fl, and class variables into factors so that they are treated as categories in the plots.


#Transforming the data
data <- transform(data,
                  year = factor(year),
                  cyl = factor(cyl),
                  drv = factor(drv),
                  fl = factor(fl),
                  class = factor(class)
                  )
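To confirm that the conversion worked, one option is to check the class of each converted column. The sketch below repeats the transformation so that it runs on its own; the helper names converted and is_factor are mine, not part of ggplot2.

```r
library(ggplot2)

# Repeat the transformation from above so this snippet is self-contained
data <- transform(ggplot2::mpg,
                  year = factor(year), cyl = factor(cyl), drv = factor(drv),
                  fl = factor(fl), class = factor(class))

# Check which of the converted columns are now factors
converted <- c("year", "cyl", "drv", "fl", "class")
is_factor <- sapply(data[converted], is.factor)
print(is_factor)

# Inspect the categories of one of the converted variables
print(levels(data$drv))
```

If any entry of is_factor is FALSE, the corresponding column was not converted.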

Now, let us have a look at the grammar of ggplot2.

 

The ggplot2 package is built on “The Grammar of Graphics”, which is where the “gg” in its name comes from. The idea behind the grammar of graphics is that each graph is built as a combination of different layers. Therefore, you’ll see the code written in layers while working with ggplot2.

 

  1. The first layer of code takes the name of the dataset from which the visual is generated. This is done using the ggplot() function.

  2. The second layer consists of the aesthetic elements, such as the variables for each axis, fill, colour, shape, etc. The aes() function does this task.

  3. The third layer represents the type of visual, with options such as geom_histogram(), geom_bar(), geom_point(), etc.

  4. Further layers are added according to the customizations you make to your visual.
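The four steps above can be sketched by building a plot one layer at a time and storing the intermediate result in a variable. The choice of a scatter plot of displ against hwy and the title text are only for illustration.

```r
library(ggplot2)

# Layer 1: the dataset
p <- ggplot(ggplot2::mpg)

# Layer 2: the aesthetics (which variable goes on which axis)
p <- p + aes(x = displ, y = hwy)

# Layer 3: the type of visual
p <- p + geom_point()

# Further layers: customizations such as a title
p <- p + ggtitle("Engine displacement vs highway mileage")

# Draw the finished plot
print(p)
```

Writing all the pieces in a single chain joined by + gives the same result; the step-by-step form just makes each layer visible.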

 

Let us try to customize our plots using the ggplot2 package.

 

  • Creating a visual with ggplot2

 

The following code will create a histogram for the hwy variable from the mpg dataset.


#creating a histogram plot using ggplot2
ggplot(data)+
   aes(x = hwy)+
   geom_histogram()

Let us see the output of this code as shown below:



Histogram for the hwy variable from the mpg dataset
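When no bin settings are given, geom_histogram() falls back to 30 bins and prints a message suggesting you pick a value yourself. A small sketch of setting the bin width explicitly is shown below; the width of 2 is an arbitrary choice for illustration, and layer_data() is used only to look at the values ggplot2 computed for each bar.

```r
library(ggplot2)

data <- ggplot2::mpg

# Same histogram, but with an explicit bin width of 2 miles per gallon
p <- ggplot(data) +
  aes(x = hwy) +
  geom_histogram(binwidth = 2)

# layer_data() returns the values computed for each bar of the first layer
bars <- layer_data(p)
print(bars[, c("xmin", "xmax", "count")])
```

The counts across all bars add up to the number of rows in the dataset, since every car lands in exactly one bin.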


  • Customizing the titles

 

To customize the main title, the axis labels, and the axis limits, we will use the functions below:

 

  • A title can be changed with the ggtitle() function.

  • Labels for both of the axes can be added using the xlab() and ylab() functions.

  • The limit of the axes can be set using xlim() and ylim() functions.

 

See the code below:


#Customizing the titles
ggplot(data)+
  aes(x = hwy)+
  geom_histogram()+
  ggtitle("Highway miles per gallon for 38 popular car models")+
  xlab("Highway miles per gallon")+
  ylab("Count of cars")+
  xlim(0, 40)

The output of this code is as shown below:



Customizing the titles, axes labels, and axes limits for histogram


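If you compare this with the previous visual, the bars beyond the value 40 on the right have disappeared. This is because xlim() does more than zoom: observations that fall outside the limits (here, the few cars with hwy above 40) are dropped before the histogram is drawn, and ggplot2 prints a warning about the removed rows. The lowest hwy values sit well above 0, so nothing is lost on the left side.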
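To see how many observations the limit affects, and to zoom without dropping them, something like the following sketch can be used. Here coord_cartesian() is offered as an alternative that the article itself does not cover, and the bin width of 2 is again an arbitrary choice.

```r
library(ggplot2)

data <- ggplot2::mpg

# How many observations fall outside the 0-40 range used by xlim()?
n_outside <- sum(data$hwy < 0 | data$hwy > 40)
print(n_outside)

# xlim() removes those observations before the histogram is computed
p_dropped <- ggplot(data) +
  aes(x = hwy) +
  geom_histogram(binwidth = 2) +
  xlim(0, 40)

# coord_cartesian() only zooms the view; every observation is still counted
p_zoomed <- ggplot(data) +
  aes(x = hwy) +
  geom_histogram(binwidth = 2) +
  coord_cartesian(xlim = c(0, 40))

# The zoomed plot still counts every row of the dataset
zoomed_bars <- layer_data(p_zoomed)
print(sum(zoomed_bars$count))
```

Which of the two you want depends on whether the values outside the range should be excluded from the plot or merely hidden from view.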

 

  • Customizing theme of the chart

 

ggplot2 has some built-in themes that change the overall look of a visual and give it a finished appearance. Following are some of the themes we can use from ggplot2.

 

  • theme_bw()

  • theme_dark()

  • theme_minimal()

  • theme_classic()

 

Each of these themes has its own styling. Let us see an example below:


#Applying a built-in theme
ggplot(data)+
  aes(x = hwy)+
  geom_histogram()+
  ggtitle("Highway miles per gallon for 38 popular car models")+
  xlab("Highway miles per gallon")+
  ylab("Count of cars")+
  xlim(0, 40)+
  theme_bw()

This code will generate a histogram with the black and white theme. See it as shown below:



Histogram with a black and white theme
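To compare the four themes listed above, one possible approach is to build the base plot once and add each theme to it in turn. The names base_plot, themes, and themed_plots are my own, and the shortened title and bin width are only for illustration.

```r
library(ggplot2)

# Base plot built once, without any theme applied
base_plot <- ggplot(ggplot2::mpg) +
  aes(x = hwy) +
  geom_histogram(binwidth = 2) +
  ggtitle("Highway miles per gallon")

# The four built-in themes listed above, keyed by a short name
themes <- list(
  bw = theme_bw(),
  dark = theme_dark(),
  minimal = theme_minimal(),
  classic = theme_classic()
)

# Add each theme to the base plot; the result is a named list of plots
themed_plots <- lapply(themes, function(th) base_plot + th)

# Draw one of them, for example the minimal theme
print(themed_plots$minimal)
```

Printing the other entries of themed_plots one after another is a quick way to pick the theme that suits your data.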


Of all the themes ggplot2 provides, theme_classic() is my favourite, so I will show you how a visual looks with it. The rest I will leave for you to explore on your own.


#Customizing the fill and outline colours, with the classic theme
ggplot(data)+
  aes(x = hwy)+
  geom_histogram(fill = "#FF0033", color = "#CC0000")+
  ggtitle("Highway miles per gallon for 38 popular car models")+
  xlab("Highway miles per gallon")+
  ylab("Count of cars")+
  xlim(0, 40)+
  theme_classic()

The best part about ggplot2 is that we can set the fill and the outline of shapes using any colour R supports. In the code above, fill and color are given as hexadecimal colour codes; R also accepts named colours. Colour specification deserves a separate article of its own. Let us see the output of the code above.



Visuals are customized using colour and fill for different themes
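For readers unfamiliar with hexadecimal colour codes, the short sketch below shows how the fill colour "#FF0033" used above breaks down into red, green, and blue components, using functions from base R's grDevices package. The named colour "firebrick" is just an example of a colour name R recognizes.

```r
# Convert a hexadecimal colour code into its red, green and blue components
rgb_values <- grDevices::col2rgb("#FF0033")
print(rgb_values)

# Build the same hexadecimal code back from the components (0-255 scale)
hex_code <- grDevices::rgb(255, 0, 51, maxColorValue = 255)
print(hex_code)

# Named colours can be broken down the same way
print(grDevices::col2rgb("firebrick"))
```

Each pair of hexadecimal digits encodes one component, so FF is 255 for red, 00 is 0 for green, and 33 is 51 for blue.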


  • Combining two graphs together

 

We can also combine two graphs in R with the help of ggplot2. Let us combine the density plot and the histogram for the hwy variable from the mpg data. To create a density plot, the following code can be used.


#Creating a density plot
ggplot(data)+
  aes(x = hwy)+
  geom_density()+
  ggtitle("Highway miles per gallon for 38 popular car models")+
  xlab("Highway miles per gallon")+
  ylab("Density")+
  xlim(0, 40)+
  theme_classic()

The output visual of this code is as shown below. 



Density plot for the hwy variable from the mpg dataset


Now, let us plot this density plot and the histogram together for the hwy variable from the mpg dataset. See the code as shown below:


 

#Combining Histogram and a density plot
#after_stat(density) is the current form of the older ..density.. notation (ggplot2 3.3.0 and later)
ggplot(data)+
  aes(x = hwy, y = after_stat(density))+
  geom_histogram()+
  geom_density()+
  ggtitle("Highway miles per gallon for 38 popular car models")+
  xlab("Highway miles per gallon")+
  ylab("Density")+
  xlim(0, 40)+
  theme_classic()

This code has a slight change in the aesthetics: the x-axis still represents “hwy”, while the y-axis now shows density instead of count, so that the histogram and the density curve share the same scale. The output of this code will be a visual as shown below:



Histogram and density plot combined together for the hwy variable under mpg dataset
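To see what putting a histogram on the density scale means, the sketch below checks that the areas of the bars (height times width) add up to 1, which is what makes the histogram comparable with a density curve. It assumes ggplot2 3.3.0 or later for after_stat(), which is the current form of the ..density.. notation; the bin width of 2 is an arbitrary choice and the x-axis limit is left out so that no observations are dropped.

```r
library(ggplot2)

# Histogram with density, not count, on the y-axis
p <- ggplot(ggplot2::mpg) +
  aes(x = hwy, y = after_stat(density)) +
  geom_histogram(binwidth = 2)

# Values computed by ggplot2 for each bar
bars <- layer_data(p)

# Area of each bar is its height (density) times its width
total_area <- sum(bars$density * (bars$xmax - bars$xmin))
print(total_area)
```

The printed total should come out at 1, up to rounding.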


Here, the density plot allows us to see the data pattern more clearly than the histogram. Let us wrap up this article here. In the next article, we will cover some other customization techniques for ggplot2 and the visuals it generates.


 

Summary

 

  • The ggplot2 package is built on the grammar of graphics, which structures a visual as a series of layers. The first layer names the dataset, the second sets the aesthetics such as the axis variables, and the third specifies the type of graph or visual.

  • We can customize the titles using the ggtitle(), xlab(), and ylab() functions. We can additionally specify the axis limits using the xlim() and ylim() functions.

  • There are several built-in themes that help in customizing the visuals, such as theme_bw() and theme_classic(). You can always search online for the other options available to customize visual themes.

  • We can combine two graphs into a single visual, which gives a clearer picture of the data we have. For example, we can combine a histogram and a density plot.

 

Let us stop here for this article, with a promise that our next article will bring some advanced customizations available in ggplot2, which will help the visuals you make stand out at your workplace. Until then, keep learning!
