When exploring data, we often get confused because many datasets are made up of complex hierarchical data, which is difficult to visualize for further exploration. A treemap solves this visualization problem by displaying hierarchical data as a set of nested rectangles. Many organizations have adopted treemaps for visualizing their large datasets.
Treemaps are mainly used for categorical data, especially data whose categories can be further divided into subcategories. This makes sense, since a treemap represents data in a tree-like structure. Different colors are generally used to represent different sets of data, which makes the chart easy to understand.
Treemaps were first introduced by Ben Shneiderman, who wanted a compact way to visualize a large hierarchy of files and see how storage space was being used. The space-filling method he created represented this large hierarchy efficiently and proved to be very useful.
Use Cases of Treemap
The first step in learning about treemaps is knowing when to use one. A few use cases are listed below:
Treemaps are used when the data you want to represent is categorical and you want to show how the parts of the data relate to one another.
Treemaps are used to display the magnitude or importance of categories, so that one can see how large each category is relative to the others.
When the data is large, representing it cleanly becomes difficult, and that is where a treemap comes into the picture, providing clean and informative insights from the data.
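All of these use cases rely on one rule: each category's rectangle gets an area proportional to its value, so larger categories take up more space. The minimal sketch below computes those areas for a canvas of a given size; the function name and the sales figures are hypothetical and only illustrate the arithmetic.

```python
def rectangle_areas(values, width, height):
    """Scale each value so that all areas together fill a width x height canvas."""
    total = sum(values.values())
    canvas = width * height
    # multiply before dividing so that exact fractions stay exact
    return {name: value * canvas / total for name, value in values.items()}


# Hypothetical sales figures per region (illustrative numbers only)
sales = {"North": 50, "South": 30, "East": 15, "West": 5}
areas = rectangle_areas(sales, width=8, height=5)
largest = max(areas, key=areas.get)

print(areas)    # {'North': 20.0, 'South': 12.0, 'East': 6.0, 'West': 2.0}
print(largest)  # North
```

The largest value always gets the largest rectangle, which is why a treemap makes the dominant category easy to spot.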
Because treemaps can represent hierarchical categorical data, they can also give the user the necessary depth: the data can be shown in several nested layers, each of which helps in understanding it.
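Shneiderman's original treemap layout, usually called slice-and-dice, shows these layers by dividing a rectangle among its children in proportion to their sizes and alternating the split direction at each level of the hierarchy. The sketch below is a minimal pure-Python version of that idea; the function names and the sample product hierarchy are hypothetical, and it assumes all leaf sizes are positive numbers.

```python
def size_of(node):
    """Total size of a node: a leaf's own number, or the sum over its children."""
    if isinstance(node, dict):
        return sum(size_of(child) for child in node.values())
    return node


def slice_and_dice(node, x, y, w, h, path=(), depth=0, out=None):
    """Slice-and-dice treemap layout.

    Inner nodes are dicts, leaves are positive numbers. Even depths split the
    rectangle along the x axis, odd depths split it along the y axis.
    Returns a list of (path, x, y, w, h) tuples, one per leaf.
    """
    if out is None:
        out = []
    if not isinstance(node, dict):
        out.append((path, x, y, w, h))
        return out
    total = size_of(node)
    offset = 0.0  # fraction of the parent already used by earlier siblings
    for name, child in node.items():
        share = size_of(child) / total
        if depth % 2 == 0:
            slice_and_dice(child, x + offset * w, y, w * share, h, path + (name,), depth + 1, out)
        else:
            slice_and_dice(child, x, y + offset * h, w, h * share, path + (name,), depth + 1, out)
        offset += share
    return out


# Hypothetical two-level hierarchy: category -> subcategory -> sales
hierarchy = {
    "Electronics": {"Phones": 30, "Laptops": 20},
    "Clothing": {"Shirts": 25, "Shoes": 25},
}

rects = slice_and_dice(hierarchy, 0, 0, 100, 100)
for path, x, y, w, h in rects:
    print(" > ".join(path), round(w * h))
# Electronics > Phones 3000
# Electronics > Laptops 2000
# Clothing > Shirts 2500
# Clothing > Shoes 2500
```

Slice-and-dice keeps the original order of the items but tends to produce long, thin rectangles; the squarified algorithm used by the Squarify library trades that ordering for rectangles that are closer to squares.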
However, there are a few limitations to keep in mind when using a treemap. This type of representation does not work well when you want to compare categories within the data precisely, because treemaps rely on area and color to convey the importance of a variable, and both are hard to compare accurately by eye. If precise comparison is what you need, a treemap is not the right choice.
Another downside of this type of representation is that the visualization can become complex: since it consists of many rectangles representing the variables in a dataset, it becomes really hard to fit labels into the smaller rectangles.
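One common workaround for cramped labels is to draw a label only when its rectangle is large enough to hold it. The minimal sketch below assumes a fixed, hypothetical character width and line height in the same units as the rectangles; a real plotting library would measure the rendered text instead. The helper names and sample rectangles are hypothetical.

```python
CHAR_WIDTH = 6.0    # assumed average width of one character
LINE_HEIGHT = 12.0  # assumed height of one line of text


def label_fits(label, width, height, char_width=CHAR_WIDTH, line_height=LINE_HEIGHT):
    """Return True if a single-line label fits inside a width x height rectangle."""
    return len(label) * char_width <= width and line_height <= height


def visible_labels(rects):
    """Keep only the labels whose rectangles are large enough to show them.

    `rects` is a list of (label, width, height) tuples.
    """
    return [label for label, w, h in rects if label_fits(label, w, h)]


# Hypothetical rectangles: one large, one medium, one too small for its label
rects = [("Phones", 120, 80), ("Laptops", 50, 30), ("Cables", 20, 10)]
print(visible_labels(rects))  # ['Phones', 'Laptops']
```

Hidden labels can still be made available through hover tooltips, so no information is lost.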
Some of these readability problems can be eased with a cushion treemap, which gives the rectangles a shaded, 3D-like texture so that their boundaries and nesting are easier to see. Below is a treemap of SAT scores used for college admissions:
Representation of a treemap
This treemap was made with Tableau, which offers one of the simplest ways to build a treemap. By now it should be clear that the main purpose of a treemap is to identify the most important or largest category in a dataset, which is why many organizations use treemaps to represent their sales data.
While many colors can be chosen to represent the categories in a treemap, one should pick a color palette that is also easy to read for color-blind people. A treemap should also show information about a category when the user hovers over it, for example:
Hovering over a rectangle of a treemap to get information
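One way to follow the color advice above is to assign categories colors from a palette designed for color-blind readers, such as the Okabe-Ito palette. The sketch below is a minimal example under that assumption; the `assign_colors` helper is hypothetical, and black is left out of the palette because it is usually reserved for label text.

```python
from itertools import cycle

# Okabe-Ito palette (without black), chosen to stay distinguishable
# for the most common forms of color blindness
OKABE_ITO = ["#E69F00", "#56B4E9", "#009E73", "#F0E442",
             "#0072B2", "#D55E00", "#CC79A7"]


def assign_colors(categories, palette=OKABE_ITO):
    """Give each category a fixed palette color, wrapping around if there are more categories than colors."""
    return dict(zip(categories, cycle(palette)))


colors = assign_colors(["USA", "Europe", "Japan"])
print(colors)  # {'USA': '#E69F00', 'Europe': '#56B4E9', 'Japan': '#009E73'}
```

Because the mapping is deterministic, each category keeps the same color across charts, unlike randomly sampled colors. With more categories than palette colors, colors repeat, which is another reason to keep the number of top-level categories small.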
Java TreeMap vs HashMap
Despite the shared name, Java's TreeMap is not a visualization: it is a class in the java.util package that implements the Map interface using a red-black tree. It stores key-value pairs sorted by their keys. Below is the structure of a Java TreeMap.
TreeMap implements the NavigableMap (and therefore SortedMap and Map), Serializable, and Cloneable interfaces, while HashMap implements the Map, Serializable, and Cloneable interfaces.
Although both serve the same purpose, their implementations differ: a HashMap can hold heterogeneous keys, since it relies only on each key's hashCode() and equals(), whereas a TreeMap effectively requires homogeneous keys, because it sorts them and every key must be comparable with the others.
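Python has no built-in TreeMap, but the same distinction shows up with a plain dict, which, like HashMap, is a hash table that only needs hashable keys. Ordering the keys, as a TreeMap does, requires every key to be comparable with every other one. The sketch below is a Python analogy, not Java code, and the `ordered_keys` helper is hypothetical; in Java, mutually incomparable keys in a TreeMap cause a ClassCastException instead of the TypeError seen here.

```python
# A dict, like a HashMap, only requires hashable keys, so mixed key types are fine.
mixed = {3: "int key", "b": "str key", (1, 2): "tuple key"}


def ordered_keys(mapping):
    """Return the keys in ascending order, as iterating a sorted map would."""
    return sorted(mapping)


# Homogeneous keys can be compared with each other, so they can be ordered.
ranked = {30: "c", 10: "a", 20: "b"}
print(ordered_keys(ranked))  # [10, 20, 30]

# Heterogeneous keys cannot be ordered: comparing int and str raises TypeError.
try:
    ordered_keys(mixed)
except TypeError as err:
    print("cannot order mixed keys:", err)
```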
A HashMap is generally faster than a TreeMap, which is reflected in their time complexities: basic operations such as get, put, and remove take O(1) expected time in a HashMap, compared with O(log n) in a TreeMap.
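The O(log n) cost comes from keeping the keys ordered: a TreeMap walks down a balanced tree, while a HashMap jumps straight to a bucket. As a rough Python analogy, a sorted list of keys searched with the standard `bisect` module also gives O(log n) lookups, and in exchange it can answer ordered questions, such as "the greatest key not above x" (what TreeMap's floorKey returns), that a hash table cannot. The `SortedKeyMap` class below is hypothetical and is not a full TreeMap replacement: inserting into a Python list costs O(n), not O(log n).

```python
import bisect


class SortedKeyMap:
    """Hypothetical sorted map: parallel lists of keys (kept sorted) and values."""

    def __init__(self):
        self._keys = []
        self._values = []

    def put(self, key, value):
        i = bisect.bisect_left(self._keys, key)  # O(log n) binary search
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value  # existing key: overwrite its value
        else:
            self._keys.insert(i, key)  # O(n) shift, unlike a tree's O(log n) insert
            self._values.insert(i, value)

    def get(self, key):
        """Return the value for key, or None, using binary search."""
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def floor_key(self, key):
        """Return the greatest key <= key, or None (like TreeMap.floorKey)."""
        i = bisect.bisect_right(self._keys, key)
        return self._keys[i - 1] if i else None


scores = SortedKeyMap()
for score, grade in [(90, "A"), (70, "C"), (80, "B")]:
    scores.put(score, grade)

print(scores.floor_key(85))  # 80
print(scores.get(80))        # B
print(scores.floor_key(50))  # None
```

When only exact-key lookups are needed, a hash table is the better choice; ordered queries like floor_key are what justify paying the logarithmic cost.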
Implementing A Treemap Using Squarify
As we have seen, the use cases of a treemap vary from user to user, but its primary purpose is data visualization. We are going to use a Python library called ‘Squarify’ to implement one.
First of all, make sure all the necessary libraries are installed. Squarify, along with the vega_datasets package that provides the sample data, can be installed with pip:

```shell
pip install squarify vega_datasets
```
```python
import matplotlib.pyplot as plt
import numpy as np
import squarify  # treemap layout and plotting
from vega_datasets import data as vds  # sample datasets, including cars
```
In the first step, we import the necessary libraries (squarify, numpy, and matplotlib) along with the datasets module from vega_datasets.
Plotting a treemap with squarify is much like plotting with matplotlib, since squarify draws onto a matplotlib axes. Let's see the implementation:
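```python
# load the cars dataset as a pandas DataFrame
cars = vds.cars()
print(cars.tail())  # show the last five rows
```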
This loads the dataset and shows its last few rows using the pandas DataFrame method ‘.tail()’, which returns five rows by default.
```python
# count how many cars come from each origin
origin_counts = cars.groupby('Origin').size().reset_index(name='counts')
```
Here we group the data by origin and count how many cars come from each one.
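```python
# plot arguments
sizes = origin_counts.counts.to_list()
color = plt.cm.Dark2(np.random.rand(len(sizes)))  # random colors from the Dark2 colormap
label = [f"{origin}\n{count}" for origin, count in zip(origin_counts.Origin, origin_counts.counts)]

# treemap plot
plt.figure(figsize=(8, 5))
squarify.plot(sizes=sizes, label=label, color=color, alpha=0.8)
plt.axis('off')
plt.title('Treemap of Car Origins')
plt.show()
```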
This is the treemap of car origins we get from the dataset. As mentioned before, a larger rectangle means that its category has a larger value than the others: here the largest rectangle (orange in our run, since the colors are picked randomly) represents the 254 cars that originate from the USA, which is why it is the largest of all.
The treemap is one of the most widely used data visualization techniques, and many organizations rely on it to represent their large datasets. We have also seen how Python libraries can be used to implement a treemap for a dataset.
However, this technique also has some drawbacks: the chart can become so crowded that it is sometimes hard to extract information from it, and it is not meant for precise comparisons between categories.