
What is Hierarchical Clustering in Machine Learning?

  • Utsav Mishra
  • Jun 15, 2021



Machine learning offers many algorithms, and clustering is one of the most important.


Clustering is an unsupervised learning method in machine learning. This means it draws inferences from a given dataset on its own, without labeled examples or a human telling it what the groups should be.


Put simply, clustering is the process of partitioning a population or set of data points into groups so that data points in the same group are more similar to each other than to data points in other groups. It is essentially a grouping of items based on their similarity and dissimilarity.


For example, consider the data points in the graph below: in the second image, the points have been grouped into clusters. From the clustered points we can easily tell the groups apart and see that three clusters are present.



Types of clustering methods


Clustering methods in machine learning are commonly grouped into five types:


  • Partitioning Clustering

  • Density-Based Clustering

  • Distribution Model-Based Clustering

  • Hierarchical Clustering

  • Fuzzy Clustering


In this blog we are going to talk about hierarchical clustering. So, let us first understand what it is.


About Hierarchical Clustering


Hierarchical clustering, also known as hierarchical cluster analysis or HCA, is another unsupervised machine learning approach for grouping unlabeled datasets into clusters.


In this technique, the hierarchy of clusters is built in the form of a tree, and this tree-shaped structure is known as a dendrogram.


Simply put, hierarchical clustering is about separating data into groups based on some measure of similarity, finding a way to quantify how the groups are alike and different, and narrowing down the data.


Hierarchical clustering follows one of two approaches:

  • Agglomerative

  • Divisive


(Related blog: What is K-means clustering?)


Approaches of Hierarchical Clustering


  1. Agglomerative clustering: 


Agglomerative clustering is a bottom-up strategy in which each data point starts as a cluster of its own, and pairs of clusters are combined as one moves up the hierarchy. At each step, the two nearest clusters are joined to form a single cluster.


  2. Divisive clustering:


The divisive clustering algorithm is a top-down clustering strategy in which all points in the dataset are initially assigned to one cluster and then divided iteratively as one progresses down the hierarchy. 


At each step it splits an existing cluster into smaller clusters so that the most dissimilar data points are separated. This process continues until the desired number of clusters is obtained or every data point sits in its own cluster.
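
The top-down idea can be sketched in a few lines of Python. This is only a minimal illustration under an assumed splitting heuristic (pick the most spread-out cluster, use its two farthest points as seeds, and assign each point to the nearer seed); it is not a standard library routine, and the names `farthest_pair`, `diameter`, and `divisive` are made up for this example.

```python
from itertools import combinations
from math import dist


def farthest_pair(cluster):
    """The two points of a cluster that lie farthest apart."""
    return max(combinations(cluster, 2), key=lambda pq: dist(*pq))


def diameter(cluster):
    """Largest distance between any two points (0 for a single point)."""
    return dist(*farthest_pair(cluster)) if len(cluster) > 1 else 0.0


def divisive(points, n_clusters):
    """Top-down splitting until n_clusters groups remain."""
    clusters = [list(points)]  # start with everything in one cluster
    while len(clusters) < n_clusters:
        # Assumed heuristic: split the most spread-out cluster next.
        widest = max(clusters, key=diameter)
        if diameter(widest) == 0.0:
            break  # nothing left that can be split
        clusters.remove(widest)
        seed_a, seed_b = farthest_pair(widest)
        # Assign every point to whichever seed it is closer to.
        left = [p for p in widest if dist(p, seed_a) <= dist(p, seed_b)]
        right = [p for p in widest if dist(p, seed_a) > dist(p, seed_b)]
        clusters += [left, right]
    return clusters


# Toy data, made up for illustration.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
for cluster in divisive(points, n_clusters=3):
    print(cluster)
```

Other divisive algorithms choose the cluster to split and the way to split it differently; the point here is only the direction of travel, from one big cluster down to many small ones.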


How does it work?


In the agglomerative approach, each observation starts as a separate cluster. The algorithm then repeats the following two steps:


  1. Find the two clusters that are closest together.

  2. Merge those two clusters into one.

This iterative process is repeated until all of the clusters have been merged into a single cluster.

Working of Hierarchical Clustering, image source

In other words, clusters keep growing as they absorb the clusters closest to them, until all the data points sit inside a single cluster. After this, the dendrogram is cut at a chosen height to divide the data into the number of clusters that suits the problem at hand.
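
The merge loop described above can be written out directly. The sketch below is a naive, unoptimized illustration that assumes Euclidean distance and measures the gap between two clusters by their closest pair of points (single linkage); the function names and the toy points are made up for this example.

```python
from itertools import combinations
from math import dist


def single_link(a, b):
    """Distance between clusters a and b: their closest pair of points."""
    return min(dist(p, q) for p in a for q in b)


def agglomerate(points):
    """Merge the two closest clusters until only one remains.

    Returns the merges in order as (cluster_a, cluster_b, distance).
    """
    clusters = [[p] for p in points]  # every point starts as its own cluster
    merges = []
    while len(clusters) > 1:
        # Step 1: find the two clusters that are closest together.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]),
        )
        d = single_link(clusters[i], clusters[j])
        merges.append((clusters[i], clusters[j], d))
        # Step 2: combine them into a single cluster.
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
for a, b, d in agglomerate(points):
    print(f"merge {a} + {b} at distance {d:.2f}")
```

The recorded merge distances are exactly the heights at which the branches join in a dendrogram. Recomputing every pairwise distance on each pass is slow for large datasets, so in practice a library implementation would normally be used instead.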


Now you must be wondering what this dendrogram is.


A dendrogram is a visual representation of the hierarchical relationship between items. It is most often produced as the output of hierarchical clustering. Its main purpose is to help work out the best way to assign items to clusters, based on their similarities and dissimilarities and on the problem at hand.
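
As a hedged illustration of building a dendrogram and cutting it into clusters, the sketch below uses SciPy's `scipy.cluster.hierarchy` module. The choice of SciPy, the average-linkage setting, the cut at three clusters, and the toy data are all assumptions made for this example.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Toy data (made up for illustration): two tight pairs and one outlier.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0], [10.0, 0.0]])

# Each row of Z records one merge: the two cluster ids, the merge distance
# (the height of that join in the dendrogram) and the new cluster's size.
Z = linkage(X, method="average")

# Cut the tree so that at most 3 clusters remain and get flat labels.
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)

# no_plot=True returns the dendrogram layout without drawing it; "ivl"
# holds the leaf labels in the order they appear along the axis.
tree = dendrogram(Z, no_plot=True)
print(tree["ivl"])
```

Dropping `no_plot=True` draws the tree with matplotlib, if it is installed. Changing `t` moves the cut and therefore the number of clusters, without having to recompute the hierarchy.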


Now you must be wondering how these similarities and dissimilarities are measured in order to merge or split clusters.


(Related blog: Top Machine Learning Models)



Measuring similarities and dissimilarities in clustering


The most common methods to measure similarities and dissimilarities in clustering are mentioned below:


Maximum or complete linkage clustering: It computes all pairwise dissimilarities between the items in cluster 1 and the items in cluster 2, and uses the greatest of these dissimilarities as the distance between the two clusters. It tends to produce more compact clusters.


Minimum or single linkage clustering: It computes all pairwise dissimilarities between the items in cluster 1 and the items in cluster 2, and uses the smallest of these dissimilarities as the distance between the two clusters. It tends to produce long, "loose" clusters.


Mean or average linkage clustering: It computes all pairwise dissimilarities between the items in cluster 1 and the elements in cluster 2 and uses the average of these dissimilarities to determine the distance between the two clusters.


Centroid linkage clustering: The distance between two clusters is the dissimilarity between the centroid of cluster 1 and the centroid of cluster 2.


Ward’s minimum variance method: Ward's method aims to keep the total within-cluster variance as low as possible. At each step, it merges the pair of clusters whose merger leads to the smallest increase in total within-cluster variance.
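
To make the differences concrete, here is a small sketch that computes each of the five criteria for the same two clusters. It assumes Euclidean distance between points; the function names and the two toy clusters are made up for this example, and `ward_increase` uses the standard formula for the growth in within-cluster sum of squares caused by a merge.

```python
from itertools import product
from math import dist
from statistics import mean


def centroid(cluster):
    """Component-wise mean of the points in a cluster."""
    return tuple(mean(axis) for axis in zip(*cluster))


def complete_linkage(a, b):
    """Largest pairwise distance between the two clusters."""
    return max(dist(p, q) for p, q in product(a, b))


def single_linkage(a, b):
    """Smallest pairwise distance between the two clusters."""
    return min(dist(p, q) for p, q in product(a, b))


def average_linkage(a, b):
    """Mean of all pairwise distances between the two clusters."""
    return mean(dist(p, q) for p, q in product(a, b))


def centroid_linkage(a, b):
    """Distance between the two cluster centroids."""
    return dist(centroid(a), centroid(b))


def ward_increase(a, b):
    """Increase in within-cluster sum of squares if a and b were merged."""
    na, nb = len(a), len(b)
    return na * nb / (na + nb) * dist(centroid(a), centroid(b)) ** 2


cluster_1 = [(0.0, 0.0), (0.0, 2.0)]
cluster_2 = [(4.0, 0.0), (4.0, 2.0)]
for fn in (complete_linkage, single_linkage, average_linkage,
           centroid_linkage, ward_increase):
    print(f"{fn.__name__}: {fn(cluster_1, cluster_2):.3f}")
```

For the same pair of clusters, single linkage always gives the smallest of the pairwise-distance values and complete linkage the largest, which is why the choice of criterion can change which clusters get merged first.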


These methods are also called linkage methods, since each one defines how the distance between two clusters is measured. Now that we know this much about hierarchical clustering, let us look at the advantages that make it so widely used.


(Similar blog: Machine Learning Tools)


Advantages of Hierarchical Clustering


The most common advantages of hierarchical clustering are listed below:


  1. Easy to understand: Hierarchical clustering relies on simple ideas (measure distances, then merge the closest clusters) that can be followed without much prior familiarity with the topic.


  2. A straightforward approach: Hierarchical clustering is straightforward compared with many other machine learning algorithms. It does not require the number of clusters to be fixed in advance; you build the tree once and then choose the grouping that suits the problem.


  3. An appealing output: The main output of the hierarchical clustering algorithm is the dendrogram, which is visually appealing and easy to read.

A dendrogram, image source

  4. Clarity of the bigger picture: The dendrogram shows at a glance how the clusters nest inside one another at every level, from individual data points up to the whole dataset.


Hierarchical clustering is used in many fields, and its list of applications is even longer than its list of advantages.


(Also read: Deep Learning Algorithms)



Applications of Hierarchical Clustering


The top five applications of hierarchical clustering are:


  1. Identifying fake news:


Fake news is not a new phenomenon, but it is growing more prevalent. Thanks to technological advancements like social media, fake news is being manufactured and circulated at an alarming rate.


To tackle this problem, technology, and specifically hierarchical clustering, can be used to detect fake and false news.


The method works by analyzing the words used in a corpus of news stories and then grouping the stories by the words they share. These clusters help the algorithm determine which news items are authentic and which are fraudulent.


In sensationalized, click-bait stories, certain terms appear more frequently. When an article contains a high percentage of such terms, it is more likely to be false news.


  2. Identifying criminal activity:


Technology can also provide effective tools for dealing with criminal activity. Sometimes, a particular area of a province or district seems to be more affected by criminal activity than others.


Here, we can use hierarchical clustering to identify such activity. The system can group similar activities by analyzing GPS data. You can then categorize the groups, based on their characteristics, into those that are genuine and those that are fraudulent.


  3. Document analysis:


Document analysis has become one of the most important needs of our time, and everyone has their own reason for wanting to analyze their documents.


Hierarchical clustering is a useful algorithm here. The system can analyze the text and categorize it into topics. Using the features extracted from the text, you can easily cluster and organize related documents with this approach.
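
As a rough sketch of the idea, the example below turns each document into a bag of word counts and uses cosine distance to find the two most similar documents, which is the pair an agglomerative algorithm would merge first. The word-count features, the cosine distance, and the three toy sentences are all assumptions made for this illustration.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cosine_distance(a, b):
    """1 minus the cosine similarity of two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / (norm_a * norm_b)


# Made-up toy documents.
docs = {
    "d1": "the match ended with a late goal",
    "d2": "a late goal decided the match",
    "d3": "the bank raised interest rates",
}
vectors = {name: Counter(text.split()) for name, text in docs.items()}

# The closest pair is the first merge in an agglomerative clustering.
closest = min(
    combinations(docs, 2),
    key=lambda pair: cosine_distance(vectors[pair[0]], vectors[pair[1]]),
)
print(closest)
```

Real document analysis would usually use richer features (for example TF-IDF weights) and a full hierarchical clustering routine rather than a single closest-pair search.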


(Related blog:  What is Text mining?)


  4. Phylogenetic tree analysis


Some questions come up again and again in the science section of our newspapers. You may have come across one like this: “Are bears closer to raccoons or pandas in genetic terms?”


Scientists have been trying to answer such questions for a long time. With hierarchical clustering, the answers are now within reach: we can reconstruct the evolutionary tree of animals using DNA sequencing and hierarchical clustering.


The process follows these steps:


  • Produce the DNA sequences.

  • Calculate the edit distance between all pairs of sequences.

  • Based on the edit distances, calculate the DNA similarities.

  • Assemble a phylogenetic tree.
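
The steps above can be sketched on toy data. The example below is a minimal illustration, not a real phylogenetics pipeline: it assumes Levenshtein edit distance between sequences and average-linkage merging, and the sequences and species names are invented.

```python
from itertools import combinations


def edit_distance(s, t):
    """Levenshtein distance: fewest insertions, deletions, substitutions."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        curr = [i]
        for j, b in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,              # delete a
                            curr[j - 1] + 1,          # insert b
                            prev[j - 1] + (a != b)))  # substitute a -> b
        prev = curr
    return prev[-1]


def leaves(tree):
    """Names of all species under a (possibly nested) tree node."""
    if isinstance(tree, str):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def build_tree(sequences):
    """Average-linkage agglomeration over pairwise edit distances."""
    def gap(x, y):
        pairs = [(a, b) for a in leaves(x) for b in leaves(y)]
        total = sum(edit_distance(sequences[a], sequences[b]) for a, b in pairs)
        return total / len(pairs)

    nodes = list(sequences)  # each species starts as its own leaf
    while len(nodes) > 1:
        x, y = min(combinations(nodes, 2), key=lambda xy: gap(*xy))
        nodes = [n for n in nodes if n not in (x, y)] + [(x, y)]
    return nodes[0]


# Invented toy sequences, not real genomic data.
toy = {
    "species_a": "ACGTACGT",
    "species_b": "ACGTACGA",
    "species_c": "ACGAACTA",
    "species_d": "TTGACCTA",
}
print(build_tree(toy))
```

The nested tuples returned by `build_tree` are the tree: species that sit in the same inner tuple were merged earlier and are therefore more similar under this distance.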

An example, image source

  5. Tracking viruses through phylogenetic trees


Tracking viral epidemics and their sources is a major public health problem. Tracing the root of these epidemics can give scientists more information about why and how an outbreak started, and may save lives.


Viruses like HIV have high mutation rates, which implies that the similarity between the genetic sequences of two virus samples depends on how long ago the transmission took place. This can be used to trace transmission pathways.


Hierarchical clustering is a useful method for tracing these pathways.


(Must read: Expectation-Maximization in Machine Learning)





In this blog, we explored the concept of hierarchical clustering. Hierarchical clustering is a useful approach for creating tree structures out of data similarities.


(Must read: Machine Learning Applications)


We can now see how distinct sub-clusters are related to one another, as well as the distance between data points. Looking at its uses and applications, it is clear that a bright future lies ahead for the algorithm, and not just for this algorithm: machine learning as a whole may bring about a massive turnaround.
