Introduction to Decision Tree Algorithm in Machine Learning

  • Rohit Dwivedi
  • May 10, 2020
  • Machine Learning

“The possible solutions to a given problem emerge as the leaves of a tree, each node representing a point of deliberation and decision.” - Niklaus Wirth (1934 — ), Programming language designer


In machine learning, tree-based methods such as decision trees and random forests are widely used. In this blog I will explain the decision tree algorithm: how it is used, how it functions, and everything else related to it.



What is a Decision Tree


A decision tree, as the name suggests, is a flowchart-like tree structure that works on the principle of conditions. It is efficient and is backed by strong algorithms used for predictive analysis. Its main components are internal nodes, branches, and terminal nodes.


Every internal node holds a “test” on an attribute, every branch holds an outcome of the test, and every leaf node holds a class label. It is one of the most widely used algorithms for supervised models and is used for both classification and regression. It is often termed “CART”, which stands for classification and regression tree. Tree algorithms are often preferred because their results are easy to interpret.



How can an algorithm be used to represent a tree?


Let us see an example of a basic decision tree that decides under what conditions to play cricket and under what conditions not to play.

Decision Tree of playing Cricket.



You might have got a fair idea from the above example of the conditions on which decision trees work. Let us now look at the common terms used in decision trees:


  • Branches - A subsection of the whole tree is called a branch.

  • Root Node - Represents the whole sample, which is then divided further.

  • Splitting - Division of nodes is called splitting.

  • Terminal Node - Node that does not split further is called a terminal node.

  • Decision Node - A sub-node that is itself divided further into sub-nodes.

  • Pruning - Removal of subnodes from a decision node.

  • Parent and Child Node - When a node is divided further, it is termed the parent node, and the sub-nodes it is divided into are termed its child nodes.
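
The terms above can be made concrete with a small data structure. The following is only a minimal sketch: the `Node` class, the `predict` helper, and the `humidity` and `wind_speed` features with their thresholds are hypothetical and are not taken from the cricket figure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One node of a binary decision tree (hypothetical structure for illustration)."""
    feature: Optional[str] = None      # attribute tested at a decision node
    threshold: Optional[float] = None  # test is: sample[feature] <= threshold
    left: Optional["Node"] = None      # child followed when the test is true
    right: Optional["Node"] = None     # child followed when the test is false
    label: Optional[str] = None        # class label, set only on terminal nodes

    def is_terminal(self) -> bool:
        # A terminal node has no children, so it is not split further.
        return self.left is None and self.right is None


def predict(node: Node, sample: dict) -> str:
    """Walk from the root down to a terminal node and return its label."""
    while not node.is_terminal():
        node = node.left if sample[node.feature] <= node.threshold else node.right
    return node.label


# The root node is the parent of one terminal node and one decision node;
# the decision node is in turn the parent of two terminal nodes.
root = Node(
    feature="humidity", threshold=70.0,
    left=Node(label="play"),
    right=Node(
        feature="wind_speed", threshold=20.0,
        left=Node(label="play"),
        right=Node(label="don't play"),
    ),
)

print(predict(root, {"humidity": 65.0, "wind_speed": 30.0}))  # play
print(predict(root, {"humidity": 80.0, "wind_speed": 30.0}))  # don't play
```

In this sketch, pruning would correspond to replacing the `wind_speed` decision node with a single terminal node.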



What Is The Working Principle Of Decision Tree?


Decision trees are widely used in data science and are a proven tool for making decisions in complex scenarios. They can be used for binary classification problems, such as predicting whether a bank customer will churn or whether an individual who has requested a loan from the bank will default, and they also work for multiclass classification problems. But how do they do these tasks?


Decision trees create a tree-like structure by computing the relationship between the independent features and a target. This is done using functions based on comparison operators applied to the independent features.


They work with both categorical and continuous inputs and outputs. Different algorithms are used to decide which variable and which split produce the most homogeneous subsets of the population.
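
To illustrate how a split can be chosen, here is a minimal sketch that scores candidate thresholds on one numeric feature and keeps the one giving the most homogeneous subsets. It assumes Gini impurity as the homogeneity measure, which is one common choice rather than the only one. The helper names and the churn data are made up for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure set, higher for a more mixed set."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Try each midpoint between sorted distinct values and return
    (threshold, impurity) for the split `value <= threshold` that has
    the lowest weighted Gini impurity."""
    distinct = sorted(set(values))
    best = (None, gini(labels))  # baseline: do not split at all
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best[1]:
            best = (t, weighted)
    return best


# Made-up data: account balance vs. whether the customer churned.
balance = [100, 200, 300, 800, 900, 1000]
churned = ["yes", "yes", "yes", "no", "no", "no"]

threshold, impurity = best_threshold(balance, churned)
print(threshold, impurity)  # 550.0 0.0
```

The comparison `value <= threshold` is the kind of comparison-operator test described above. A full tree repeats this search over every feature at every node.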


Types of Decision Tree


The type of decision tree depends on the type of target variable we have, categorical or numerical:


  1. If the target is a categorical variable, such as whether a loan applicant will default or not (yes / no), the decision tree is called a categorical variable decision tree.

  2. If the target is numeric and continuous in nature, such as when we have to predict a house price, the decision tree is called a continuous variable decision tree.
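
One place where the two types differ is the value a leaf predicts. As a sketch, assuming the usual convention of a majority vote for a categorical target and the mean for a continuous one (the function names and numbers are illustrative only):

```python
from collections import Counter
from statistics import mean


def classification_leaf(targets):
    """Categorical target: predict the most common class in the leaf."""
    return Counter(targets).most_common(1)[0][0]


def regression_leaf(targets):
    """Continuous target: predict the mean of the values in the leaf."""
    return mean(targets)


# Loan-default labels that ended up in one leaf.
print(classification_leaf(["no", "yes", "no", "no"]))    # no

# House prices that ended up in one leaf.
print(regression_leaf([250000.0, 300000.0, 350000.0]))   # 300000.0
```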


How can a decision tree be used?

Decision Tree Machine Learning Algorithm



  • ID3 (Iterative Dichotomiser 3) – Developed by Ross Quinlan, this DT algorithm uses a greedy approach to generate multiway trees. Trees are grown to their maximum size before pruning.

  • C4.5 extends ID3 by removing the restriction that features must be categorical. It defines discrete attributes for numerical features and converts the trained trees into sets of if-then rules.

  • C5.0 uses less space and creates smaller rulesets than C4.5.

  • CART (classification and regression tree) is similar to C4.5, but it supports numerical target variables and does not compute rule sets. It generates binary trees.
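
As a rough illustration of the greedy step in ID3, the sketch below picks the categorical attribute with the highest information gain, meaning the largest reduction in entropy of the target. The text above does not name this criterion, so treat it as background on ID3. The rows and helper names are made up for illustration.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after a
    multiway split on one categorical attribute."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    before = entropy([row[target] for row in rows])
    after = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return before - after


# Made-up rows: "outlook" separates the classes perfectly, "windy" not at all.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]

gains = {a: information_gain(rows, a, "play") for a in ("outlook", "windy")}
best = max(gains, key=gains.get)
print(best)  # outlook
```

The choice is greedy because each split is picked for its immediate gain only, with no look-ahead to later levels of the tree.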



How to prevent overfitting through regularization?


A DT makes no assumption about the relationship between the independent and dependent variables; it is a distribution-free algorithm. If DTs are left unrestricted, they can grow tree structures that fit the training data too closely, which results in overfitting.


To avoid this, we need to restrict the tree while it is being generated, which is called regularization. The regularization parameters depend on the DT algorithm used.


Some of the regularization parameters are:


  1. max_depth: The maximum length of a path from the root to a leaf. Nodes at this depth are not split further. Without this limit, a tree can become lopsided, with leaves holding many observations on one side while nodes holding very few observations keep being split on the other.

  2. min_samples_split: The minimum number of samples a node must contain before it can be split. Nodes with fewer samples are not split further.

  3. min_samples_leaf: The minimum number of samples that a leaf node must have. If leaf nodes hold only a few observations, the tree can overfit.

  4. max_leaf_nodes: The maximum number of leaf nodes in a tree.

  5. max_features: The maximum number of features that are examined when searching for the split at each node.

  6. min_weight_fraction_leaf: Similar to min_samples_leaf, but expressed as a fraction of the total number of weighted instances.
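
To show the effect of such limits, here is a minimal sketch of a toy tree builder on a single numeric feature. The names `max_depth` and `min_samples_leaf` follow the parameters listed above, but the `build` and `depth_of` functions and the data are hypothetical and are not how any particular library implements regularization.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build(xs, ys, max_depth, min_samples_leaf, depth=0):
    """Grow a tree on one numeric feature and return it as a nested dict.
    Splitting stops at max_depth, on pure nodes, or when no split
    leaves at least min_samples_leaf samples on each side."""
    majority = Counter(ys).most_common(1)[0][0]
    if depth >= max_depth or len(set(ys)) == 1:
        return {"leaf": majority}
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if min(len(left), len(right)) < min_samples_leaf:
            continue  # this split would create a leaf that is too small
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if best is None or score < best[0]:
            best = (score, t)
    if best is None:
        return {"leaf": majority}
    t = best[1]
    left = [(x, y) for x, y in zip(xs, ys) if x <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x > t]
    return {
        "threshold": t,
        "left": build(*zip(*left), max_depth, min_samples_leaf, depth + 1),
        "right": build(*zip(*right), max_depth, min_samples_leaf, depth + 1),
    }


def depth_of(tree):
    """Length of the longest path from this node down to a leaf."""
    if "leaf" in tree:
        return 0
    return 1 + max(depth_of(tree["left"]), depth_of(tree["right"]))


# Made-up noisy data: labels alternate, so an unrestricted tree
# keeps splitting until every leaf holds a single point.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = ["a", "b", "a", "b", "a", "b", "a", "b"]

unrestricted = build(xs, ys, max_depth=100, min_samples_leaf=1)
restricted = build(xs, ys, max_depth=2, min_samples_leaf=2)
print(depth_of(unrestricted), depth_of(restricted))
```

The restricted tree is shallower and cannot memorise individual points, which is the trade-off these parameters control.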


You can refer here to check the usage of the different parameters used in decision tree classifiers.



What are Advantages and Disadvantages of Decision Trees?




Advantages

  • DT is effective and very simple.

  • DT can be used while dealing with the missing values in the dataset.

  • DT can take care of numeric as well as categorical features.

  • Results generated from DT do not require any statistical or mathematical knowledge to be explained.




Disadvantages

  • The logic of the tree can change even with small changes in the training data.

  • Larger trees get difficult to interpret.

  • Splits are biased towards features that have more levels.


For the documentation of decision trees in the sklearn library, you can refer here.





In machine learning and data science you cannot always rely on linear models, because non-linearity appears in most real-world problems. Tree models like random forests and decision trees deal well with non-linearity. Decision tree algorithms are supervised learning models that can be used for both classification and regression tasks. The challenging task in decision trees is determining which factors to split on at the root node and at each level, although the results of a DT are very easy to interpret.


In this blog, I have covered what a decision tree is, the principle behind DT, the different types of decision trees, the different algorithms used in DT, and how to prevent overfitting through regularization.


Rohit Dwivedi

Data Science enthusiast who is currently pursuing a Post Graduate Program in Machine Learning and Artificial Intelligence from Great Learning. He has experience in Data Analytics, Machine Learning, Neural Networks, Computer Vision, and Natural Language Processing. He has done various projects in the domain of analytics. His goal is to build use cases using the power of Artificial Intelligence and Machine Learning and to solve business problems.
