Machine Learning has been one of the most rapidly advancing topics to study in the field of Artificial Intelligence. There are a lot of algorithms under Machine Learning that have specifically gained popularity due to their transparent nature. One of them is the Decision Tree algorithm, popularly known as the Classification and Regression Trees (CART) algorithm.
The CART algorithm is a type of classification algorithm that is required to build a decision tree on the basis of Gini’s impurity index. It is a basic machine learning algorithm and provides a wide variety of use cases. A statistician named Leo Breiman coined the phrase to describe Decision Tree algorithms that may be used for classification or regression predictive modeling issues.
CART is an umbrella word that refers to the following types of decision trees:
Classification Trees: When the target variable is continuous, the tree is used to find the "class" into which the target variable is most likely to fall.
Regression trees: These are used to forecast the value of a continuous variable.
In this article, we will discuss Decision Trees, the CART algorithm and its different models, and the advantages of the CART algorithm.
Understanding Decision Tree
A decision Tree is a technique used for predictive analysis in the fields of statistics, data mining, and machine learning. The predictive model here is the decision tree and it is employed to progress from observations about an item that is represented by branches and finally concludes at the item’s target value, which is represented in the leaves. Because of their readability and simplicity, decision trees are among the most popular machine learning methods.
The structure of a decision tree consists of three main parts: Root nodes, Internal Nodes and Leaf Nodes.
Decision Tree Structure, Image source
As shown in the diagram, the first node or the Root node is the training data set, followed by the internal node and leaf node. The internal node acts as the decision-making node, as this is the point at which the node divides further based on the best feature of the sub-group. The final node or the leaf node is the one that holds the decision.
(Must read: Cross-validation in Machine Learning)
In the decision tree, the nodes are split into subnodes on the basis of a threshold value of an attribute. The CART algorithm does that by searching for the best homogeneity for the subnodes, with the help of the Gini Index criterion.
The root node is taken as the training set and is split into two by considering the best attribute and threshold value. Further, the subsets are also split using the same logic. This continues till the last pure sub-set is found in the tree or the maximum number of leaves possible in that growing tree. This is also known as Tree Pruning.
Calculating Gini Index:
The formula of Gini Index
Here, c is the total number of classes and P is the probability of class i.
(Related blog: AUC-ROC Curve Tutorial)
CART models from Data:
CART models are formed by picking input variables and evaluating split points on those variables until an appropriate tree is produced, according to Machine Learning Mastery.
Let us look at the steps required to create a Decision Tree using the CART algorithm:
The input variables and the split points are selected through a greedy algorithm. Constructing a binary decision tree is a technique of splitting up the input space. A predetermined ending condition, such as a minimum number of training examples given to each leaf node of the tree, is used to halt tree building.
The input space is divided using the Greedy approach. This is known as recursive binary splitting. This is a numerical method in which all of the values are aligned and several split points are tried and assessed using a cost function, with the split with the lowest cost being chosen.
The cost function that is reduced to determine split points for regression predictive modeling problems is the sum squared error across all training samples that lie inside the rectangle:
sum(y – p)^2
Here, y is the output of the training sample, and p is the estimated output for the rectangle.
The Gini index function is used for classification, and it indicates how "pure" the leaf nodes are. The formula for this is:
G = sum(pk * (1 – pk))
Here, G is the Gini index, pk is the proportion of training instances with class k in the rectangle.
As it works its way down the tree with the training data, the recursive binary splitting method described above must know when to stop splitting.
The most frequent halting method is to utilize a minimum amount of training data allocated to each leaf node. If the count is less than a certain threshold, the split is rejected and the node is considered the last leaf node.
The number of training members is adjusted according to the dataset. It specifies how exact the tree will be to the training data.
(Suggested blog: Binary and multiclass classification in ML)
A decision tree's complexity is defined as the number of splits in the tree. Trees with fewer branches are recommended. They are simple to grasp and less prone to cluster the data.
Working through each leaf node in the tree and evaluating the effect of deleting it using a hold-out test set is the quickest and simplest pruning approach. Only leaf nodes are eliminated if the total cost function for the complete test set decreases. When no additional improvements can be achieved, then no more nodes should be removed.
More advanced pruning approaches, such as cost complexity pruning (also known as weakest link pruning), can be applied, in which a learning parameter (alpha) is used to determine whether nodes can be eliminated depending on the size of the sub-tree.
Data preparation for CART algorithm:
No special data preparation is required for the CART algorithm.
(Also read: Machine Learning Models)
Advantages of CART algorithm
The CART algorithm is nonparametric, thus it does not depend on information from a certain sort of distribution.
The CART algorithm combines both testings with a test data set and cross-validation to more precisely measure the goodness of fit.
CART allows one to utilize the same variables many times in various regions of the tree. This skill is capable of revealing intricate interdependencies between groups of variables.
Outliers in the input variables have no meaningful effect on CART.
One can loosen halting restrictions to allow decision trees to overgrow and then trim the tree down to its ideal size. This method reduces the likelihood of missing essential structure in the data set by terminating too soon.
To choose the input set of variables, CART can be used in combination with other prediction algorithms.
(Similar reading: Clustering Algorithm)
The CART algorithm is a subpart of Random Forest, which is one of the most powerful algorithms of Machine learning. The CART algorithm is organized as a series of questions, the responses to which decide the following question if any. The ultimate outcome of these questions is a tree-like structure with terminal nodes when there are no more questions.
This algorithm is widely used in making Decision Trees through Classification and Regression. Decision Trees are widely used in data mining to create a model that predicts the value of a target based on the values of many input variables (or independent variables).