Gradient boosting is a method for increasing the accuracy of a machine learning model. It combines many weak models into a single model that is more accurate than any of them on its own.
Gradient boosting is a concept from ensemble learning, which, as the name suggests, means assembling several models to form a strong one.
Which type of model does gradient boosting combine? Generally, it combines several decision trees, trained one after another on the same dataset, to form a stronger predictive model.
Understanding Gradient Boosting Method
Gradient boosting integrates multiple machine learning models (mainly decision trees), and every decision tree gives a prediction. If we place all the trees in consecutive order, each subsequent tree tries to reduce the errors of the trees before it.
To understand this statement, let us look at the architecture of gradient boosting:
Architecture of gradient boosting
Above is the architecture of a gradient boosting method. We can see that each decision tree gives some prediction, but are they all working on the same dataset? If they were all trained on the same data with the same target, they would produce the same results, and there would be no benefit in having multiple predictions.
There is only one dataset, and in standard gradient boosting every tree sees it. What changes from tree to tree is the target: each new tree is fitted to the errors (residuals) left by the trees before it, so every tree learns something different. Some implementations additionally train each tree on a random subsample of the rows, a variant known as stochastic gradient boosting.
How does Gradient Boosting Work?
Gradient boosting works in an interesting way. We have seen so far that it contains multiple models, but not yet what purpose they serve.
Every decision tree learns something and gives some predictions. The predictions of the first tree are usually rough, so the errors are likely to be large. An error is calculated as the difference between the actual value of a data point and its predicted value.
These differences must be reduced to obtain a good predictive model. The next decision tree therefore takes the predictions of the previous trees and tries to reduce their errors, picking up patterns the earlier trees missed. In this way, the result gets more refined with every consecutive tree.
Hence, the errors are significantly reduced by the end, and the ensemble captures more patterns than a single model would. With one dataset and multiple models, we can achieve better results than with one dataset and one model.
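To make the loop described above concrete, here is a minimal sketch in plain Python. It assumes a squared-error loss, a single input feature, and one-split trees ("stumps") as the models; the function names and the toy data are our own illustrations and do not come from any library.

```python
def fit_stump(xs, targets):
    """Fit a one-split regression stump: pick the threshold with the lowest squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the largest x so both sides are non-empty
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    base = sum(ys) / len(ys)  # start from the mean of the targets
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # Error of the current ensemble: actual value minus predicted value.
        residuals = [y - p for y, p in zip(ys, predictions)]
        # The next model is trained on those errors, not on the original targets.
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for x, p in zip(xs, predictions)]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Toy data, invented for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.5, 1.8, 3.9, 4.2, 4.1, 7.0, 7.3]

baseline_mse = mse(ys, [sum(ys) / len(ys)] * len(ys))  # predicting the mean everywhere
model = fit_boosting(xs, ys)
boosted_mse = mse(ys, [predict(model, x) for x in xs])
print("baseline MSE:", baseline_mse)
print("boosted MSE:", boosted_mse)
```

Each round fits a new stump to what the current ensemble still gets wrong, so the training error shrinks step by step. Real libraries generalize this idea to other loss functions and deeper trees.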
Models that do not produce great accuracy on their own are called weak learners. The intuition behind boosting is to use these weak learners and gradually increase the accuracy of the ensemble by reducing its errors.
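As a rough illustration of the weak-learner idea, the sketch below compares a single depth-1 decision tree with a gradient boosting model built from 100 such trees, using scikit-learn and its breast cancer dataset. The split, the depth, the number of trees and the random_state values are our assumptions for the example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=7
)

# A single depth-1 tree (a "stump") is a typical weak learner.
stump = DecisionTreeClassifier(max_depth=1, random_state=7)
stump.fit(X_train, y_train)
stump_accuracy = stump.score(X_test, y_test)

# Gradient boosting combines many such stumps, one after another.
boosted = GradientBoostingClassifier(n_estimators=100, max_depth=1, random_state=7)
boosted.fit(X_train, y_train)
boosted_accuracy = boosted.score(X_test, y_test)

print(f"single stump accuracy:   {stump_accuracy:.3f}")
print(f"boosted stumps accuracy: {boosted_accuracy:.3f}")
```

The exact numbers depend on the split and the library version, so treat the comparison as indicative rather than as a benchmark.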
The only concern one might have here is speed: with multiple models, is it as fast as it should be? In terms of speed, gradient boosting is about average, because each tree depends on the trees before it and they must therefore be built one after another.
However, in 2014 the Extreme Gradient Boosting (XGBoost) method was introduced, which can be considered an upgrade of the gradient boosting method. So what is the difference between GBM and XGBoost? Let’s discuss it in detail.
How is XGBoost Different from GBM?
Extreme gradient boosting is an upgrade of the gradient boosting method. It parallelizes its work and supports distributed training. The problem with GBM was that it was hard to scale; XGBoost removes this problem because it is scalable, and as far as speed is concerned, it is faster than plain gradient boosting.
Implementing Gradient Boosting With Python
import pandas as pd
import numpy as np
from sklearn.metrics import classification_report
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
First of all, we import the libraries we need in order to move forward. These are pandas, numpy, the gradient boosting classifier, the breast cancer dataset, and classification_report. train_test_split is imported in order to divide the dataset into training and testing parts.
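data = load_breast_cancer()  # load the dataset once and reuse it
df = pd.DataFrame(data['data'], columns=data['feature_names'])
df['y'] = data['target']  # add the class labels as column 'y'
df.head()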
Here, we use the pandas library to work with a DataFrame, as it makes the data easy to modify. Afterwards, we use the .head() function to show the first 5 rows with all columns.
X,y = df.drop('y',axis=1),df.y
test_size = 0.30 # taking 70:30 training and test set
seed = 7 # Random seed for repeatability of the code
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=seed)
test_size and seed are explained in the code comments. The train_test_split function divides the dataset into training and testing parts, which is much easier than splitting the dataset manually.
gradient_booster = GradientBoostingClassifier(learning_rate=0.1)
To implement gradient boosting, we use the GradientBoostingClassifier that we imported from scikit-learn. Here, the learning rate scales how much each new tree contributes to the overall prediction, in other words the size of the step the model takes at each stage; it generally ranges between 0 and 1. To know more about the learning rate, refer to the Cost Function article.
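To get a feel for the effect of the learning rate, the following sketch trains the same classifier with three different values and tracks test accuracy as trees are added. The chosen rates, the 100 trees and the random_state are assumptions made for illustration; staged_predict is the scikit-learn method that yields the predictions after each boosting stage.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=7
)

results = {}
for rate in (0.01, 0.1, 1.0):
    model = GradientBoostingClassifier(
        learning_rate=rate, n_estimators=100, random_state=7
    )
    model.fit(X_train, y_train)
    # staged_predict yields test predictions after 1, 2, ..., 100 trees.
    staged = [accuracy_score(y_test, pred) for pred in model.staged_predict(X_test)]
    results[rate] = staged
    print(f"learning_rate={rate}: "
          f"after 10 trees {staged[9]:.3f}, after 100 trees {staged[-1]:.3f}")
```

A small learning rate usually needs more trees to reach the same accuracy, while a large one takes bigger steps and can overfit sooner, so the two settings are normally tuned together.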
Next, we fit the model on the training dataset by calling gradient_booster.fit(X_train, y_train). If the model fits the data well, it will produce good accuracy.
To check the accuracy and the quality of the predictions, we use the classification_report() function from sklearn.metrics, passing it the true test labels and the model's predictions on the test set.
We also see an accuracy of 99% on this particular model.
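Putting the steps above together, a self-contained version of the whole workflow might look like the sketch below. It loads the data directly as arrays instead of going through a DataFrame, and it fixes random_state on the classifier for repeatability; both are our assumptions, so the reported accuracy may differ somewhat from the figure quoted above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Load features and labels, then make the 70:30 train/test split.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=7
)

# random_state on the classifier is an assumption added for repeatability.
gradient_booster = GradientBoostingClassifier(learning_rate=0.1, random_state=7)
gradient_booster.fit(X_train, y_train)

# Evaluate on the held-out test set.
y_pred = gradient_booster.predict(X_test)
report = classification_report(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)

print(report)
print(f"accuracy: {accuracy:.3f}")
```

The report lists precision, recall and F1-score per class, which gives a fuller picture than the accuracy figure alone.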
Boosting is a method used in many competitive events, such as Kaggle machine learning competitions. Many machine learning researchers use gradient boosting or the AdaBoost algorithm to improve the accuracy of a machine learning model.
Apart from boosting, researchers and programmers also use bagging methods; both can help increase the overall performance of a model. Since its arrival, boosting has proven itself to be one of the greatest innovations in the field of machine learning.