Welcome to Plain Data: ENSEMBLE CLASSIFIERS

Hello, welcome to my blog. Recently, I have been talking about two algorithms for classification namely logistic regression & decision trees. I also demonstrated how we can implement these algorithms using Python’s scikit-learn library.

Today, I want to talk about Ensemble classifiers. The fundamental idea behind ensemble classifiers is combining a set of classifiers to make one better classifier. Concretely, an ensemble classifier combines two or more classifiers (also called a weak learner or classifier) in order to make a stronger classifier (also called a strong learner or classifier).

OPERATION OF AN ENSEMBLE CLASSIFIER

The operation of an ensemble classifier can be summarized in the following steps:

Given input data, learn T weak classifiers from the data.
Aggregate the prediction from each weak classifier in order to make a prediction for a test data-point.

Let me explain this with an example. Remember from the previous post, I built a model to predict safe and unsafe loans using a single decision tree. The ensemble classifier approach is to learn, say, 10 decision trees from the dataset. Each of these trees is called a weak learner because individually they may not have very good accuracy. We then combine these trees in order to obtain a single strong classifier which will have good accuracy.

How do we combine a set of weak classifiers in order to make a strong classifier? One way to do this is to assign a weight to the prediction of each weak classifier based on how well we trust the predictions of that classifier (i.e. good classifiers will have high weights while bad classifiers will have low weights). Finally, we predict by the sign of the weighted sum of predictions of all weak classifiers i.e. if the weighted sum is positive we predict the positive class else we predict the negative class. Another way to do this is to make predictions based on the majority vote of predictions of the weak classifiers.

ENSEMBLE METHODS

In this section I will talk about two popular ensemble methods – Boosting (specifically AdaBoost) and Random Forests.

Boosting

AdaBoost or adaptive boosting is an algorithm that was introduced by Freund & Schapire in 1999. It generates weak learners by iteratively focusing on the hard-to-classify examples in the training data. An overview of its operation is given in the following paragraph.

We start by assigning the same weight to all points in the data. Next, we learn a classifier from these points. If a data-point is correctly classified its weight is reduced, if it is incorrectly classified its weight is increased. This means that correctly classified data-points will ‘disappear’ after some iterations leaving the misclassified examples from the previous iteration. Once we recompute the weights for all data-points we learn another classifier on the remaining data-points (i.e. the hard-to-classify examples). This procedure is repeated T times where T is the number of weak learners we want to generate. So if we want 10 weak learners we repeat the procedure 10 times. A summary of this algorithm is given below:

Start with same weights for all data-points.
Learn a classifier f_t from the weighted data-points.
For t = 1, ... , T

Compute coefficient ŵ_t for the classifier based on the weighted classification error of the classifier f_t.
Recompute weights for the data-points.
Normalize weights (this is necessary in order to avoid numerical instability).

The final combined model predicts:

The equation above says our prediction, ŷ, is the sign of the weighted sum of predictions for all the weak classifiers.

Although AdaBoost can be applied to nearly any kind of model, it most commonly used with decision trees. A very important advantage of boosting is that it is robust to over-fitting i.e. test-set performance is not very sensitive to the choice of T.

Impact of Boosting

It is extremely useful in the field of computer vision
It is used by most (nearly half of) winners of machine learning (ML) competitions
Most deployed machine learning systems use model ensembles

RANDOM FORESTS

This is another very popular ensemble method that has been successfully applied in various areas of machine learning due to its scalability, ease of use and good classification performance. This algorithm iteratively ‘grows’ a number of trees on random subsets of the data and aggregates the predictions of all the trees based on the majority vote. A summary of the Random Forests algorithm is given below:

Draw a random bootstrap sample size of n (randomly choose n samples from the training set with replacement).
Grow a decision tree from the bootstrap sample. At each node:

Randomly select d features without replacement
Split the node using the feature that provides the best split according to the objective function e.g. maximizing the information gain or classification accuracy.

Repeat the steps 1 to 2 k times.
Aggregate the prediction by each tree to assign the class label by majority vote.

Advantages of Random Forests

It is a general purpose model that performs well on most problems
It is capable of handling noisy or missing data; categorical or continuous features
It can be used for extremely large datasets.

One weakness of Random Forests is that may not be easily interpreted unlike a decision tree. It also may require some work to tune the model to the data.

SUMMARY

In this blog post, I introduced the concept of an ensemble classifier and I talked about the mode of operation of two popular ensemble methods – Boosting and Random Forests.

I announced in the last blog post that I created a Github repository that contains my implementation of various ML algorithms. I have added an implementation of k-nearest neighbours and I will add more soon.

Thank you for reading this post. I hoped you liked it. As always I look forward to your comments about this post or any other of my posts. If you have a question, comment about it and I will do my best to answer. Please subscribe to my blog posts if you have not done so. Wishing you a great and productive week. Cheers!!!

Welcome to Plain Data

Monday, 30 May 2016

ENSEMBLE CLASSIFIERS

No comments:

Post a Comment

Search This Blog

Blog Archive

About Me

Popular Posts

Translate