## Friday, 18 March 2016

### THE BIAS-VARIANCE TRADEOFF

Hello, welcome to my blog. In this post, I want to talk about the Bias-Variance trade-off, which is a very important topic in Machine Learning. Before I do that, let me lay the foundation. In my post on Linear Regression, I said that the goal of a linear regression model is to find parameters for our regression line that minimize the error between our predictions and the actual observations.
This does not only apply to linear regression; in fact, that’s the goal of the majority of machine learning techniques. We want to minimize the error between what we predict and what we actually observe. This leads us to the three sources of error:
• Irreducible error
• Bias error
• Variance error
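
For squared-error loss, these three sources add up exactly. As a sketch, assume each observation is $y = f(x) + \varepsilon$, where $f$ is the true function and $\varepsilon$ is zero-mean noise with variance $\sigma^2$, and let $\hat{f}$ be the model fitted on a randomly drawn training set (this notation is my own, introduced only for illustration). The expected prediction error at a point $x$ is then:

$$
\mathbb{E}\left[\left(y - \hat{f}(x)\right)^2\right]
= \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{irreducible error}}
$$

The expectations are taken over training sets and noise. Only the first two terms depend on the model we choose.
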
Irreducible error refers to error that exists due to noise in the data. This kind of error cannot be eliminated no matter how we try to optimize the parameters of our machine learning algorithm. This is because the output also depends on things the input variables do not capture, such as measurement noise or factors we never recorded, so even the true (target) function, which is unknown, would not predict every observation exactly. Therefore, we are always bound to have some error in our predictions. The next two types of error will be the major focus of this post. First, I will define two important terms:

BIAS
Bias refers to the simplifying assumptions made by an algorithm in order to make the target function easier to learn. Machine Learning (ML) algorithms with high bias are inflexible and simple, and may have few parameters. ML algorithms with low bias are flexible, can be complex, and may have lots of parameters. Examples of high-bias ML algorithms are linear regression and logistic regression.

VARIANCE
Variance refers to the amount by which the estimate of the target function will change if different data were used for training. Some change is expected from one dataset to another, but we do not want it to be large: a small change suggests that the model has captured the underlying pattern between the input and output variables rather than the quirks of one particular dataset. ML algorithms with high variance tend to be very flexible and are able to capture subtle relationships in the training data. Examples of high-variance ML algorithms are decision trees, k-nearest neighbours and support vector machines.
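
To make the two definitions concrete, here is a minimal sketch that estimates bias and variance empirically. It is only an illustration under assumptions of my own: the "true" function is taken to be sin(πx), the noise level is 0.3, and a straight line (degree 1) stands in for a high-bias model while a degree-5 polynomial stands in for a more flexible one. The names `true_function` and `bias_variance` are hypothetical helpers, not part of any library.

```python
import numpy as np

def true_function(x):
    # Hypothetical target function, chosen only for illustration.
    # In a real problem this function is unknown.
    return np.sin(np.pi * x)

def bias_variance(degree, n_datasets=200, n_points=30, noise_sd=0.3, seed=0):
    """Fit a polynomial of the given degree on many training sets and
    return (squared bias, variance), averaged over a grid of test inputs."""
    rng = np.random.default_rng(seed)
    x_test = np.linspace(-0.9, 0.9, 50)
    predictions = np.empty((n_datasets, x_test.size))

    for i in range(n_datasets):
        # Draw a fresh training set: "different data used for training".
        x_train = rng.uniform(-1.0, 1.0, n_points)
        y_train = true_function(x_train) + rng.normal(0.0, noise_sd, n_points)
        coeffs = np.polyfit(x_train, y_train, degree)
        predictions[i] = np.polyval(coeffs, x_test)

    # Bias: how far the average prediction is from the true function.
    mean_prediction = predictions.mean(axis=0)
    bias_sq = float(np.mean((mean_prediction - true_function(x_test)) ** 2))
    # Variance: how much predictions move from one training set to another.
    variance = float(np.mean(predictions.var(axis=0)))
    return bias_sq, variance

simple_bias_sq, simple_variance = bias_variance(degree=1)
flexible_bias_sq, flexible_variance = bias_variance(degree=5)

print(f"degree 1: bias^2 = {simple_bias_sq:.4f}, variance = {simple_variance:.4f}")
print(f"degree 5: bias^2 = {flexible_bias_sq:.4f}, variance = {flexible_variance:.4f}")
```

In this setup the straight line should show the larger squared bias and the degree-5 polynomial the larger variance, which is the trade-off in miniature. Both calls use the same seed, so the two models are compared on the same training sets.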

HOW DO THESE LEAD TO ERROR?
Bias Error: If a model has high bias, it will make prediction errors because it fits the relationship between the input and output variables using a simple estimate of the target function, and relationships in real life are rarely that simple. This simple estimate is bound to make errors if it is used for prediction. This problem is called underfitting. I like to think of this as a student who does not prepare well for an examination. Clearly, this student is bound to fail the examination.

Variance Error: A model that has high variance might perform well on the training data because it is flexible enough to capture the relationship between the input and output variables. Good, right? Not really. If the model fits the training data too well, it picks up the noise along with the pattern and may perform poorly on future data, which is obviously a problem. This problem is called overfitting. Using the same student analogy, a model with high variance is like a student who reads the study material so closely that he ends up cramming it. While he may pass a question taken from the study material (training data), he will fail questions that do not originate from the study material, which are likely in an examination (future data). Like the previous student, he is also bound to fail the examination.
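
The two students can be sketched in code by comparing error on the training data (the study material) with error on unseen data (the examination). This is a rough illustration, not a recipe: the sin(πx) target, the noise level, the sample sizes and the polynomial degrees 1, 3 and 14 are all assumptions I picked for the example, and `target`, `sample` and `mse` are hypothetical helper names.

```python
import numpy as np

rng = np.random.default_rng(42)

def target(x):
    # Hypothetical "true" function; in practice this is unknown.
    return np.sin(np.pi * x)

def sample(n_points, noise_sd=0.3):
    # Draw noisy observations of the target function.
    x = rng.uniform(-1.0, 1.0, n_points)
    y = target(x) + rng.normal(0.0, noise_sd, n_points)
    return x, y

x_train, y_train = sample(20)    # the "study material"
x_test, y_test = sample(200)     # the "examination"

def mse(coeffs, x, y):
    # Mean squared error of a fitted polynomial on the given data.
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

train_mse, test_mse = {}, {}
for degree in (1, 3, 14):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse[degree] = mse(coeffs, x_train, y_train)
    test_mse[degree] = mse(coeffs, x_test, y_test)
    print(f"degree {degree:2d}: train MSE = {train_mse[degree]:.3f}, "
          f"test MSE = {test_mse[degree]:.3f}")
```

The straight line (degree 1) is the unprepared student: its error stays high on both sets. The degree-14 polynomial is the crammer: its training error is the lowest of the three, but its test error is noticeably worse than its training error. A moderate degree usually sits between the two.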