Friday, 18 March 2016


Hello, welcome to my blog. In this post, I want to talk about the Bias-Variance trade-off, which is a very important topic in Machine Learning. Before I do that, let me lay the foundation. In my post on Linear Regression, I said that the goal of a linear regression model is to find the parameters of the regression line that minimize the error between our predictions and the actual observations.
This does not only apply to linear regression; in fact, that’s the goal of the majority of machine learning techniques. We want to minimize the error between what we predict and what we actually observe. This leads us to the 3 sources of error:
  • Irreducible error
  • Bias error
  • Variance error
Irreducible error refers to error that exists due to noise in the data. This kind of error cannot be eliminated no matter how we try to optimize the parameters of our machine learning algorithm. This is because the output usually depends on factors that are not captured by the input variable(s), so even the true (target) function, which is unknown, would not predict every observation perfectly. Therefore, we are always bound to have some error in our predictions. The next two types of error, which arise because the function (i.e. the model) we use is only an approximation of the target function, will be the major focus of this post. First, I will define two important terms:

Bias refers to the simplifying assumptions made by an algorithm in order to make the target function easier to learn. Machine Learning (ML) algorithms with high bias are inflexible and simple, and may have few parameters. ML algorithms with low bias are flexible, can be complex and may have lots of parameters. Examples of high-bias ML algorithms are linear regression and logistic regression.

Variance refers to the amount by which the estimate of the target function would change if different data were used for training. Even though the estimate will change from one dataset to another, we do not want it to change too much; a small change means that the model has captured the underlying pattern between the input and output variables rather than the quirks of one particular dataset. ML algorithms with high variance tend to be very flexible and are able to capture subtle relationships in the training data. Examples of high-variance ML algorithms are decision trees, k-nearest neighbours and support vector machines.

Bias Error: If a model has high bias, it will make prediction errors because it fits the relationship between the input and output variables using a simple estimate of the target function, and the true relationship is rarely that simple in real life. This simple estimate is bound to make errors if it is used for prediction. This problem is called underfitting. I like to think of this as a student who does not prepare well for an examination. Clearly, this student is bound to fail the examination.

Variance Error: A model that has high variance might perform well on the training data because it is flexible enough to capture the relationship between the input and output variables. Good, right? Not really. If the model fits the training data too well, it also fits the noise in that data, so it may perform poorly on future data, which is obviously a problem. This problem is called overfitting. Using the same student analogy, a model with high variance is like a student who reads the study material so closely that he ends up cramming it. While he may pass a question based on the study material (training data), he will fail questions that do not originate from the study material, which are likely to appear in an examination (future data). Like the previous student, he is also bound to fail the examination.
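To make the two students a bit more concrete, here is a small simulation sketch using only the Python standard library. Everything in it is an assumption made for illustration: the target function sin(2πx), the noise level, the sample sizes, and the two toy models. The "underprepared student" always predicts the average of the training outputs, while the "crammer" is a 1-nearest-neighbour model that simply repeats the output of the closest training point. We train each model on many different training sets and look at its predictions at one test point.

```python
import math
import random
import statistics

def true_f(x):
    # Assumed target function; in a real problem this is unknown.
    return math.sin(2 * math.pi * x)

def sample_training_set(rng, n=30, noise=0.3):
    # Draw n inputs in [0, 1) and add Gaussian noise to the true outputs.
    xs = [rng.random() for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0, noise) for x in xs]
    return xs, ys

def predict_mean(xs, ys, x0):
    # High-bias model: ignores x0 and always predicts the average output.
    return statistics.mean(ys)

def predict_1nn(xs, ys, x0):
    # High-variance model: copies the output of the single nearest neighbour.
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x0))
    return ys[nearest]

def bias_sq_and_variance(predict, x0=0.25, trials=500, seed=0):
    # Train on many independent training sets and collect predictions at x0.
    rng = random.Random(seed)
    preds = []
    for _ in range(trials):
        xs, ys = sample_training_set(rng)
        preds.append(predict(xs, ys, x0))
    bias = statistics.mean(preds) - true_f(x0)
    return bias ** 2, statistics.pvariance(preds)

mean_bias_sq, mean_var = bias_sq_and_variance(predict_mean)
nn_bias_sq, nn_var = bias_sq_and_variance(predict_1nn)
print(f"mean model: bias^2={mean_bias_sq:.4f}  variance={mean_var:.4f}")
print(f"1-NN model: bias^2={nn_bias_sq:.4f}  variance={nn_var:.4f}")
```

With these settings the average-only model should show the larger squared bias and the 1-nearest-neighbour model the larger variance, which is the underfitting and overfitting behaviour described above.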

We can see a pattern here: algorithms with high bias tend to have low variance, while those with low bias tend to have high variance. Concretely, as we make a model more flexible, its variance increases and its bias decreases, and vice versa. This is why it’s called a trade-off: you can rarely have the best of both worlds, which would be low bias and low variance.
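For readers who like to see this written down, the three sources of error can be put into one standard formula. The notation is my own addition and not something used elsewhere in this post: I assume the data follow y = f(x) + ε, where f is the target function and ε is the noise, and I write \hat{f} for the model fitted on a training set. The expected squared error at a new point (x_0, y_0), averaged over training sets, then splits as follows:

```latex
\[
E\left[\left(y_0 - \hat{f}(x_0)\right)^2\right]
  = \left[\mathrm{Bias}\left(\hat{f}(x_0)\right)\right]^2
  + \mathrm{Var}\left(\hat{f}(x_0)\right)
  + \mathrm{Var}(\varepsilon)
\]
```

The last term is the irreducible error. The first two are the ones we trade against each other: the total is smallest when neither the squared bias nor the variance is allowed to dominate.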

How do we control bias and variance? Below are some ways of controlling bias and variance for some ML algorithms.
  • Linear regression & Logistic regression: These algorithms have high bias and low variance, but the trade-off can be altered using a technique called regularization. This is done by adding a penalty term, weighted by a tuning parameter λ, to their respective cost functions. Increasing λ increases the bias (and reduces the variance), while decreasing λ reduces the bias (and increases the variance) of the model.
  • K-Nearest Neighbours: This algorithm has low bias and high variance. This can be adjusted by tuning k, which dictates how many neighbours contribute to our prediction. Increasing k increases the bias and reduces the variance of the model.
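  • Support Vector Machines: Like k-nearest neighbours, this algorithm also has low bias and high variance. This can be adjusted by tuning the C parameter, which influences the number of violations of the margin allowed in the training data. Allowing more violations increases the bias but decreases the variance. Note that the direction of C depends on the convention: in An Introduction to Statistical Learning a larger C allows more violations, whereas in libraries such as scikit-learn a larger C penalizes violations more heavily and so allows fewer.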
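As a rough illustration of the first bullet, the sketch below shows how λ moves a model along the trade-off. It is a toy setup of my own, not a full implementation of regularized regression: a one-feature linear model with no intercept and an L2 (ridge) penalty, for which the best slope has the closed form Σxy / (Σx² + λ). The true slope, noise level and sample sizes are all assumed values.

```python
import random
import statistics

def ridge_slope(xs, ys, lam):
    # Minimizes sum((y - w*x)^2) + lam * w^2 for a single weight w.
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def simulate(lam, true_slope=2.0, n=20, noise=1.0, trials=2000, seed=0):
    # Fit the slope on many independent training sets for a given lambda.
    rng = random.Random(seed)
    slopes = []
    for _ in range(trials):
        xs = [rng.uniform(-1, 1) for _ in range(n)]
        ys = [true_slope * x + rng.gauss(0, noise) for x in xs]
        slopes.append(ridge_slope(xs, ys, lam))
    bias = statistics.mean(slopes) - true_slope
    variance = statistics.pvariance(slopes)
    return bias, variance

lambdas = (0.0, 1.0, 10.0, 100.0)
results = {lam: simulate(lam) for lam in lambdas}
for lam, (bias, variance) in results.items():
    print(f"lambda={lam:6.1f}  bias={bias:+.3f}  variance={variance:.4f}")
```

As λ grows, the fitted slope is pulled towards zero, so it moves further from the true slope (more bias) but varies less from one training set to the next (less variance). Tuning k in k-nearest neighbours or C in a support vector machine plays the same role for those algorithms.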
Now you know what the bias-variance trade-off is. For every problem you encounter in machine learning, you will need to find a balance between these two. If you are interested in learning more about this, I suggest reading Section 2.2.2 of the book An Introduction to Statistical Learning.

Hope you enjoyed reading this post. If you have any questions or suggestions, please leave a comment and I will be happy to attend to you. That is all for now. Have a wonderful weekend. Cheers!!!