Sunday, 14 February 2016


Hello, welcome to my blog. In my previous posts, I talked about linear regression, a technique that uses a straight line to model the relationship between one or more explanatory variables (also called independent variables) and a response variable (also called the dependent variable). I also said that when there is more than one explanatory variable, it is called multiple linear regression. Finally, I implemented both types of regression in Python.

As a roundup I will just mention some precautions that should be taken when applying linear regression. Here are some tips to remember:

  1. Make sure the relationship between the explanatory and response variables is linear. This is very important because linear regression, as you may remember, fits a straight line to the data. If the relationship is better described by a curve, the straight line will fit poorly, and any predictions or conclusions drawn from it will be misleading. One way to handle a curved relationship is polynomial regression (which I will talk about in another post).
  2. Another very important tip to remember: seeing an association between two variables in your regression analysis does not mean that a change in the independent variable causes a change in the dependent variable. This may seem counter-intuitive, so allow me to clarify. For example, the consumption of ice cream (pints per person) and the number of murders in New York are positively correlated. That is, when ice cream consumption goes up, the number of murders tends to go up too. Strange but true! This happens because regression will show a positive association between any two variables that rise together, even when they are rising independently of each other. So the key takeaway is that correlation does not imply causation. Using our ice cream example, we can say that ice cream consumption in New York is positively correlated with murders in New York, but it would be wrong to say that an increase in ice cream consumption causes more murders (or vice versa).
  3. Also beware of confounding variables, which are variables associated with both the explanatory and the response variable. Let me give another example. Research in America has shown that the number of cars a family owns is positively associated with the SAT scores of children from that family. While the association is there, it leaves out a very important variable: family income. Children from higher-income families tend to do better on the SAT than children from lower-income families, because their families can afford textbooks, after-school tutors and other resources that help them get higher scores. Also, the more a family earns, the more cars it can afford to buy. So in this example, family income is a confounding variable because it is associated with both our explanatory and response variables. When carrying out regression analysis (especially multiple regression), make sure confounding variables are accounted for before you jump to conclusions.
  4. Finally, do not extrapolate beyond your data; always keep the interpretation of your model in context. For example, in the previous post we built a regression model to predict the prices of houses in King County. Imagine if we tried to use that model to predict the prices of houses in Abuja. This would be a big mistake: the model was not built from data on house sales in Abuja, so we have no reason to expect it to perform well there.
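
To make tips 1 and 4 a bit more concrete, here is a minimal sketch in plain Python. It is only an illustration and not code from the earlier posts: the data are made up (a perfectly quadratic relationship observed for x between 0 and 10), and `fit_line` is a small helper written just for this example. It fits a straight line to curved data, shows the tell-tale pattern in the residuals, and then shows how far off the line is when we extrapolate to x = 20.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up data: the true relationship is y = x**2, observed only for x in 0..10.
xs = [float(x) for x in range(11)]
ys = [x ** 2 for x in xs]

intercept, slope = fit_line(xs, ys)
print(intercept, slope)  # -15.0 10.0

# Tip 1: residuals (actual minus fitted) should look like random scatter.
# Here they are positive at both ends and negative in the middle,
# which is a sign that a straight line is the wrong shape for this data.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
print(residuals)

# Tip 4: predicting at x = 20, far outside the observed range of 0..10.
predicted_at_20 = intercept + slope * 20.0
true_at_20 = 20.0 ** 2
print(predicted_at_20, true_at_20)  # 185.0 400.0
```

Inside the observed range the line is never off by more than 15, but at x = 20 it misses by more than 200. Real data will be noisier than this, but checking the residuals for a pattern is a cheap first test of the linearity assumption.
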
I hope these tips help you make sensible interpretations of your linear regression models. The list above is by no means exhaustive. If you want to read more on this, I recommend Chapter 12 of Naked Statistics by Charles Wheelan.
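
For readers who would like to see the confounding problem from tip 3 in action, below is a small simulation sketch in plain Python. Everything in it is an assumption made for illustration: the numbers are invented and not taken from any real study, the "cars" variable is a continuous stand-in and not a whole-number count, and the helpers `fit_line`, `correlation` and `residuals_after` are written just for this example. In the simulation, income drives both cars and SAT scores, while cars have no effect on SAT scores at all.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def residuals_after(xs, ys):
    """What is left of ys once the straight-line effect of xs is removed."""
    intercept, slope = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


random.seed(0)
n = 2000

# Invented data: income (in thousands) influences both cars and SAT score.
income = [random.uniform(20, 200) for _ in range(n)]
cars = [inc / 50 + random.gauss(0, 0.5) for inc in income]
sat = [900 + 2 * inc + random.gauss(0, 50) for inc in income]

# Ignoring income, cars and SAT scores look strongly associated.
raw_corr = correlation(cars, sat)

# Controlling for income: correlate what is left of each variable
# after removing the part explained by income (a partial correlation).
partial_corr = correlation(residuals_after(income, cars),
                           residuals_after(income, sat))

print("cars vs SAT, ignoring income:", round(raw_corr, 2))
print("cars vs SAT, controlling for income:", round(partial_corr, 2))
```

The first correlation should come out strongly positive and the second close to zero, which is what we would expect when income is the only thing linking the two. In a multiple regression, adding the confounder as an extra explanatory variable plays a similar role.
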

This is the end of the post. I hope you liked it. If you didn't, leave a comment suggesting how I can make it better. If you need any extra clarification on anything I have said, leave a comment and I will be happy to answer your questions. Have a wonderful week ahead. Cheers!!!