Hello once again. Welcome to my blog. In the last post I introduced linear regression which is a powerful tool used to find the relationship between a response variable and one or more explanatory variables. In this post, I will demonstrate how to implement linear regression using a popular programming language – Python. To perform linear regression in Python I will make use of libraries. You can think of them as plug-ins that are used to add extra functionality to Python. The libraries I will be using are as follows:
i. Pandas (for loading data)
ii. Numpy (for arrays)
iii. Statsmodels.api & Statsmodels.formula.api (for linear regression)
iv. Matplotlib (for visualization)For this demonstration, I will use the King County House Sales data to predict the price (in dollars) of house using just one feature – square footage of the house. This dataset contains information about houses sold in King County (a region in Seattle). This dataset is public and can be accessed by anyone (I think a Google search should provide a link to where you can download it from). It’s in a CSV format (CSV stands for comma separated values). To load the dataset we use the Pandas library. Once we have loaded the dataset we can now use it to perform linear regression.