## Monday, 4 April 2016

### LINEAR CLASSIFIERS

Hello, welcome to my blog. In the previous post I talked about classification and its popular applications in the real world. I also listed the popular algorithms used for classification. Now I want to introduce the concept of a linear classifier. For this post, I will use a restaurant review system as a practical case study to make the concept of linear classifiers clear.

The goal of this system will be to classify the sentiment of sentences from different reviews as either positive or negative.

INTUITION BEHIND LINEAR CLASSIFIERS
A linear classifier takes as input some quantity x (in this case, a sentence from a review); it feeds it through its classifier model and makes a prediction ŷ. More formally, a linear classifier uses the training data to learn a weight (or coefficient) for each feature of the training data using its classifier model. These weights are then used for making predictions on the test data. For our case study, the features could be the words of a sentence from a review.

The weight (or coefficient) of a word tells us how strongly, and in which direction, that word (feature) influences the overall sentiment of a review. So words like ‘good’, ‘great’, ‘awesome’ may end up having positive weights while words like ‘bad’, ‘terrible’, ‘awful’ may end up having negative weights. This means that if the positive words in a review outweigh the negative words, it is likely to have a positive sentiment (i.e. a positive review), and if the negative words outweigh the positive words, it is likely to have a negative sentiment (i.e. a negative review). Furthermore, words that are not relevant to the sentiment of a review, e.g. ‘is’, ‘the’, ‘we’, may have a coefficient of zero (or close to zero).

HOW TO MAKE PREDICTIONS USING THE COEFFICIENTS
Linear classifiers use the coefficients for each word to compute a score for each sentence. The score is a weighted count of the words in the sentence. For example, let’s assume we have learnt the following coefficients for the words below:

| Word | Coefficient |
|---|---|
| Good | 1.0 |
| Great | 1.3 |
| Awesome | 2.5 |
| Bad | -1.0 |
| Terrible | -2.4 |
| Awful | -3.4 |

Let’s use the coefficients above to compute the scores for the two sentences below:

Sentence(x1): The jollof rice was great, the chicken was awesome but the service was terrible.
Score(x1) = 1.3 + 2.5 – 2.4 = 1.4. Since this score is greater than zero, we will predict that this is a positive review. Therefore ŷ = +1.

Sentence(x2): The pounded yam was bad, the egusi soup was terrible, but the service was good.
Score(x2) = -1.0 – 2.4 + 1.0 = -2.4. Since this score is less than zero, we will predict that this sentence is a negative review. Therefore ŷ = -1.
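The two calculations above can be reproduced with a few lines of Python. This is only a minimal sketch: the coefficients are the ones from the table in this post, while the function names (`tokenize`, `score`, `predict`), the simple regex tokenizer, and the choice to predict -1 when the score is exactly zero are my own assumptions for illustration.

```python
import re

# Coefficients copied from the table in this post. Any word that is not
# listed is assumed to have a coefficient of 0.
COEFFICIENTS = {
    "good": 1.0,
    "great": 1.3,
    "awesome": 2.5,
    "bad": -1.0,
    "terrible": -2.4,
    "awful": -3.4,
}


def tokenize(sentence):
    """Lowercase the sentence and split it into words (a simplifying assumption)."""
    return re.findall(r"[a-z']+", sentence.lower())


def score(sentence):
    """Add up the coefficient of every word in the sentence."""
    return sum(COEFFICIENTS.get(word, 0.0) for word in tokenize(sentence))


def predict(sentence):
    """Return +1 if the score is above zero, otherwise -1 (ties go to -1 by assumption)."""
    return 1 if score(sentence) > 0 else -1


x1 = "The jollof rice was great, the chicken was awesome but the service was terrible."
x2 = "The pounded yam was bad, the egusi soup was terrible, but the service was good."

for sentence in (x1, x2):
    # Scores are formatted to one decimal place to hide floating-point noise.
    print(f"score = {score(sentence):.1f}, prediction = {predict(sentence):+d}")
```

Running it should give a score of 1.4 with prediction +1 for x1, and a score of -2.4 with prediction -1 for x2, matching the hand calculations.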

Note that if a word appears more than once we just multiply the number of times the word appeared by its coefficient when computing the score. Take the sentence below:

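Score(x3) = 2*1.3 – 1.0 = 1.6, because ‘great’ (coefficient 1.3) is counted twice and ‘bad’ (coefficient -1.0) is counted once. This sentence would also be a positive review since the score for the sentence is greater than zero.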
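The rule for repeated words can be sketched by counting each word first and then multiplying the count by its coefficient. The sentence in the code is a made-up stand-in that I chose so that ‘great’ appears twice and ‘bad’ once; it is not taken from a real review. The function name `weighted_score` and the regex tokenizer are also assumptions for illustration.

```python
from collections import Counter
import re

# Coefficients from the table in this post; unlisted words count as 0.
COEFFICIENTS = {
    "good": 1.0,
    "great": 1.3,
    "awesome": 2.5,
    "bad": -1.0,
    "terrible": -2.4,
    "awful": -3.4,
}


def weighted_score(sentence):
    """Score = sum over distinct words of (number of occurrences * coefficient)."""
    counts = Counter(re.findall(r"[a-z']+", sentence.lower()))
    return sum(COEFFICIENTS.get(word, 0.0) * n for word, n in counts.items())


# Hypothetical sentence: 'great' appears twice, 'bad' appears once.
example = "The suya was great, the drinks were great, but the parking was bad."

print(f"score = {weighted_score(example):.1f}")  # 2 * 1.3 - 1.0
```

Counting first and multiplying afterwards gives the same result as adding the coefficient once per occurrence; it just makes the "weighted count" idea explicit.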

DECISION BOUNDARIES
A decision boundary is a line that separates the positive predictions from the negative predictions. It is made up of the points where score(x) = 0. In the example used here, any point below this line has score(x) > 0, which implies that ŷ = +1, and any point above the line has score(x) < 0, which implies that ŷ = -1. For linear classifiers, decision boundaries exhibit the following behaviour:
1. If there are 2 non-zero coefficients, the decision boundary will be a LINE.
2. If there are 3 non-zero coefficients, the decision boundary will be a PLANE.
3. If there are many non-zero coefficients (4 or more) the decision boundary will be a HYPER-PLANE.
An example of a decision boundary for a dataset is shown below:

The blue line represents the decision boundary for the data set above because it separates the circles from the crosses. Although some points are misclassified, it does a pretty good job of separating the different data points.

It is important to note that the choice of coefficients for the features can affect the position of the decision boundary.
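To make this concrete, here is a small sketch with only two features: the number of times ‘awesome’ and ‘awful’ appear in a sentence. The default coefficients come from the table earlier in the post; restricting the model to these two words, the function names, and the alternative coefficient in the last line are assumptions for illustration. With two non-zero coefficients, the boundary is the line where the score equals zero.

```python
def score(n_awesome, n_awful, w_awesome=2.5, w_awful=-3.4):
    """Score of a sentence described only by its counts of 'awesome' and 'awful'."""
    return w_awesome * n_awesome + w_awful * n_awful


def boundary_awful(n_awesome, w_awesome=2.5, w_awful=-3.4):
    """Value of #awful that lies on the decision boundary (score == 0)
    for a given #awesome. Obtained by solving
    w_awesome * n_awesome + w_awful * n_awful = 0 for n_awful."""
    return -w_awesome * n_awesome / w_awful


# A point below the line (few 'awful's) gets a positive score ...
print(score(3, 1) > 0)
# ... and a point above the line (many 'awful's) gets a negative score.
print(score(1, 2) < 0)

# Changing a coefficient moves the boundary: with a weaker 'awful' weight,
# more 'awful's are needed to cancel out the same number of 'awesome's.
print(boundary_awful(2) < boundary_awful(2, w_awful=-1.0))
```

All three lines should print `True`. Plotting #awesome on the horizontal axis and #awful on the vertical axis, the boundary is a straight line through the origin, and its slope is set entirely by the two coefficients.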

SUMMARY
In this post, I introduced the concept of a linear classifier using a restaurant review system as a practical case study. I also talked about how to use the coefficients of the words in a sentence to compute its score, which in turn helps us predict the sentiment of the sentence. One thing I did not talk about is how to learn the coefficients for the words from the training data. In the next post, I will talk about that.