Hello, welcome to my blog. In the previous post
I talked about classification and its popular applications in the real world. I
also listed the popular algorithms used for classification. Now I want to
introduce the concept of a linear classifier. For this post, I will use a restaurant
review system as a practical case study to make the concept of linear
classifiers clear.
The goal of this
system will be to classify the sentiment of sentences from different reviews as
either positive or negative.
INTUITION BEHIND LINEAR CLASSIFIERS
A linear classifier
takes as input some quantity x (in
this case, a sentence from a review); it feeds it through its classifier model
and makes a prediction ŷ. More formally,
a linear classifier uses the training data to learn a weight (or coefficient)
for each feature of the training data using its classifier model. These weights
are then used for making predictions on the test data. For our case study, the
features could be the words of a sentence from a review.
The weight (or coefficient) of a word tells us how strongly, and in which
direction, that word (feature) influences the overall sentiment of a review. So
words like ‘good’, ‘great’ and ‘awesome’ may end up having positive weights,
while words like ‘bad’, ‘terrible’ and ‘awful’ may end up having negative
weights. This means that if a review has more positive words than negative
words, it is likely to have a positive sentiment (i.e. a positive review), and
if it has more negative words than positive words, it is likely to have a
negative sentiment (i.e. a negative review). Furthermore, some words that are
not relevant to the sentiment of a review, e.g. ‘is’, ‘the’ and ‘we’, may have
a coefficient of zero (or close to zero).
HOW TO MAKE PREDICTIONS USING THE COEFFICIENTS
Linear classifiers use the coefficient of each word to compute a score for each
sentence. The score is a weighted count of the words in the sentence. For
example, let’s assume we have learnt the following coefficients:
Word     | Coefficient
Good     | 1.0
Great    | 1.3
Awesome  | 2.5
Bad      | -1.0
Terrible | -2.4
Awful    | -3.4
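For readers who prefer symbols, the "weighted count" can be written as a short formula. The notation here is my own shorthand and an assumption on my part: w_j stands for the coefficient of word j in the table above, and count_j(x) for the number of times that word appears in sentence x.

```latex
\[
\mathrm{Score}(x) = \sum_{j} w_j \cdot \mathrm{count}_j(x)
\]
```

Words that are not in the table can be thought of as having a coefficient of zero, so they add nothing to the sum.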
Let’s use the coefficients
above to compute the scores for the two sentences below:
Sentence(x1): The jollof rice was great, the chicken was awesome but the
service was terrible.
Score(x1) = 1.3 + 2.5 - 2.4 = 1.4. Since this score is greater than zero, we
will predict that this is a positive review. Therefore ŷ = +1.
Sentence(x2): The pounded yam was bad, the egusi soup was terrible, but the
service was good.
Score(x2) = -1.0 - 2.4 + 1.0 = -2.4. Since this score is less than zero, we
will predict that this sentence is a negative review. Therefore ŷ = -1.
Note that if a word appears more than once, we just multiply the number of
times the word appears by its coefficient when computing the score. Take the
sentence below:
Sentence(x3): The restaurant had a great salad and a great ambience but the
service was bad.
Score(x3) = 2*1.3 - 1.0 = 1.6. This sentence would also be a positive review,
since its score is greater than zero. Therefore ŷ = +1.
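To make the arithmetic above concrete, here is a minimal Python sketch of the scoring step. It is only an illustration under a few assumptions of mine: the coefficients are copied from the table rather than learnt, the function names `score` and `predict` are made up for this post, words are found by lowercasing the sentence and picking out runs of letters, and a score of exactly zero (which this post does not cover) is arbitrarily treated as negative.

```python
import re

# Coefficients copied from the table in this post. In a real system these
# would be learnt from training data.
COEFFICIENTS = {
    "good": 1.0,
    "great": 1.3,
    "awesome": 2.5,
    "bad": -1.0,
    "terrible": -2.4,
    "awful": -3.4,
}


def score(sentence):
    """Return the weighted count of the words in the sentence."""
    # Assumption: lowercase the text and treat each run of letters as a word.
    words = re.findall(r"[a-z]+", sentence.lower())
    # Each occurrence adds its coefficient, so a word that appears twice
    # contributes twice. Words without a coefficient contribute 0.
    return sum(COEFFICIENTS.get(word, 0.0) for word in words)


def predict(sentence):
    """Return +1 if the score is greater than zero, otherwise -1."""
    # Assumption: a score of exactly zero is treated as negative here.
    return 1 if score(sentence) > 0 else -1


x1 = "The jollof rice was great, the chicken was awesome but the service was terrible."
x2 = "The pounded yam was bad, the egusi soup was terrible, but the service was good."
x3 = "The restaurant had a great salad and a great ambience but the service was bad."

for name, sentence in [("x1", x1), ("x2", x2), ("x3", x3)]:
    # Round for display only, to hide floating-point noise.
    print(name, round(score(sentence), 1), predict(sentence))
```

Up to floating-point rounding, this should reproduce the scores worked out by hand for the three sentences.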
DECISION BOUNDARIES
A decision boundary is
a line that separates that the positive predictions from the negative
predictions. Any point below this line has score(x) > 0 which implies that ŷ
= +1 and any point above the line has score(x)
< 0 which implies that ŷ = -1. For
linear classifiers, decision boundaries exhibit the following behaviour:
- If there are 2 non-zero coefficients, the decision boundary will be a LINE.
- If there are 3 non-zero coefficients, the decision boundary will be a PLANE.
- If there are many non-zero coefficients (4 or more) the decision boundary will be a HYPER-PLANE.
The blue line represents the decision boundary for the data set above because it separates the circles from the crosses. Although some points are misclassified, it does a pretty good job of separating the different data points.
It is important to
note that the choice of coefficients for the features can affect the position
of the decision boundary.
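Here is a small sketch of how the coefficients move the boundary. It is a simplified illustration, not a full classifier: I assume only two words carry non-zero coefficients (‘awesome’ and ‘awful’), the function names are hypothetical, and the second set of coefficients is made up purely for comparison.

```python
def two_word_score(n_awesome, n_awful, w_awesome, w_awful):
    """Score of a sentence whose only scored words are 'awesome' and 'awful'."""
    return w_awesome * n_awesome + w_awful * n_awful


def boundary_n_awful(n_awesome, w_awesome, w_awful):
    """Return the count of 'awful' at which the score is exactly zero."""
    # Solve w_awesome * n_awesome + w_awful * n_awful = 0 for n_awful.
    # Assumes w_awful is non-zero.
    return -w_awesome * n_awesome / w_awful


# Coefficients for 'awesome' and 'awful' from the table earlier in the post.
print(boundary_n_awful(2, w_awesome=2.5, w_awful=-3.4))

# Hypothetical alternative: a smaller weight for 'awesome' shifts the boundary.
print(boundary_n_awful(2, w_awesome=1.0, w_awful=-3.4))

# The same point (2 x 'awesome', 1 x 'awful') lands on different sides of
# the two boundaries.
print(two_word_score(2, 1, 2.5, -3.4) > 0)  # score is 5.0 - 3.4, so True
print(two_word_score(2, 1, 1.0, -3.4) > 0)  # score is 2.0 - 3.4, so False
```

With two non-zero coefficients, the points where the score equals zero form a straight line in the plane of the two word counts, and changing either coefficient tilts that line.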
SUMMARY
In this post, I introduced the concept of a linear classifier using a
restaurant review system as a practical case study. I also talked about how to
use the coefficients of the words in a sentence to compute its score, which in
turn helps us predict the sentiment of the sentence. One thing I did not talk
about is how to compute the coefficients for the words in a review. In the next
post, I will talk about that.
Thank you once again for reading my blog. I encourage you to add your email
address to the mailing list of this blog so you can read my posts as soon as I
publish them. Feel free to leave a question, comment or suggestion below.
Have a wonderful week ahead.