Thursday, 31 March 2016

CLASSIFICATION

Hello, welcome to my blog. In this post I am going to talk about another popular application of machine learning – classification. First, let me define classification. It is the allocation (or organization) of items into groups (or categories) according to type. In the context of machine learning, classification means using the features of an item to predict which class (out of two or more classes) it belongs to. It is one of the fundamental tools of machine learning.

Let me give a practical example. If someone notices a lump on his/her body it could be a sign of cancer. When he/she goes to the hospital, a biopsy is conducted (a sample of the tumour is removed from the patient). I don’t know the specifics of testing if a tumour is cancerous or not but I think a likely approach could be to use various features (measurements) of the tumour sample to classify it as malignant (cancerous) or benign (not cancerous).

This does not apply only to ‘cancer prediction’ but also to any other classification problem you can think of. Take the problem of recognizing hand-written digits. A similar procedure is followed, whereby features (or attributes) of a hand-written digit are used to predict what number it is. Classification is a lot like regression, the difference being that we are trying to predict a class (a discrete label) instead of a continuous number. Let me talk about some popular applications of classification.
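To make the difference between regression and classification concrete, here is a minimal Python sketch. The two features, the weights and the bias are made up for illustration; a real classifier would learn them from data rather than have them typed in by hand.

```python
# Hypothetical weights and bias, chosen by hand for illustration only.
WEIGHTS = [2.0, 1.0]
BIAS = -30.0


def score(features):
    """Regression-style output: a weighted sum of the features (a number)."""
    return sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS


def classify(features):
    """Classification-style output: threshold the score to get a class label."""
    return "malignant" if score(features) > 0 else "benign"


sample = [12.0, 10.0]    # made-up measurements of one tumour sample
print(score(sample))     # a number
print(classify(sample))  # a class
```

The same features go in both times; the only change is that the classifier turns the number into one of a fixed set of labels.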

Classifiers have a lot of interesting applications in the real world. Here are a few of them:
  • Predicting appropriate ads for a webpage: In this case, the goal is to use features of a webpage, such as its text, to classify it into a category such as education, finance, technology, sports, lifestyle etc. Once the class or category of the webpage is determined, appropriate ads can be displayed on the webpage.
  • Spam filtering: This is perhaps the most famous and common application of classifiers. The goal in this case is to use features of an incoming email such as text, sender’s name, sender IP etc. to classify the email as spam or not spam. Early spam filters used just keywords to classify an email as spam or not spam, but modern spam filters are far more sophisticated and are now much better at filtering out spam messages. This is probably why I hardly check my spam folder.
  • Image classification: Given an image, the goal is to predict what the image shows (e.g. a dog, car, cat or a human face) using the pixels of the image. In fact, we may even want to be more specific. If it is the image of a dog, we may want to know what kind of dog it is, e.g. Labrador, golden retriever etc.
  • Personalized medicine: This is potentially another powerful application of classification. Here, ‘features’ of an individual such as his/her DNA sequence, lifestyle etc. are used to predict the most effective course of treatment for that person. This ensures people are not all given the same kind of treatment, because as individuals we are different and what may work for me may not work for (or may even be harmful to) you.
  • Fraud detection: This is another useful application which is used mainly in the financial sector to determine if a series of transactions is an anomaly. For example, suppose a person rarely spends above ₦100,000 on his credit card and his card is stolen. If the thief consistently spends above the usual ₦100,000 with that credit card, this can be classified as an anomaly because it is irregular behaviour for that person. Classifiers are used to spot or detect irregular transactions, which in most cases are fraudulent.
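The early keyword-based spam filters mentioned above can be sketched in a few lines of Python. This is only an illustration: the keyword list and the threshold of two matches are assumptions I made up, not how any real spam filter is configured.

```python
# Hypothetical keyword list; a real filter would use far richer features.
SPAM_KEYWORDS = {"winner", "free", "lottery", "prize"}


def classify_email(text, keywords=SPAM_KEYWORDS, threshold=2):
    """Label an email 'spam' if it contains at least `threshold` distinct keywords."""
    # Lower-case each word and strip common punctuation from its ends.
    words = {w.strip(".,!?:;").lower() for w in text.split()}
    hits = len(words & keywords)
    return "spam" if hits >= threshold else "not spam"


print(classify_email("You are a WINNER! Claim your free prize now"))
print(classify_email("Meeting moved to Friday, lunch is free"))
```

A rule like this is easy to fool (a spammer only has to avoid the listed words), which is one reason modern filters learn from many features instead.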
Now that you know some applications of classifiers, let me mention some popular machine learning algorithms that are used for classification. They are:

  1. Logistic regression
  2. K-nearest neighbours
  3. Neural networks
  4. Support Vector Machines (SVMs)
  5. Decision trees
I will talk about some of these algorithms in subsequent posts and also show how to implement them in Python and R. I will also talk about metrics which can help you determine how well a classifier is doing on data.
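As a small preview, here is a minimal sketch of one of the listed algorithms, k-nearest neighbours, together with accuracy, one simple metric for judging a classifier. It uses only the Python standard library, and the training points and labels are made-up toy data, not real tumour measurements.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # train is a list of (features, label) pairs; distance is Euclidean.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up toy data: two features per item, two classes.
train = [
    ((1.0, 1.0), "benign"), ((1.5, 2.0), "benign"), ((2.0, 1.5), "benign"),
    ((6.0, 6.5), "malignant"), ((7.0, 6.0), "malignant"), ((6.5, 7.0), "malignant"),
]
test_points = [(1.2, 1.4), (6.8, 6.2)]
test_labels = ["benign", "malignant"]

predictions = [knn_predict(train, p) for p in test_points]
print(predictions)
print(accuracy(test_labels, predictions))
```

Accuracy is the simplest metric but not always the most informative one, for example when one class is much rarer than the other.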

Conclusion
In this post I defined the concept of classification and I also mentioned some popular applications of classification in the real world. I ended by listing the popular algorithms used for classification.

Thank you once again for reading my blog. I especially want to thank everyone that has given me positive feedback about this blog. You encourage me to keep doing this. As always, if you have any questions, comments or suggestions, don’t hesitate to leave a comment. I will do my best to attend to you. Cheers!!!