Machine Learning Coding Tutorial 3. What Makes a Good Feature?

In the previous tutorial, we used a decision tree as the classifier. Classifiers are only as good as the features you provide.

That means coming up with good features is one of your most important jobs in machine learning.

1. Dog Classifier

Imagine we want to write a classifier to tell the difference between two types of dogs: greyhounds and Labradors.

Here we’ll use one feature: the dog’s height in inches. Greyhounds are usually taller than Labradors.

2. Coding

Let’s head into Python for a programmatic example.

Create a Python file named dogs.py and write the following code in it.

Please read the comments carefully to understand what the code does.
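Here is a minimal sketch of what dogs.py could look like. It assumes Matplotlib is installed for the chart. The numbers are illustrative assumptions, not measured data: 500 dogs of each breed, an average height of 28 inches for greyhounds and 24 inches for Labradors, and a spread of 4 inches for both. The function names make_heights and show_histogram are made up for this example.

```python
# dogs.py: simulate the heights of two dog breeds and plot them as a histogram.
import random

try:
    import matplotlib.pyplot as plt
except ImportError:  # plotting is optional; the data can still be generated
    plt = None


def make_heights(count, mean, spread, seed=None):
    """Return `count` simulated heights in inches, normally distributed."""
    rng = random.Random(seed)
    return [rng.gauss(mean, spread) for _ in range(count)]


def show_histogram(grey_height, lab_height):
    """Draw a stacked histogram: greyhounds in red, Labradors in blue."""
    if plt is None:
        print("Matplotlib is not installed, so no histogram can be shown.")
        return
    plt.hist([grey_height, lab_height], stacked=True, color=["r", "b"],
             label=["greyhounds", "Labradors"])
    plt.xlabel("Height (inches)")
    plt.ylabel("Number of dogs")
    plt.legend()
    plt.show()


if __name__ == "__main__":
    # Assumed values: 500 dogs per breed, averages of 28 and 24 inches,
    # and a spread (standard deviation) of 4 inches for both breeds.
    grey_height = make_heights(500, 28, 4)
    lab_height = make_heights(500, 24, 4)
    show_histogram(grey_height, lab_height)
```

Because the heights are random, the exact shape of the chart changes a little on every run. Passing a seed to make_heights makes a run repeatable.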

Run the program from Terminal (Mac) or Command Prompt (Windows), for example with python dogs.py.

You should see a pop-up window showing a histogram.

3. Explanation

Toward the left of the histogram, a dog is most likely a Labrador. On the other hand, if we go all the way to the right of the histogram and look at a dog that is 35 inches tall, we can be confident it is a greyhound.

In the middle, the two types of dog are about equally likely, so height alone tells us very little there.
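To put rough numbers on this, here is a small sketch that computes the probability that a dog of a given height is a greyhound. It assumes both breeds are equally common and that heights follow normal distributions with averages of 28 inches (greyhounds) and 24 inches (Labradors) and a spread of 4 inches. These values are illustrative assumptions, not real measurements.

```python
from statistics import NormalDist

# Assumed height distributions, in inches (illustrative values only).
GREYHOUND = NormalDist(mu=28, sigma=4)
LABRADOR = NormalDist(mu=24, sigma=4)


def greyhound_probability(height):
    """Probability that a dog of this height is a greyhound,
    assuming both breeds are equally common."""
    grey = GREYHOUND.pdf(height)
    lab = LABRADOR.pdf(height)
    return grey / (grey + lab)


for height in (20, 26, 35):
    print(f"{height} inches: P(greyhound) = {greyhound_probability(height):.2f}")
```

Under these assumptions, a 20-inch dog is a greyhound with a probability of about 0.18, a 26-inch dog is a coin flip at 0.50, and a 35-inch dog comes out at about 0.90.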

So height is a useful feature, but it’s not perfect. That’s why in machine learning you almost always need multiple features. Otherwise, you could just write an if statement instead of bothering with a classifier.
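To see how far a single feature gets us, here is a sketch of the if statement approach. It guesses the breed from height alone and checks the guess against simulated dogs. The 26-inch threshold and the simulated heights (averages of 28 and 24 inches, spread of 4 inches) are assumptions chosen for illustration.

```python
import random


def guess_breed(height, threshold=26):
    """A one-feature rule: tall dogs are guessed to be greyhounds."""
    if height > threshold:
        return "greyhound"
    return "Labrador"


def rule_accuracy(count=500, seed=0):
    """Fraction of simulated dogs that the rule labels correctly."""
    rng = random.Random(seed)
    dogs = [(rng.gauss(28, 4), "greyhound") for _ in range(count)]
    dogs += [(rng.gauss(24, 4), "Labrador") for _ in range(count)]
    correct = sum(guess_breed(height) == breed for height, breed in dogs)
    return correct / len(dogs)


print(f"Accuracy of the height rule: {rule_accuracy():.0%}")
```

With these assumed numbers the rule should be right for roughly 70% of the dogs. That is better than guessing, but it leaves plenty of mistakes in the overlapping middle range.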

Ideal features are

  • informative
  • independent
  • easy to understand

informative

For example, a dog’s eye color is useless for telling what type of dog it is.

independent

For example, height in inches and height in centimeters are redundant: the second adds no information that the first does not already carry.
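One way to spot this kind of redundancy is to measure how strongly two features are correlated. The sketch below computes the Pearson correlation by hand for a few made-up heights. The sample values and the helper name pearson are assumptions for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up heights in inches, and the same heights converted to centimeters.
heights_in = [22.0, 24.5, 26.0, 28.5, 31.0]
heights_cm = [h * 2.54 for h in heights_in]

print(round(pearson(heights_in, heights_cm), 6))
```

A correlation of 1 means one feature can be computed exactly from the other, so keeping both only counts the same information twice.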

easy to understand

For example, suppose we want to estimate the time it takes to fly from one city to another. The distance between the two cities is a better feature than the latitude and longitude of each city.
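If all we have is coordinates, we can turn them into the simpler feature ourselves. Here is a sketch using the haversine formula, which treats the Earth as a sphere with a radius of 6371 km. That is an approximation, and the function name distance_km is made up for this example. The coordinates are approximate values for New York and Los Angeles.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; the Earth is treated as a sphere


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate coordinates of New York and Los Angeles.
new_york = (40.7128, -74.0060)
los_angeles = (34.0522, -118.2437)

print(f"{distance_km(*new_york, *los_angeles):.0f} km")
```

A single distance value relates to flight time much more directly than four separate coordinate values, which a classifier would first have to learn how to combine.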
