Solved – Is it all about machine learning in real practice?

Tags: algorithms, machine-learning

I'm a newcomer to machine learning (and to some statistics), and I have been studying the basics (supervised/unsupervised learning algorithms, the relevant optimization methods, regularization, and some guiding ideas such as the bias-variance trade-off) for a while. I know that without any real practice, I won't gain a deep understanding of this material.

So I began with a classification problem on real data: handwritten digit classification (MNIST). To my surprise, without any feature learning or engineering, the accuracy reached 0.97 using a random-forest classifier with raw pixel values as input (a minimal sketch of this setup follows below). I also tried other learning algorithms, such as SVM and LR, with their parameters tuned.
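
For concreteness, this is a sketch of the kind of experiment I mean; the exact split and hyperparameters here are illustrative, not necessarily what I used:

```python
# Minimal sketch: out-of-the-box random forest on raw MNIST pixels.
# Assumes scikit-learn is installed; fetch_openml downloads MNIST.
from sklearn.datasets import fetch_openml
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Load MNIST: 70,000 images of 28x28 = 784 raw pixel values each.
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)

# Conventional MNIST split: first 60,000 train, last 10,000 test.
X_train, X_test = X[:60000], X[60000:]
y_train, y_test = y[:60000], y[60000:]

# No feature engineering at all: raw pixels go straight into the forest.
clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
clf.fit(X_train, y_train)

print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```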

Then I got lost: is it really that easy, or am I missing something here? Do you just pick a learning algorithm from the toolkit and tune some parameters?

If that were all there is to machine learning in practice, I would lose my interest in this field. I thought about it and read some blogs for a few days, and I came to these conclusions:

  1. The most important part of machine learning in practice is feature engineering, that is, given the data, finding a better representation of the features (see the sketch after this list).

  2. Which learning algorithm to use also matters, as does parameter tuning, but the final choice is more about experimentation.
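
To make point 1 concrete, a classic hand-engineered representation for digit images is the histogram-of-oriented-gradients (HOG) descriptor. The sketch below (all parameter values are illustrative assumptions) builds such features in place of raw pixels:

```python
# Sketch of point 1: hand-engineered features (HOG) instead of raw
# pixels. Assumes scikit-image is installed; settings are illustrative.
import numpy as np
from skimage.feature import hog

def hog_features(X):
    """Map each flattened 28x28 digit to a HOG descriptor."""
    return np.array([
        hog(img.reshape(28, 28),
            orientations=9,           # gradient-direction bins
            pixels_per_cell=(7, 7),   # coarse cells suit small digits
            cells_per_block=(2, 2))
        for img in X
    ])

# Demo on a dummy image, just to show the descriptor length:
dummy = np.zeros(784)
print(hog_features(dummy[None, :]).shape)  # (1, 324) with these settings

# The same classifier as before can then be trained on
# hog_features(X_train) instead of the raw pixel matrix.
```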

I'm not sure I understand this correctly; I hope someone can correct me and give me some suggestions about machine learning in practice.

Best Answer

Machine learning (ML) in practice depends on what the goal of doing ML is. In some situations, solid pre-processing and applying a suite of out-of-the-box ML methods might be good enough. Even in these situations, though, it is important to understand how the methods work in order to be able to troubleshoot when things go wrong. And ML in practice can be much more than this; MNIST is a good example of why.

It's deceptively easy to get 'good' performance on the MNIST dataset. For example, according to Yann LeCun's website on MNIST performance, K-nearest neighbours (K-NN) with the Euclidean (L2) distance metric also has an error rate of 3%, the same as your out-of-the-box random forest, and L2 K-NN is about as simple as an ML algorithm gets (see the sketch below). On the other hand, the best first shot that Yann, Yoshua, Leon & Patrick took at this dataset, LeNet-4, has an error rate of 0.7%. That is less than a quarter of 3%, so if you put the naive system into practice reading handwritten digits, it would require roughly four times as much human effort to fix its errors.
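
For concreteness, here is a minimal sketch of that naive baseline, scored by error rate rather than accuracy. The value of k and the train/test split are illustrative assumptions, not LeCun's exact setup:

```python
# Sketch: the naive L2 K-NN baseline on raw MNIST pixels.
from sklearn.datasets import fetch_openml
from sklearn.neighbors import KNeighborsClassifier

X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X_train, X_test = X[:60000], X[60000:]
y_train, y_test = y[:60000], y[60000:]

knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean", n_jobs=-1)
knn.fit(X_train, y_train)  # K-NN "training" just stores the data

# Brute-force prediction over 10,000 test points is slow but simple.
error_rate = 1.0 - knn.score(X_test, y_test)  # score() returns accuracy
print(f"L2 K-NN error rate: {error_rate:.3f}")
```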

The convolutional neural network that Yann and colleagues used is matched to the task, but I wouldn't call this 'feature engineering' so much as making an effort to understand the data and encode that understanding into the learning algorithm.
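
To make that distinction concrete, here is a rough PyTorch sketch of a LeNet-style network (an illustrative approximation, not the original LeNet-4). The convolution and pooling layers are where the understanding of the data gets encoded: digits are 2-D images whose local stroke patterns matter regardless of where they appear.

```python
# Rough sketch of a LeNet-style convolutional network; an illustration,
# not the original LeNet-4 architecture.
import torch
import torch.nn as nn

class LeNetStyle(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        # Convolutions encode the prior that nearby pixels are related
        # and that the same stroke detector is useful everywhere
        # (weight sharing / translation invariance).
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5, padding=2),  # 28x28 -> 28x28
            nn.Tanh(),
            nn.AvgPool2d(2),                            # -> 14x14
            nn.Conv2d(6, 16, kernel_size=5),            # -> 10x10
            nn.Tanh(),
            nn.AvgPool2d(2),                            # -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),
            nn.Tanh(),
            nn.Linear(120, n_classes),
        )

    def forward(self, x):  # x: (batch, 1, 28, 28)
        return self.classifier(self.features(x))

model = LeNetStyle()
print(model(torch.zeros(1, 1, 28, 28)).shape)  # torch.Size([1, 10])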

So, what are the lessons:

  1. It is easy to reach the naive performance baseline using an out-of-the-box method and good preprocessing. You should always do this, so that you know where the baseline is and whether or not that performance level is good enough for your requirements. Beware, though: out-of-the-box ML methods are often 'brittle', i.e., surprisingly sensitive to the pre-processing. Once you've trained all the out-of-the-box methods, it's almost always a good idea to try bagging them (see the sketch after this list).
  2. Hard problems require domain-specific knowledge, a lot more data, or both to solve. Feature engineering means using domain-specific knowledge to help the ML algorithm. However, if you have enough data, an algorithm (or approach) that can take advantage of that data to learn complex features, and an expert applying this algorithm, then you can sometimes forgo this knowledge (e.g. the Kaggle Merck challenge). Also, domain experts are sometimes wrong about what good features are, so more data and ML expertise are always helpful.
  3. Consider error rate, not accuracy. An ML method with 99% accuracy makes half the errors of one with 98% accuracy; sometimes this is important.
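
As a concrete illustration of the 'bag them' suggestion in lesson 1, interpreted loosely as combining the trained out-of-the-box models, here is a minimal soft-voting ensemble sketch; the constituent models and their settings are illustrative assumptions:

```python
# Sketch: combine out-of-the-box models via a soft-voting ensemble.
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=3)),
        # Logistic regression is one of the 'brittle' methods: it
        # benefits from scaling, so give it its own preprocessing.
        ("lr", make_pipeline(StandardScaler(),
                             LogisticRegression(max_iter=1000))),
    ],
    voting="soft",  # average predicted class probabilities across models
)
# Usage (with the train/test split from the earlier sketches):
# ensemble.fit(X_train, y_train)
# print("ensemble accuracy:", ensemble.score(X_test, y_test))
```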