Machine Learning – Is Automated Machine Learning Truly Feasible?

Tags: algorithms, automatic-algorithms, boosting, machine learning, stacking

As I discover machine learning I see different interesting techniques such as:

  • automatically tune algorithms with techniques such as grid search,
  • get more accurate results through the combination of different algorithms of the same "type", that's boosting,
  • get more accurate results through the combination of different algorithms (but not the same type of algorithms), that's stacking,
  • and probably lots more I still have to discover…

My question is the following: all those pieces exist, but is it possible to put them together into an algorithm that takes cleaned data as input and outputs good results by taking the best out of all these techniques? (Of course it will probably be less efficient than a professional data scientist, but it will be better than me!) If yes, do you have sample code, or do you know of frameworks that can do that?

EDIT: After some answers, it seems some narrowing is needed.
Let's take an example: we have one column of categorical data, call it y, and we want to predict it from numerical data X consisting of dummy variables or real-valued measurements (height, temperature). We assume cleaning has already been done. Is there an existing algorithm that can take such data and output a prediction (by testing multiple algorithms, tuning them, boosting, etc.)? If yes, is it computationally efficient (does it run in a reasonable time compared to a single ordinary algorithm), and do you have an example of code?

Best Answer

If you know beforehand what kind of data you will feed in ("these are monthly sales of CPGs, with prices and promotion markers, and I want a point forecast"), so that you can tune your setup ahead of time, then this is likely possible and has already been done; see the various "expert systems" built for specific tasks.

If you are looking for something that can take any kind of data and do "something useful" with it ("ah, here I am supposed to recognize handwriting and output ZIP codes, and there I should do fraud detection, and this input file obviously is a credit scoring task"), then no, I don't think that will happen for a long time.

Sorry for an opinion-based answer to what might well be closed as an opinion-based question.


EDIT to address the edited question:

"we have one column with categorical data, let's call it $y$ and we want to predict it from numerical data $X$ that is either dummies or real numerical data"

This sounds like something that Random Forests are actually pretty good at. Then again, a "general-purpose" algorithm like RFs will likely never beat an algorithm that was tuned to a particular type of $y$ known beforehand, e.g., handwritten digits, or credit default risks.
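Since the question asks for example code, here is a minimal sketch of that setup. It assumes scikit-learn (not mentioned in the question) and uses made-up synthetic data: the columns `height`, `temperature`, and `is_weekend` and the three class labels are purely illustrative. It fits a Random Forest to a categorical $y$ from numeric and dummy columns in $X$, with a small grid search for tuning. The parameter grid is an arbitrary choice, not a recommendation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, already-clean data (illustrative only).
rng = np.random.default_rng(0)
n = 600
height = rng.normal(170, 10, n)        # real-valued feature
temperature = rng.normal(20, 5, n)     # real-valued feature
is_weekend = rng.integers(0, 2, n)     # dummy (0/1) feature
X = np.column_stack([height, temperature, is_weekend])

# Categorical target with three classes, derived from the features.
score = (height - 170) / 10 + (temperature - 20) / 5 + is_weekend
y = np.where(score < -0.5, "low", np.where(score < 1.5, "mid", "high"))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Grid search over a few Random Forest settings with 5-fold cross-validation.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 5]},
    cv=5,
)
search.fit(X_train, y_train)

# Accuracy of the best configuration on held-out data.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, test_accuracy)
```

On a data set this small, the whole search (4 settings times 5 folds) runs in seconds. The cost grows with the size of the grid, the number of folds, and the number of trees, which is the main price paid for automated tuning.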