Solved – General assumptions about functions in machine learning

assumptions, machine learning

In the article "A few useful things to know about machine learning" (ungated pdf), I found the following quote:

In fact, the general assumptions, like smoothness, similar examples have similar classes, limited dependencies, limited complexity, are often enough to do very well, and this is part of the reason why machine learning has been so successful.

I understand the assumption that similar examples should have similar classes.

I am guessing

  • "limited dependencies" means that the optimization function only depends on a finite number of variables,
  • "limited complexity" means that the problem's complexity is limited, and
  • "smoothness" means the optimization function is smooth in some sense.

Are those guesses right? Can someone clarify these for me?

Best Answer

I agree with spicysheep that the passage is a little vague, but without reading the paper here's what I think those assumptions refer to:

  • Smoothness: the "true model" is smooth. That is, the underlying function you're trying to learn, the one that maps from input features to labels, isn't too "steep" anywhere. A formal version of this assumption might be something like assuming a Lipschitz constant; there's a toy numeric sketch of the idea right after this list. I'm not sure what you mean by "optimization function", but if you mean the loss function that you optimize when you learn a model, that's not quite the same assumption.

  • Similar examples have similar labels: this is more or less the same thing as smoothness, I think, just phrased slightly differently.

  • Limited dependencies: most things just don't affect each other very much. The extreme version of this is a naive Bayes model, where everything is assumed to be independent of everything else given the class. Other models make less stringent independence assumptions; graphical models provide an explicit way to represent and reason about these assumptions, where the "limited dependencies" assumption corresponds to the graph not being too densely connected.

  • Limited complexity: you assume the model just isn't crazily complex and can be represented fairly simply: Occam's razor. This is often implemented in objective functions via regularization, which from a Bayesian viewpoint often corresponds to a prior (there's a short sketch of this at the end of the answer).
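To make the smoothness point concrete, here is a toy numeric sketch (my own illustration, not from the paper; the data and functions are made up). It estimates the largest output change per input change over a sample, a crude empirical stand-in for a Lipschitz constant: the smooth target gives a small ratio, the steep one a large ratio.

    import numpy as np

    def empirical_lipschitz(xs, ys):
        # Largest |y_i - y_j| / |x_i - x_j| over all pairs of points:
        # a rough data-driven stand-in for a Lipschitz constant.
        ratios = []
        for i in range(len(xs)):
            for j in range(i + 1, len(xs)):
                dx = abs(xs[i] - xs[j])
                if dx > 1e-12:
                    ratios.append(abs(ys[i] - ys[j]) / dx)
        return max(ratios)

    xs = np.linspace(0.0, 1.0, 50)

    # A "smooth" target: nearby inputs get nearby outputs.
    smooth_ys = np.sin(2 * np.pi * xs)

    # A "steep" target: tiny input changes can flip the output.
    steep_ys = np.sign(np.sin(40 * np.pi * xs))

    print("smooth target ratio:", empirical_lipschitz(xs, smooth_ys))
    print("steep target ratio: ", empirical_lipschitz(xs, steep_ys))

When that ratio is small, nearby inputs really do have nearby outputs, which is exactly what makes neighbour-based prediction (and the "similar examples have similar labels" assumption) reasonable.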

These assumptions can all be viewed as kind of the same thing.
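To show where "limited complexity" typically ends up in practice, here is a minimal ridge-regression sketch (again a generic example of my own, not anything specific from the paper). The lam * ||w||^2 term is the regularizer that penalizes complex (large-norm) weight vectors; from the Bayesian viewpoint it corresponds to a zero-mean Gaussian prior on the weights.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy data: a noisy linear signal where only two of twenty features matter.
    n, d = 30, 20
    X = rng.normal(size=(n, d))
    true_w = np.zeros(d)
    true_w[:2] = [3.0, -2.0]
    y = X @ true_w + 0.5 * rng.normal(size=n)

    def ridge(X, y, lam):
        # Closed-form minimizer of ||X w - y||^2 + lam * ||w||^2.
        return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

    w_plain = ridge(X, y, lam=0.0)    # no complexity penalty
    w_reg = ridge(X, y, lam=10.0)     # penalize large weight vectors

    print("weight norm without penalty:", np.linalg.norm(w_plain))
    print("weight norm with penalty:   ", np.linalg.norm(w_reg))

The regularized weights come out with a smaller norm, i.e. the fitted model is kept "simple" unless the data strongly demand otherwise, which is the Occam's razor idea above.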
