Solved – the difference between learning and inference

machine-learning, terminology

Machine learning research papers often treat learning and inference as two separate tasks, but it is not quite clear to me what the distinction is. In this book, for example, Bayesian statistics is used for both kinds of tasks, but no motivation for the distinction is given. I have several vague ideas about what it could mean, but I would like to see a solid definition and perhaps also rebuttals or extensions of my ideas:

  • The difference between inferring the values of latent variables for a certain data point, and learning a suitable model for the data.
  • The difference between extracting variances (inference) and learning the invariances that make such extraction possible (by learning the dynamics of the input space/process/world).
  • The neuroscientific analogy might be short-term potentiation/depression (memory traces) vs long-term potentiation/depression.

Best Answer

I agree with Neil G's answer, but perhaps this alternative phrasing also helps:

Consider the setting of a simple Gaussian mixture model. Here the model parameters are the description of the mixture's Gaussian components: each component's mean and variance, and each one's weight in the mixture.

Given a set of model parameters, inference is the problem of identifying which component was likely to have generated a single given example, usually in the form of a "responsibility" for each component. Here, the latent variable is just the identifier of the component that generated the given vector, and we are inferring which component that was likely to have been. (In this case, inference is simple, though in more complex models it becomes quite complicated.)
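To make the inference step concrete, here is a minimal sketch (the function names and the 1-D two-component mixture are my own illustration, not from any particular library): given fixed parameters, the responsibilities are just Bayes' rule applied to the latent component indicator.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def responsibilities(x, weights, means, variances):
    """Posterior probability that each component generated x,
    i.e. Bayes' rule over the latent component indicator."""
    joint = np.array([w * gaussian_pdf(x, m, v)
                      for w, m, v in zip(weights, means, variances)])
    return joint / joint.sum()

# Two-component mixture: one Gaussian at 0, one at 5, equal weights.
r = responsibilities(4.8, weights=[0.5, 0.5], means=[0.0, 5.0],
                     variances=[1.0, 1.0])
# The point 4.8 is far more plausibly from the second component,
# so nearly all the responsibility lands on it.
```

Note that the model parameters (`weights`, `means`, `variances`) are inputs here, not things being estimated; that is what makes this inference rather than learning.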

Learning is the process of, given a set of samples from the model, identifying the model parameters (or a distribution over model parameters) that best fit the data given: choosing the Gaussians' means, variances, and weightings.

The Expectation-Maximization learning algorithm can be thought of as performing inference for the training set, then learning the best parameters given that inference, and repeating. Inference is often used in the learning process in this way, but it is also of independent interest, e.g. to choose which component generated a given data point in a Gaussian mixture model, to decide on the most likely hidden state in a hidden Markov model, to impute missing values in a more general graphical model, and so on.
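The alternation described above can be sketched for the same 1-D two-component mixture (a toy illustration with synthetic data, not a production implementation): the E-step is inference, the M-step is learning.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data drawn from two well-separated Gaussians.
data = np.concatenate([rng.normal(0.0, 1.0, 200),
                       rng.normal(6.0, 1.0, 200)])

# Initial guesses for the model parameters.
weights = np.array([0.5, 0.5])
means = np.array([-1.0, 1.0])
variances = np.array([1.0, 1.0])

for _ in range(50):
    # E-step (inference): responsibility of each component for each point,
    # computed under the current parameters.
    dens = (weights
            * np.exp(-0.5 * (data[:, None] - means) ** 2 / variances)
            / np.sqrt(2 * np.pi * variances))          # shape (n, 2)
    resp = dens / dens.sum(axis=1, keepdims=True)

    # M-step (learning): re-fit the parameters given those responsibilities.
    nk = resp.sum(axis=0)
    weights = nk / len(data)
    means = (resp * data[:, None]).sum(axis=0) / nk
    variances = (resp * (data[:, None] - means) ** 2).sum(axis=0) / nk
```

After convergence the learned means sit near the true cluster centers at 0 and 6, even though each iteration's E-step is exactly the per-point inference problem described earlier.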