The difference between prediction and inference

causality, prediction, terminology

I'm reading through "An Introduction to Statistical Learning". In chapter 2, they discuss the reasons for estimating a function $f$.

2.1.1 Why Estimate $f$?

There are two main reasons we may wish to estimate $f$: prediction and inference. We discuss each in turn.

I've read it over a few times, but I'm still partly unclear on the difference between prediction and inference. Could someone provide a (practical) example of the differences?

Best Answer

Inference: Given a set of data, you want to infer how the output is generated as a function of the inputs.

Prediction: Given a new measurement, you want to use an existing data set to build a model that reliably picks the correct label from a set of possible outcomes.


Inference: You want to find out what effect Age, Passenger Class, and Gender have on surviving the Titanic disaster. You can fit a logistic regression and infer the effect each passenger characteristic has on survival rates.

Prediction: Given some information on a Titanic passenger, you want to choose from the set $\{\text{lives}, \text{dies}\}$ and be correct as often as possible. (See bias-variance tradeoff for prediction in case you wonder how to be correct as often as possible.)
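
The two uses can be sketched on the same fitted model. This is a minimal illustration, assuming synthetic passengers rather than the real Titanic records: the coefficients in `TRUE_W`, the feature names, and the helpers `simulate` and `fit_logistic` are all made up for this example, and the model is fit with plain gradient ascent instead of a statistics package.

```python
import math
import random

random.seed(0)

# Assumption: made-up effects used only to simulate passengers,
# not estimates from the real Titanic data.
NAMES = ["intercept", "female", "first_class", "age_std"]
TRUE_W = [-1.0, 2.5, 1.0, -0.5]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def simulate(n):
    """Draw n synthetic passengers as (features, survived) pairs."""
    rows = []
    for _ in range(n):
        x = [1.0,
             float(random.random() < 0.4),   # female indicator
             float(random.random() < 0.3),   # first-class indicator
             random.gauss(0.0, 1.0)]         # standardized age
        y = 1 if random.random() < sigmoid(dot(TRUE_W, x)) else 0
        rows.append((x, y))
    return rows

def fit_logistic(rows, steps=500, lr=1.0):
    """Maximize the log-likelihood by batch gradient ascent."""
    w = [0.0] * len(NAMES)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in rows:
            err = y - sigmoid(dot(w, x))
            for j, xj in enumerate(x):
                grad[j] += err * xj
        w = [wi + lr * g / len(rows) for wi, g in zip(w, grad)]
    return w

train, test = simulate(600), simulate(400)
w_hat = fit_logistic(train)

# Inference: read the fitted coefficients as odds ratios per feature.
for name, coef in zip(NAMES, w_hat):
    print(f"{name:12s} coef={coef:+.2f} odds_ratio={math.exp(coef):.2f}")

# Prediction: classify unseen passengers and count how often we are right.
hits = sum((sigmoid(dot(w_hat, x)) >= 0.5) == bool(y) for x, y in test)
accuracy = hits / len(test)
print(f"held-out accuracy: {accuracy:.2f}")
```

The inference question is answered by the coefficient table (which characteristics raise or lower the odds, and by how much), while the prediction question is answered only by the held-out accuracy. In a real analysis the coefficients would also need standard errors or confidence intervals, which this sketch leaves out.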


Prediction doesn't revolve around establishing the most accurate relation between the input and the output; accurate prediction cares about putting new observations into the right class as often as possible.

So the 'practical example' crudely boils down to the following difference: given the data for a single passenger, the inference approach gives you a probability of surviving, whereas the classifier gives you a choice between lives and dies.
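
As a rough sketch of that last step, the snippet below turns a modelled survival probability into a hard label. The weights, the feature order, and the 0.5 threshold are hypothetical values chosen for illustration, not fitted estimates.

```python
import math

def survival_probability(weights, features):
    """Logistic model: map a weighted sum of features to a probability."""
    z = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def classify(probability, threshold=0.5):
    """Collapse a probability into one of the two outcomes."""
    return "lives" if probability >= threshold else "dies"

# Hypothetical coefficients for: intercept, female, first_class.
weights = [-1.0, 2.5, 1.0]
passenger = [1.0, 1.0, 0.0]  # a woman who is not in first class

p = survival_probability(weights, passenger)
label = classify(p)
print(f"probability of surviving: {p:.3f}, predicted outcome: {label}")

# Very different probabilities can collapse to the same label.
same_label = classify(0.51) == classify(0.99)
```

The threshold is a tuning choice: moving it trades one kind of misclassification for the other, which is part of what tuning a classifier involves.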

Tuning classifiers is as interesting and crucial a topic as correctly interpreting p-values and confidence intervals.