Solved – Difference between inferential and predictive statistics (in one of the Coursera Data Science Toolbox lectures)

inference, prediction

I think I understood the nuts and bolts of the difference between inferential and predictive statistics, but I got confused after watching one of the data science lectures on Coursera. I have copy-pasted the part that confused me.

This excerpt comes from the experimental design lecture of the Data Science Toolbox course.
The image the lecturer is referring to is attached below.

[image: two pairs of overlapping population distributions]

So for a prediction study, you actually have slightly different issues that come up from the inferential case. So again, you might have a population of individuals, and you might not be able to collect data on all of those individuals, but you want to predict something about them. So for example, we might get individuals that come in, and we measure something about their genome, and we want to predict whether they're going to respond to chemotherapy or not.

So what we do is, we collect these individuals, and we might collect observations from people that did respond to chemotherapy and people that did not respond to chemotherapy. And then what we want to do is build a predictive function so that if we get a new individual, we can predict whether they're going to respond to chemotherapy up here, as an orange person, or not respond down here as a green person. So the idea here is that we'll still need to deal with probability and sampling and potential confounding variables, because, when we're building this prediction function, we want it to be highly accurate.

Another issue that comes up is that, prediction is slightly more challenging than inference. So for example, if you look at these two populations, the population for the light grey curve has a mean value about here, and the population for the dark grey curve has a mean value about here. If you look at the distribution of observations from these two populations, what you can see is that there is a detectable difference between these two populations (one that a p-value would pick up). But if I tell you, for example, that I've observed a value that comes, say, right here, it's very difficult to know which of these two populations it came from: it's relatively likely that it came from the light grey population, but it's also relatively likely that it came from the dark grey population, so it's very difficult to tell the difference. For prediction, you actually need the distributions to be a little bit more separated. These two distributions also have a different mean, but now they're far enough apart, relative to their variability, that if I give you an observation that lands right about here, you know it probably came from the dark grey population, whereas if I give you an observation here, it probably came from the light grey population. So it's important to pay attention to the relative size of effects when considering prediction versus inference.

I have commonly seen such a slide when people explain the mechanism of inferential statistics, but I am not familiar with using such an image to explain the difference between prediction and inference. What is the lecturer getting at about the difference between prediction and inference by using this slide?

Best Answer

I hope the author of that text is a contributor to this site, because I am about to argue that they make a fundamental error, and I would like it if they were around to defend themselves.

And then what we want to do is build a predictive function so that if we get a new individual, we can predict whether they're going to respond to chemotherapy up here, as an orange person, or not respond down here as a green person.

This is subtly wrong: this is not what we want to do. Our goal in such a study is to develop a decision rule that will advise us on how to act when presented with a case. That is, our decision rule should tell us whether we should apply the therapy to that case. This is related, but not equivalent, to predicting whether they will respond, as I elaborate below.

The correct procedure for developing such a rule does involve prediction:

  • Develop a model that predicts the probability that an individual will respond to treatment.
  • Use the model, along with an understanding of the benefits and costs of treatment, to develop a decision rule that advises doctors on procedure.
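The two steps above can be sketched in a few lines. Everything here is illustrative: the feature, the fitted coefficients, and the benefit/cost numbers are made up, and the "model" is just a logistic function standing in for something actually fitted to data.

```python
import math

def predicted_probability(x, intercept=-1.0, slope=2.0):
    """Step 1: a stand-in for a fitted model of P(responds | feature x).

    The intercept and slope are hypothetical; in practice they would
    come from fitting, e.g., a logistic regression to the study data.
    """
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))

def should_treat(p, benefit=10.0, cost=2.0):
    """Step 2: a decision rule layered on top of the probability.

    Treatment has positive expected value when p * benefit exceeds
    (1 - p) * cost, i.e. when p > cost / (benefit + cost).
    """
    return p > cost / (benefit + cost)

p = predicted_probability(1.5)       # a new patient's feature value
print(round(p, 3), should_treat(p))  # 0.881 True
```

Note how the separation works: if the costs or benefits change, only `should_treat` needs to change; `predicted_probability`, which encodes the scientific knowledge, is untouched.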

In many problems the benefits and costs change quickly in response to our understanding of the situation, or to outside influences like legislation or new technology. If we follow the above procedure, only the decision rule has to adapt to these changes; the modeled probabilities are invariant. They express only our underlying scientific knowledge about the treatment and its effects. This is a separation of concerns, which engineers have long known is a powerful tool in organizing work.

It is important that our model predicts probabilities. That is what allows us to incorporate information about benefits and costs into our decision rule: we can calculate the expected value and cost of treatment for an individual, and balance them according to our goals. If instead we insist on the model telling us "responds" or "does not respond", we have given up our power to make nuanced decisions based on these benefits and costs, and have ceded our ability to adapt to an ever-changing landscape.
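A tiny worked example of the expected-value calculation, with purely invented numbers: suppose the model gives a patient only a 30% chance of responding, but a response is very valuable relative to the cost of treating in vain.

```python
# Hypothetical numbers: a 30% chance of responding, a response worth
# 50 units of benefit, treating a non-responder costs 5 units.
p_respond = 0.30
benefit, cost = 50.0, 5.0

# Expected value of treating versus not treating this patient.
ev_treat = p_respond * benefit - (1 - p_respond) * cost
ev_skip = 0.0

print(round(ev_treat, 1))   # 11.5: treating has positive expected value
print(ev_treat > ev_skip)   # True
# A model forced to emit a hard label at the usual 0.5 cutoff would have
# called this patient "does not respond" and hidden this option from us.
```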

The author falls into this trap. In the picture of overlapping distributions, they argue that prediction is difficult because, in the regions of large overlap, the model cannot meaningfully make a binary yes-or-no call on "responds to treatment". But this is simply the truth about most situations we encounter in life, and it is exactly why it is important to base our reasoning on probabilities: probabilities quantify the degree of uncertainty we have in making a yes-or-no call. In the overlapping distributions, there is no difficulty at all in assigning probabilities to "responds to treatment". It is only when we ignore this reality and attempt to say with certainty what will happen that issues arise. The author's difficulty is manufactured by their own incorrect procedure.
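To make this concrete, here is a sketch of assigning probabilities in the overlapping-distributions picture. The two populations are modeled as normal curves with made-up means and a common standard deviation, and equal prior probability of membership; Bayes' rule then gives a perfectly well-defined probability everywhere, including deep in the overlap.

```python
import math

def normal_pdf(x, mu, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def p_light(x, mu_light=0.0, mu_dark=1.0, sd=1.0):
    """Posterior probability that x came from the light-grey population,
    assuming equal priors. The means and sd are illustrative."""
    f_light = normal_pdf(x, mu_light, sd)
    f_dark = normal_pdf(x, mu_dark, sd)
    return f_light / (f_light + f_dark)

# Deep in the overlap a hard call is hopeless, but the probability is fine:
print(round(p_light(0.5), 2))   # 0.5, equally likely from either curve
# Far from the overlap, the very same rule makes a confident call:
print(round(p_light(-2.0), 2))  # 0.92
```

The point is that nothing breaks in the overlap region: the probability simply reports the genuine uncertainty, which a forced yes/no label would have papered over.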

Another issue that comes up is that, prediction is slightly more challenging than inference.

This is not the general view in the literature, nor among the wise people I have discussed these issues with. I wonder if the author is using some quirky definition of "prediction" and "inference".

To me, inference is using modeling to understand the true mechanisms that underlie a phenomenon. We want to be able to say things like "increasing the treatment drug by x ccs will lead to an improvement in outcomes by amount y". To do inference, we first need a model that describes the phenomenon well (the gold standard being our ability to use the model to make accurate predictions). We then use the shape of that model to distill an understanding of what is going on.
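A toy illustration of that distinction, on simulated data (all numbers invented): we fit a line to dose/outcome pairs, and for inference the fitted slope itself, "improvement per extra cc of drug", is the quantity of interest, not the forecasts the line produces.

```python
import random

# Simulate dose/outcome pairs from a known linear mechanism plus noise.
random.seed(0)
true_slope, true_intercept = 2.0, 1.0
doses = [i / 2 for i in range(20)]          # 0.0 .. 9.5 ccs
outcomes = [true_intercept + true_slope * d + random.gauss(0, 0.5)
            for d in doses]

# Ordinary least squares by hand for a single predictor.
n = len(doses)
mean_x = sum(doses) / n
mean_y = sum(outcomes) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(doses, outcomes))
         / sum((x - mean_x) ** 2 for x in doses))

# For inference, this slope is the answer: outcome gain per cc of drug.
# For prediction, we would instead only score how well the line forecasts
# new outcomes, and the slope's interpretability would be irrelevant.
print(round(slope, 1))
```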

In prediction, we don't much care about the model being introspectable. If it is too complicated for us to understand, so be it, as long as its predictions are accurate. Prediction studies loosen some of the constraints we must meet to use a model for inference. The author seems to have it backwards.

A most excellent reference that is quite readable, and that really helped me clarify my thinking on this subject, is Shmueli, "To Explain or to Predict?".