Solved – Test whether two predictions are significantly different

Tags: prediction, r

I am currently stuck with a problem regarding predictions from linear regressions. I estimated a simple multiple regression model $y = b_0 + b_1 x + \mathbf{X} b_2$, where $x$ is my variable of interest and $\mathbf X$ is a matrix of controls, using the lm() function in R.
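For concreteness, here is a minimal sketch of that setup; the data frame `dat` and the control names `ctrl1` and `ctrl2` are placeholders I made up, not from the original question:

```r
# Hypothetical data frame `dat` with outcome y, variable of interest x,
# and two control variables standing in for the matrix of controls
model <- lm(y ~ x + ctrl1 + ctrl2, data = dat)
summary(model)
```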

Now I want to predict $y$ for two different values of $x$, and ultimately I want to know whether those two predictions of $y$ are statistically different from each other.

So far, I used predict(model, se.fit = TRUE, interval = "prediction") and got a point prediction as well as the corresponding prediction interval for each value. I then judged whether the two predictions are statistically different based on whether their prediction intervals overlap.
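A sketch of that step, assuming the hypothetical `model` above and two illustrative values of $x$ (0 and 1) with the controls held at their means:

```r
# Two new observations that differ only in x; the values 0 and 1 are illustrative
newdata <- data.frame(
  x     = c(0, 1),
  ctrl1 = mean(dat$ctrl1),
  ctrl2 = mean(dat$ctrl2)
)

# Point predictions plus prediction intervals for single future observations
predict(model, newdata = newdata, se.fit = TRUE, interval = "prediction")
```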

I get almost no significant differences using this technique, even when the estimated coefficients are significant. Am I on the right track, or are there other techniques one can use?

Thanks for your help!

Best Answer

I'm not sure why you want this, or if I really understand what you want; however, predict() will give you what I think you want. It assumes that future values will have the same variance as the data used to construct your model. You can change that, but let's assume you think that is a good assumption. If you change your interval to "confidence" rather than "prediction", you will get the upper and lower bounds of a 95% confidence interval for the true value of the mean of your observation. A prediction interval is not what you want: prediction intervals give you an interval within which future values will fall with 95% confidence (an overview of different intervals). You don't care about future values; you care about the values you are comparing.

So, comparing the intervals from predict(model, se.fit = TRUE, interval = "confidence") will tell you whether you have sufficient confidence that the true mean for one observation is greater than the true mean for another observation.
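As a hedged sketch, reusing the hypothetical `model` and `newdata` objects from the question, the call and an informal overlap check could look like this:

```r
# Confidence intervals for the mean response at each set of predictor values
ci <- predict(model, newdata = newdata, se.fit = TRUE, interval = "confidence")
ci$fit  # matrix with columns: fit (point estimate), lwr, upr (95% bounds by default)

# Informal check: TRUE if the two 95% confidence intervals do not overlap
ci$fit[1, "upr"] < ci$fit[2, "lwr"] || ci$fit[2, "upr"] < ci$fit[1, "lwr"]
```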

I used some stylized verbiage, because, again, I am not sure exactly what you want. If you want to know whether one value is greater than another value, then you don't need statistics. It either is or it isn't. If someone got an A on a test, and someone else got a B, you can't ask if the first person statistically got a better grade. They just plain got a better grade. If you have more data points though, you can estimate if the first person, on average, does better on tests than the second person.

I think a key distinction to make is that your model does not predict values, per se. It predicts means. What your model gives you is an estimate of the mean of a random variable $y$ for a person with certain $x$ and $\mathbf X$ values. One way to think of it is that there could be multiple people with those same $x$ and $\mathbf X$ values, but they wouldn't necessarily all have the same $y$ value (hence the error component). While this adds complexity that isn't in your model, you could also imagine measuring the same person at different times: even if the $x$ and $\mathbf X$ variables didn't change, you would still expect the measured DV value to differ. So you aren't really saying that predicted value 1 is larger than predicted value 2. You are saying something to the effect that the mean for all people with certain $x$ and $\mathbf X$ values is greater or less than the mean for all people with certain other $x$ and $\mathbf X$ values, with 95% confidence.

P.S. I say "people" because I study people. If you study plants or rocks or plots of farmland, just insert those words where I put people.
