Solved – Significant difference from regression confidence intervals

Tags: confidence-interval, regression, statistical-significance, treatment-effect

I have a question about statistical significance in relation to confidence intervals from linear regression. I'm obviously far from a stats expert, and I've been searching for an answer to this (probably simple) question for a while now without any luck.

I've made an example to clarify my question:
I'm interested in the treatment effect of making a change (e.g. spraying with pesticide) in one area, using an untreated area as a control. Before the treatment, a "calibration line" is established between the two areas by regressing some observed response (e.g. annual crop yield) in the treatment area on the same response in the control area (white open circles below, with the regression line and 95% confidence intervals drawn).

Link to example plot (sorry, not enough reputation to post the image inline): http://imgur.com/M95dqEk

[Fig: the x- and y-axes show the same response (e.g. annual crop yield), with the control area on the x-axis and the treatment area on the y-axis. Each data point is then the annual crop yield for both the control and the treatment area, for a total of 10 years of data.]

After the treatment this response is measured again (red open triangles), and the "treatment effect" is defined as the difference between the observed response in the treatment area and the response predicted by the regression line.

My question is: can you say that the treatment effect is statistically significant if a data point falls outside the 95% confidence interval of the calibration regression? And why/why not? (In the plot example, would 3 of the observed treatment responses then be significantly different from the predicted response at the 95% confidence level (p = 0.05)?)

Thanks

Edit1, additional question:
Would prediction intervals, instead of confidence intervals, be more suitable to describe whether there has been a change in the relationship between the two areas after treatment?

Edit2:
Would it be right to say that the confidence intervals can be used to check whether the mean of the treatment effects is significantly different (and, as @Glen_b suggests, to use the regression line/confidence band for the treatment points rather than single points)?
But when asking whether a single observation is significantly different (as in my comment below to Glen_b), is it better to use the prediction interval?

Best Answer

You don't compare individual points to conclude that there is a treatment effect. You check whether the lines for the treatment and control data differ.

In some circumstances, the fitted lines might be parallel, and just the difference in intercept is of interest. In others, both the intercept and slope might differ, and any difference would be of interest.
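To make that concrete, here's a minimal sketch (assuming Python with statsmodels and made-up yield data; the column names are hypothetical) of fitting both lines in one model, so the shifts in intercept and slope after treatment can be tested directly via the interaction terms:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Hypothetical data: one row per year, with the control-area yield (x),
# the treatment-area yield (y), and a 0/1 flag for the post-treatment years.
df = pd.DataFrame({
    "x": np.r_[rng.normal(50, 5, 10), rng.normal(50, 5, 5)],
    "y": np.r_[rng.normal(52, 5, 10), rng.normal(60, 5, 5)],
    "post": np.r_[np.zeros(10), np.ones(5)],
})

# One model for both periods: 'post' shifts the intercept, 'x:post' shifts the slope.
fit = smf.ols("y ~ x * post", data=df).fit()
print(fit.summary())  # t-tests on 'post' and 'x:post' test whether the two lines differ
```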

Testing point vs line in ordinary regression (not errors-in-variables, which is more complicated):

It's not correct simply to check whether data values from the second sample fall within the confidence interval, because those data values themselves contain noise.

Call the first sample $(\underline{x}_1,\underline{y}_1)$, and the second one $(\underline{x}_2,\underline{y}_2)$. Your model for the first sample is $y_{1,i} = \alpha_1 + \beta_1 x_{1,i} + \varepsilon_i$, with the usual iid $N(0,\sigma^2)$ assumption on the errors.

You want to see if a particular point $(x_{2,j},y_{2,j})$ is consistent with the first sample. Equivalently, you want to check whether an interval for $y_{2,j} - \left(\alpha_1 + \beta_1 x_{2,j}\right)$ includes 0 (note that the point comes from the second sample, while the line is fitted to the first sample).

The usual way to obtain such a CI would be to construct a pivotal quantity, though one could also simulate or bootstrap.
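For reference, under the normal-error assumptions above (and assuming the new observation shares the same error variance), that pivotal quantity is the familiar one from simple-regression prediction:

$$ t = \frac{y_{2,j} - \left(\hat\alpha_1 + \hat\beta_1 x_{2,j}\right)}{\hat\sigma_1\sqrt{1 + \dfrac{1}{n_1} + \dfrac{\left(x_{2,j} - \bar{x}_1\right)^2}{\sum_{i=1}^{n_1}\left(x_{1,i} - \bar{x}_1\right)^2}}} \;\sim\; t_{n_1 - 2}, $$

where $\hat\alpha_1$, $\hat\beta_1$ and $\hat\sigma_1$ are estimated from the first sample of size $n_1$.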

However, since in this illustration we're doing it for a single point, under normality assumptions and ordinary regression conditions we can save some effort: this is a solved problem. It corresponds (assuming samples 1 and 2 share a common population variance) to checking whether a sample-2 observation lies within a prediction interval based on sample 1, rather than within a confidence interval.
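As an illustration, here's a minimal sketch (assuming Python with statsmodels and made-up yield numbers) of fitting the calibration line to the pre-treatment data and checking whether each post-treatment point falls inside the corresponding 95% prediction interval:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical pre-treatment (calibration) data:
# x1 = control-area yields, y1 = treatment-area yields, one pair per year.
x1 = np.array([48.0, 51.0, 55.0, 47.0, 60.0, 52.0, 49.0, 58.0, 53.0, 50.0])
y1 = np.array([50.0, 53.0, 57.0, 48.0, 62.0, 54.0, 50.0, 60.0, 55.0, 51.0])

# Hypothetical post-treatment observations to compare against the calibration line.
x2 = np.array([54.0, 49.0, 57.0])
y2 = np.array([63.0, 58.0, 66.0])

# Fit the calibration regression on the pre-treatment data only.
fit = sm.OLS(y1, sm.add_constant(x1)).fit()

# 95% prediction intervals for a *new* observation at each post-treatment x.
pred = fit.get_prediction(sm.add_constant(x2)).summary_frame(alpha=0.05)

outside = (y2 < pred["obs_ci_lower"]) | (y2 > pred["obs_ci_upper"])
print(pred[["obs_ci_lower", "obs_ci_upper"]])
print("outside the 95% prediction interval:", outside.values)
```

Note that checking several post-treatment points each against its own 95% prediction interval is only a pointwise check; it doesn't adjust for looking at multiple points at once.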
