Solved – Distance to a regression line, and degrees of freedom

degrees of freedom, regression

How do you determine the degrees of freedom for derived measurements?

I want to assess the significance of the distance of an independent data point to a regression line. I can easily calculate the (vertical) distance between the data point and the regression line, and I get the uncertainty of the distance from the uncertainties of slope and intercept of the linear regression via Gaussian error propagation. However, what are the degrees of freedom?

The regression line has been calculated from n data points, so it has n-2 degrees of freedom. The additional measurement is independent, so do I get another degree of freedom, bringing the total to n-1?

Also, should I estimate the uncertainty of the independent measurement using the variance of the residuals of the regression, since the measurement process is the same for both the data that went into the fit and the independent data point? I guess this would reduce the degrees of freedom again?

Best Answer

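There is a well-established theory of prediction intervals for linear regression. A new observation $y_0$ at $x=x_0$ is normally distributed with mean $\alpha+\beta x_0$ (not surprisingly) and variance $\sigma^2$. Its distance from the fitted line, $y_0-(\hat\alpha+\hat\beta x_0)$, is therefore normal with mean $0$ and variance $\sigma^2\left(1+\frac{1}{n} + \frac{(x_0-\bar{x})^2}{\sum{(x_i-\bar{x})^2}}\right)$. The $1$ is the noise in the new observation itself; the other two terms are the uncertainty of the fitted line at $x_0$.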

After replacing $\sigma^2$ with its estimate from the regression residuals, the standardized distance follows a $t$ distribution with $n-2$ degrees of freedom. That is because the estimate of $\sigma^2$ has $n-2$ degrees of freedom, and the degrees of freedom of a $t$ statistic are those of the chi-squared term in its denominator.
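As a rough illustration of the recipe above, here is a minimal Python sketch that fits the line by ordinary least squares and returns the standardized vertical distance of a new point together with its degrees of freedom. The function name `distance_t_statistic` and the toy data are made up for this example, and it assumes the new point has the same error variance as the points used in the fit.

```python
import math


def distance_t_statistic(xs, ys, x0, y0):
    """Return (t, df) for the vertical distance of (x0, y0) from the
    least-squares line fitted to the points (xs, ys).

    Assumes the new point shares the error variance of the fitted data.
    """
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points to estimate the variance")

    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    # Ordinary least-squares estimates of slope and intercept.
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual variance estimate; it carries n - 2 degrees of freedom.
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    df = n - 2
    s2 = rss / df

    # Standard error of the prediction error at x0:
    # s^2 * (1 + 1/n + (x0 - x_bar)^2 / Sxx).
    se = math.sqrt(s2 * (1 + 1 / n + (x0 - x_bar) ** 2 / sxx))

    t = (y0 - (intercept + slope * x0)) / se
    return t, df


# Toy data, invented for illustration only.
t, df = distance_t_statistic([0, 1, 2, 3], [0, 1, 1, 2], x0=1.5, y0=2.0)
print(t, df)
```

The returned `t` would then be compared against a $t$ distribution with `df` degrees of freedom, for example with `2 * scipy.stats.t.sf(abs(t), df)` for a two-sided p-value if SciPy is available.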

Intuitively, you are not using the new data point to estimate anything, so you do not gain any degrees of freedom.
