Solved – How to calculate confidence and prediction bands for a linear regression using Mathematica

confidence-interval, mathematica, regression

I need to calculate confidence and prediction bands for a linear regression that is in the general form of $y=ax$, where $a$ is the multiplier and $x$ the variable.

I found code at
http://demonstrations.wolfram.com/ConfidenceAndPredictionBands/
, but it gives the confidence and prediction bands for $y=ax+b$, which is not what I need.


This is the code I wrote based on @shabbychef's answer. Is it correct?

bands[list_, x_, type_, gamma_] := 
 Module[{a, nlm, z, xs, ys, betahat, sigmahat2, n, se},
  xs = list[[All, 1]];
  ys = list[[All, 2]];
  n = Length[list];
  (* fit the no-intercept model y = a x *)
  nlm = NonlinearModelFit[list, a z, {a}, z, 
    ConfidenceLevel -> 1 - gamma];
  betahat = a /. nlm["BestFitParameters"];
  (* "EstimatedVariance" is RSS/(n - 1): one fitted parameter *)
  sigmahat2 = nlm["EstimatedVariance"];
  (* type == 1: confidence band for the mean response;
     otherwise: prediction band, which adds the model-error variance *)
  se = If[type == 1, 
    Sqrt[sigmahat2*x^2/Total[xs^2]], 
    Sqrt[sigmahat2*(1 + x^2/Total[xs^2])]];
  {betahat*x,
   betahat*x - se*Quantile[StudentTDistribution[n - 1], 1 - gamma/2],
   betahat*x + se*Quantile[StudentTDistribution[n - 1], 1 - gamma/2]}
  ]

Best Answer

See the Wikipedia page on 'mean and predicted response'. Your linear regression is simpler than the 'standard' one, which includes an intercept term. I believe you would approximate the variance of the mean response as $$\mbox{Var}\left(\hat{\beta}x_d\right) \approx \hat{\sigma}^2 \frac{x_d^2}{\sum_i x_i^2} = V_d,$$ where $\hat{\beta}$ is your estimate of the regression coefficient, and $\hat{\sigma}^2$ is the estimated variance of the error term. Thus the $1-\alpha$ confidence interval for the mean response at $x_d$ is $$y_d \pm t_{1-\alpha/2,n-2} \sqrt{V_d},$$ where $y_d = \hat{\beta}x_d$, and $t_{1-\alpha/2,n-2}$ is the $1-\alpha/2$ quantile of a t-distribution with $n-2$ degrees of freedom. (Actually, I am not quite sure that shouldn't be $n-1$ in this case, since only one parameter is fitted.)
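As a quick numerical sketch of this formula (not part of the original answer; a Python/NumPy translation, using $n-1$ degrees of freedom on the assumption that one fitted parameter is correct):

```python
import numpy as np
from scipy import stats

def mean_response_ci(xs, ys, xd, alpha=0.05):
    """CI for the mean response beta*xd in the no-intercept model y = beta*x."""
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    n = len(xs)
    betahat = np.sum(xs * ys) / np.sum(xs**2)   # least-squares slope, no intercept
    resid = ys - betahat * xs
    sigmahat2 = np.sum(resid**2) / (n - 1)      # one fitted parameter
    Vd = sigmahat2 * xd**2 / np.sum(xs**2)      # Var(betahat * xd) from the formula above
    t = stats.t.ppf(1 - alpha / 2, df=n - 1)    # the answer hedges between n-2 and n-1
    yd = betahat * xd
    return yd - t * np.sqrt(Vd), yd, yd + t * np.sqrt(Vd)
```

Note that the interval is symmetric about $y_d$ and its half-width grows linearly in $|x_d|$, since $V_d \propto x_d^2$.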

For the 'predicted' response, you have to add $\hat{\sigma}^2$ to the $V_d$ quoted above. The difference between the two is not terribly clear in the Wikipedia article. I believe the idea is that if you sample the relationship at a new point $x_d$, there is error in your estimate of $\beta$, but also an error term in the model itself, so you have to add the extra $\sigma^2$ to your variance estimate. Or, as I have seen it put elsewhere: adding more observations gives you greater confidence in the location of the fit line, but there is still error in the model.
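The same sketch for the prediction interval (again my own Python illustration, not the answerer's code): the only change from the confidence interval is the extra $\hat{\sigma}^2$ inside the variance.

```python
import numpy as np
from scipy import stats

def prediction_interval(xs, ys, xd, alpha=0.05):
    """Prediction interval for a new observation at xd in y = beta*x."""
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    n = len(xs)
    betahat = np.sum(xs * ys) / np.sum(xs**2)
    sigmahat2 = np.sum((ys - betahat * xs)**2) / (n - 1)
    # variance for a new point = Var(betahat*xd) + sigma^2
    # (error in the slope estimate, plus the model's own noise)
    Vpred = sigmahat2 * (xd**2 / np.sum(xs**2) + 1.0)
    t = stats.t.ppf(1 - alpha / 2, df=n - 1)
    yd = betahat * xd
    return yd - t * np.sqrt(Vpred), yd, yd + t * np.sqrt(Vpred)
```

Because of the added $\hat{\sigma}^2$ term, this band is always wider than the confidence band at the same $x_d$, and it does not shrink to zero as $n$ grows.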
