Solved – How to calculate the confidence interval of a function of a combination of two linear models

Tags: confidence-interval, r, regression

We have two linear fits, one for each data set (unfortunately they include weights, but I'm willing to ignore that if there's a nice analytic solution to this). The data sets d1 and d2 measure the same x variable but at different discrete points (so x in d1 might be 100, 120, 500, … while x in d2 might be 110, 110, 335, …), and their y's are not the same. So in R jargon:

f1 <- lm(y ~ x, weights = w, data = d1)
f2 <- lm(y ~ x, weights = w, data = d2)

I then want to take the trend line of each of these fits and use it in the following equation:

z = 1 / exp( 0.02254*f1^2 / f2^2 )

I did that and plotted this z, which is dandy. But! Using the predict function in R, I managed to get the 95% confidence interval trend lines of f1 and f2, which makes a nice graph. I'd love to be able to calculate/plot the same confidence interval trend lines for the z line.
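For context, those interval trend lines came from something like this, with newx being a hypothetical grid of x values:

newx <- data.frame(x = seq(100, 500, length.out = 50))  # hypothetical x grid
ci1  <- predict(f1, newdata = newx, interval = "confidence", level = 0.95)
ci2  <- predict(f2, newdata = newx, interval = "confidence", level = 0.95)
# each result has columns fit, lwr and upr for plotting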

The two ways I thought would work are:

  1. To get the lower CI for z, I take the lower CI for f1 and the upper
     CI for f2 and plug them into the z equation; to get the upper CI, I
     do the opposite. I doubt this is correct.
  2. I generate random y data for specific x values from the f1 model
     (I'm not sure how to do that) and then generate random y data for
     the same x values from the f2 model. This way I have y's for d1 and
     d2 that share the same x data. I can then plug those discrete y's
     into the z equation, get my discrete z points, and try to fit some
     model to them, though these discrete z points might not even follow
     a linear trend. This might be difficult or just silly. (See the
     sketch after this list.)
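
For concreteness, a minimal sketch of what idea 2 could look like (hypothetical shared grid x0, weights ignored, new observations drawn from each fit's mean and residual standard error):

x0  <- seq(100, 500, by = 20)                        # hypothetical shared x grid
mu1 <- predict(f1, newdata = data.frame(x = x0))     # fitted means from f1
mu2 <- predict(f2, newdata = data.frame(x = x0))     # fitted means from f2
y1  <- rnorm(length(x0), mean = mu1, sd = sigma(f1)) # simulated y's for d1
y2  <- rnorm(length(x0), mean = mu2, sd = sigma(f2)) # simulated y's for d2
z0  <- 1 / exp(0.02254 * y1^2 / y2^2)                # discrete z points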

So, this is my problem… Any brilliant ideas?

Best Answer

(All credit goes to Robert Shriver; I'm just posting this for him!) This is not an analytic solution, but I think that out of all the iterative solutions this one's the nicest:

Take the mean regression parameters (intercept and slope) for both regressions (f1 and f2), along with the variance-covariance matrix of those parameters for each regression, and plug them into a multivariate normal distribution to simulate new intercepts and slopes:

library(MASS)  # for mvrnorm

nn <- 100000   # number of simulated coefficient draws
nx <- 100      # number of x values to evaluate

bootf1 <- mvrnorm(nn, f1$coefficients, vcov(f1))
bootf2 <- mvrnorm(nn, f2$coefficients, vcov(f2))

This gives nn = 100,000 random draws of the intercept and slope for each regression. Use each draw to determine the value of z across the range of x values you are interested in.

# column 1 of each draw matrix is the intercept, column 2 the slope
f1b <- bootf1[, 1]   # intercepts of f1
f1a <- bootf1[, 2]   # slopes of f1
f2b <- bootf2[, 1]   # intercepts of f2
f2a <- bootf2[, 2]   # slopes of f2

# grid of x values, and an nn-by-nx matrix whose j-th column is x[j]
x  <- seq(-100, 1100, length.out = nx)
xs <- matrix(rep(x, each = nn), ncol = nx)

# predicted y for every draw (rows) at every x value (columns);
# the intercept and slope vectors recycle down the columns
f1y <- f1b + f1a * xs
f2y <- f2b + f2a * xs

z <- 1 / exp(0.02254 * f1y^2 / f2y^2)

Then you can find the lower and upper CI of z, as well as its mean, by taking the 0.025 and 0.975 quantiles of z at each x value.

# return the 2.5% and 97.5% quantiles of a vector
setquantile <- function(x) {
  quantile(x, probs = c(0.025, 0.975))
}

ci <- apply(z, 2, setquantile)  # 95% interval at each x (2-by-nx matrix)
mu <- apply(z, 2, mean)         # mean of z at each x

plot(x, mu, type = "l")
lines(x, ci[1, ])               # lower bound
lines(x, ci[2, ])               # upper bound
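
One sanity check worth adding: since z is a nonlinear function of the coefficients, the mean of the simulated z values need not match z evaluated at the fitted (mean) coefficients, so overlaying the plug-in curve shows how much they differ:

# z evaluated at the fitted coefficients, for comparison with mu
zhat <- 1 / exp(0.02254 * (coef(f1)[1] + coef(f1)[2] * x)^2 /
                          (coef(f2)[1] + coef(f2)[2] * x)^2)
lines(x, zhat, lty = 2)  # dashed overlay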

That's it. Please let me know if you find something better. Hope this helped someone!