As a start:
# model to fit: y = a * b1^x1 * b2^x2
f <- function(x1, x2, a, b1, b2) a * b1^x1 * b2^x2
# generate some data
x1 <- 1:10
x2 <- c(2,3,5,4,6,7,8,10,9,11)
set.seed(44)
y <- 2*exp(x1/4) + rnorm(10)*2
dat <- data.frame(x1,x2, y)
# fit a nonlinear model
fm <- nls(y ~ f(x1, x2, a, b1, b2), data = dat, start = c(a = 1, b1 = 1, b2 = 1))
# get the estimates of a, b1 and b2
co <- coef(fm)
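The fitted object can then be interrogated in the usual ways; for example (a brief sketch — the profile method behind confint for nls objects is supplied via the MASS package):
summary(fm)  # standard errors and t-statistics for a, b1 and b2
confint(fm)  # profile-likelihood confidence intervals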
(All credit goes to Robert Shriver, I'm just posting this for him!)
This is not an analytic solution, but I think that of all the iterative solutions this one's the nicest:
Take the mean regression parameters (intercept and slope) for both regressions (f1 and f2), along with the variance-covariance matrix of those parameters for each regression, and plug them into a multivariate normal distribution to simulate new intercepts and slopes.
library(MASS)  # for mvrnorm
nn <- 100000   # number of simulated parameter draws
nx <- 100      # number of x values at which to evaluate z
# f1 and f2 are the two fitted lm models
bootf1 <- mvrnorm(nn, f1$coefficients, vcov(f1))
bootf2 <- mvrnorm(nn, f2$coefficients, vcov(f2))
This gives 100,000 random draws of the slope and intercept parameters for each regression. Use each one of these to determine the mean value of z across the range of x values you are interested in.
f1a <- bootf1[, 2]  # slopes, regression 1
f1b <- bootf1[, 1]  # intercepts, regression 1
f2a <- bootf2[, 2]  # slopes, regression 2
f2b <- bootf2[, 1]  # intercepts, regression 2
x <- seq(-100, 1100, length.out = nx)
xs <- matrix(rep(x, each = nn), ncol = nx)  # nn x nx; column j holds x[j]
f1y <- f1b + f1a * xs  # the nn draws recycle down each column
f2y <- f2b + f2a * xs
z <- 1 / exp(0.02254 * f1y^2 / f2y^2)
Then you can find the lower and upper confidence limits of z, as well as its mean, by taking the 0.025 and 0.975 quantiles and the mean of z at each x value.
setquantile <- function(x) {
  quantile(x, probs = c(0.025, 0.975))
}
ci <- apply(z, 2, setquantile)  # 2 x nx matrix: lower and upper limits
mu <- apply(z, 2, mean)         # mean of z at each x value
plot(x, mu, type = "l")
lines(x, ci[1, ])  # lower 95% limit
lines(x, ci[2, ])  # upper 95% limit
That's it. Please let me know if you find something better. Hope this helped someone!
Best Answer
Pick any $(x_i)$ provided at least two of them differ. Set an intercept $\beta_0$ and slope $\beta_1$ and define
$$y_{0i} = \beta_0 + \beta_1 x_i.$$
This fit is perfect. Without changing the fit, you can modify $y_0$ to $y = y_0 + \varepsilon$ by adding any error vector $\varepsilon=(\varepsilon_i)$ to it provided it is orthogonal both to the vector $x = (x_i)$ and the constant vector $(1,1,\ldots, 1)$. An easy way to obtain such an error is to pick any vector $e$ and let $\varepsilon$ be the residuals upon regressing $e$ against $x$. In the code below, $e$ is generated as a set of independent random normal values with mean $0$ and common standard deviation.
Furthermore, you can even preselect the amount of scatter, perhaps by stipulating what $R^2$ should be. Letting $\tau^2 = \operatorname{var}(y_{0i}) = \beta_1^2 \operatorname{var}(x_i)$, rescale those residuals to have a variance of
$$\sigma^2 = \tau^2\left(1/R^2 - 1\right),$$
which follows from solving $R^2 = \tau^2/(\tau^2 + \sigma^2)$ for $\sigma^2$.
This method is fully general: all possible examples (for a given set of $x_i$) can be created in this way.
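As a quick illustration of the construction (a minimal sketch; the particular x, beta, and seed here are arbitrary choices of mine):
set.seed(1)
x <- runif(20)
beta <- c(2, 3)
y.0 <- beta[1] + beta[2] * x
e <- residuals(lm(rnorm(20) ~ x))  # orthogonal to both x and the constant vector
coef(lm(y.0 + e ~ x))              # recovers the intercept 2 and slope 3 exactly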
Examples
Anscombe's Quartet
We can easily reproduce Anscombe's Quartet of four qualitatively distinct bivariate datasets having the same descriptive statistics (through second order).
The code is remarkably simple and flexible.
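Here is a minimal sketch of that construction; the four error patterns below are illustrative assumptions of mine (they do not reproduce Anscombe's exact y values, only the idea of identical statistics with qualitatively different scatter):
set.seed(17)
x <- 4:14                                 # the same x set as Anscombe's first three datasets
y.0 <- 3 + 0.5 * x                        # the common perfect fit
e.list <- list(rnorm(length(x)),          # random scatter
               (x - mean(x))^2,           # a curved pattern
               ifelse(x == 13, 1, 0),     # a single outlier
               sin(x))                    # a wavy pattern
datasets <- lapply(e.list, function(e) {
  e <- residuals(lm(e ~ x))               # make e orthogonal to x and the constant
  y.0 + e / sd(e)                         # rescale to a common error variance
})
t(sapply(datasets, function(y)
  c(mean = mean(y), sd = sd(y), cor = cor(x, y), coef(lm(y ~ x)))))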
The output gives the second-order descriptive statistics for the $(x,y)$ data for each dataset. All four lines are identical. You can easily create more examples by altering x (the x-coordinates) and e (the error patterns) at the outset.
Simulations
This R function generates vectors $y$ according to the specifications of $\beta=(\beta_0,\beta_1)$ and $R^2$ (with $0 \le R^2 \le 1$), given a set of $x$ values. (It wouldn't be difficult to port this to Excel, but it's a little painful.)
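A minimal sketch of what such a function could look like, following the recipe above step by step (the name sim.regression and its defaults are my own choices, not the author's):
sim.regression <- function(x, beta = c(1, -1/2), r.2 = 0.5) {
  y.0 <- beta[1] + beta[2] * x              # the perfect fit
  e <- residuals(lm(rnorm(length(x)) ~ x))  # errors orthogonal to x and the constant
  sigma.2 <- var(y.0) * (1 / r.2 - 1)       # sigma^2 = tau^2 (1/R^2 - 1)
  y.0 + e * sqrt(sigma.2 / var(e))          # rescale the errors and add them in
}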
As an example of its use, here are four simulations of $(x,y)$ data using a common set of $60$ $x$ values, $\beta=(1,-1/2)$ (i.e., intercept $1$ and slope $-1/2$), and $R^2 = 0.5$.
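For instance, with the sketch above (the seed and the use of rnorm for the x values are arbitrary):
set.seed(17)
x <- rnorm(60)
y <- sim.regression(x, beta = c(1, -1/2), r.2 = 0.5)
fit <- lm(y ~ x)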
By executing summary(fit) you can check that the estimated coefficients are exactly as specified and the multiple $R^2$ is the intended value. Other statistics, such as the regression p-value, can be adjusted by modifying the values of the $x_i$.