As a start:
f <- function(x1, x2, a, b1, b2) { a * (b1^x1) * (b2^x2) }
# generate some data
x1 <- 1:10
x2 <- c(2, 3, 5, 4, 6, 7, 8, 10, 9, 11)
set.seed(44)
y <- 2 * exp(x1/4) + rnorm(10) * 2
dat <- data.frame(x1, x2, y)
# fit a nonlinear model
fm <- nls(y ~ f(x1, x2, a, b1, b2), data = dat, start = c(a = 1, b1 = 1, b2 = 1))
# get estimates of a, b1, b2
co <- coef(fm)
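A quick look at what came out; summary() and predict() are the standard accessors for an nls fit:
summary(fm)        # estimates and standard errors for a, b1, b2
co                 # the named coefficient vector
head(predict(fm))  # fitted values on the original data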
Correlation and Dependent Parameter Estimates
I don't have the Stock and Watson book, but orthogonality is what you need to ensure your estimates of the $\beta_i$ are uncorrelated (and, under normality, independent).
Christensen (2002) derives the sampling distribution for $\hat{\boldsymbol{\beta}}$ in linear regression when all of the $\beta_i$'s are estimable as $$
\hat{\boldsymbol{\beta}} \sim \mathrm{N}\bigl( \boldsymbol{\beta}, \sigma^2 (\mathbf{X}'\mathbf{X})^{-1}\bigr)~.
$$
Using the example from the paper as our model matrix (so no intercept, and we're fitting $\beta_1$ and $\beta_2$), the uncorrelated but non-orthogonal matrix $$
\mathbf{X} = \left[\begin{array}{rr}
0 & 1 \\
0 & 0 \\
1 & 1 \\
1 & 0
\end{array}
\right]
$$
gives us a covariance matrix for $\hat{\boldsymbol{\beta}}$ proportional to $$
(\mathbf{X}'\mathbf{X})^{-1}
=
\left[\begin{array}{rr}
2 & -1 \\
-1 & 2
\end{array}\right]~
$$
where the non-zero off-diagonal entries indicate that estimates of $\beta_1$ will be different when we're also estimating $\beta_2$. I generated some Gaussian noise in R, set $\beta_1 = \beta_2 = 1$, and arrived at the following:
> cor(X)
[,1] [,2]
[1,] 1 0
[2,] 0 1
> y
[,1]
[1,] 0.4537310
[2,] -0.2468915
[3,] 1.0462463
[4,] 1.0854357
> coef(lm(y~X-1))
       X1        X2
0.9211289 0.2894242
> coef(lm(y~X[,1]-1))
X[, 1]
1.065841
Note that $\hat{\beta}_1 = 0.92$ under the full model but $\hat{\beta}_1 = 1.07$ under the restricted model.
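For completeness, a minimal sketch in R that reproduces the whole calculation (the seed is my own choice, so the draws won't match the output above exactly):
X <- matrix(c(0, 0, 1, 1,      # first column of X from the display above
              1, 0, 1, 0),     # second column
            ncol = 2)
solve(crossprod(X))            # (X'X)^{-1} = (1/3) * matrix(c(2, -1, -1, 2), 2)
set.seed(123)                            # arbitrary seed
y <- drop(X %*% c(1, 1) + rnorm(4))      # beta_1 = beta_2 = 1 plus Gaussian noise
coef(lm(y ~ X - 1))                      # full model
coef(lm(y ~ X[, 1] - 1))                 # restricted model: the beta_1 estimate shifts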
Orthogonality in Designed Experiments
The nice thing about orthogonality is that it ensures $(\mathbf{X}'\mathbf{X})^{-1}$ is diagonal, so there is no covariance between parameter estimates, and our estimate of $\beta_1$ will not change depending on whether $\beta_2$ is in the model.
If we are running a non-orthogonal experiment to determine which predictors have an effect on the response (I'd call this a screening experiment), then we don't know which $\beta_i$'s belong in the model and which do not, and we'd have to recalculate a whole mess of things as we add and remove terms. In the orthogonal case we only have to make each calculation once, and our estimate of $\beta_i$ is the same whether or not $\beta_j$ is in the model. The interpretation is cleaner, and back in the day the amount of hand calculation required for analysis added to the popularity of orthogonal designs.
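As a small illustration (my own toy example, not from the cited papers), take a $2^2$ factorial in $\pm 1$ coding; because the columns are orthogonal, dropping the second term leaves the estimate of $\beta_1$ untouched:
Xo <- cbind(c(-1, 1, -1, 1),             # factor 1 settings
            c(-1, -1, 1, 1))             # factor 2 settings
crossprod(Xo)                            # diagonal, so the design is orthogonal
set.seed(99)
yo <- drop(Xo %*% c(1, 1) + rnorm(4))
coef(lm(yo ~ Xo - 1))                    # both terms in the model
coef(lm(yo ~ Xo[, 1] - 1))               # second term dropped: same beta_1 estimate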
But that's not to say orthogonality is required. In screening experiments there is often a sparsity of active effects - that is, most of the $\beta_i$'s are zero or negligible. Non-orthogonal designs like no-confounding designs (Jones and Montgomery 2010) and definitive screening designs have small parameter estimate covariances and allow us to investigate more factors or more complex models in (sometimes drastically) fewer runs than an equivalent orthogonal design. Step/stage-wise procedures can be used for the analysis.
Citations
Christensen, R. (2002). Plane Answers to Complex Questions: The Theory of Linear Models. Springer.
Jones, B. and Montgomery, D. C. (2010). Alternatives to Resolution IV Screening Designs in 16 Runs. International Journal of Experimental Design and Process Optimisation, 1(4), 285-295.
Best Answer
By construction, the residuals are orthogonal to the regressors, not only in the statistical sense but also as numerical vectors; see this answer. Writing the matrices so that they conform, we have $X_2'M_2Y = 0$, since $M_2 = I - X_2(X_2'X_2)^{-1}X_2'$ and so $X_2'M_2 = X_2' - X_2'X_2(X_2'X_2)^{-1}X_2' = 0$.
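This is easy to confirm numerically; a minimal sketch with arbitrary simulated data, where X2 plays the role of $X_2$:
set.seed(7)
X2  <- cbind(1, rnorm(20))       # regressors, including a constant
Y   <- rnorm(20)
M2Y <- resid(lm(Y ~ X2 - 1))     # the residual vector M_2 Y
crossprod(X2, M2Y)               # X_2' M_2 Y: zero up to floating-point error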
The reason one finds phrases that appear to equate "orthogonality" with "uncorrelatedness" in econometrics writings is that these concepts are usually discussed with respect to residuals or error terms. The former have zero mean by construction (as long as the regression includes a constant); the latter are assumed to have zero mean. But then the covariance of these entities with any variable is
$$\operatorname{Cov}(X,u) = E(Xu) - E(X)E(u) = E(Xu) $$
since $E(u)$ is (or is assumed) equal to zero. In such a case, orthogonality becomes equivalent to uncorrelatedness. Otherwise, with both variables having non-zero mean, they are not equivalent.
But this means that if we examine variables centered on their means (which therefore have zero mean by construction), orthogonality becomes equivalent to non-correlation. Since centering variables is widely practiced for various reasons (outside econometrics as well), here again orthogonality becomes equivalent to non-correlation.
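A one-line check of the centering point (arbitrary data): after demeaning, the inner product of two variables is exactly $(n-1)$ times their sample covariance, so zero correlation and orthogonality coincide.
set.seed(8)
x <- rnorm(30, mean = 5)
z <- rnorm(30, mean = 3)
xc <- x - mean(x); zc <- z - mean(z)                   # centered versions
all.equal(sum(xc * zc), (length(x) - 1) * cov(x, z))   # TRUE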
On the contrary, with non-zero means, we have the opposite relation: orthogonality implies correlation.
Assume the variables are orthogonal, $E(XY) = 0$. Then
$$\operatorname{Cov}(X,Y) = E(XY) - E(X)E(Y) = - E(X)E(Y) \neq 0 $$
So they are correlated.
The above also tells us that we can have $E(XY) \neq 0$, $E(X) \neq 0$, $E(Y) \neq 0$, but $\operatorname{Cov}(X,Y) = 0$ if $E(XY) = E(X)E(Y)$. In other words, non-zero-mean independent variables are uncorrelated but not orthogonal.
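A quick simulation makes that last case concrete: independent draws with non-zero means are uncorrelated but far from orthogonal.
set.seed(9)
x <- rnorm(1e5, mean = 2)
y <- rnorm(1e5, mean = 3)
mean(x * y)   # close to E(X)E(Y) = 6, so not orthogonal
cov(x, y)     # close to 0: uncorrelated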
In all, one should carefully consider these concepts and understand under which conditions one implies the other, or the negation of the other.