The derivation in matrix notation
Starting from $y= Xb +\epsilon $, which really is just the same as
$\begin{bmatrix}
y_{1} \\
y_{2} \\
\vdots \\
y_{N}
\end{bmatrix}
=
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1K} \\
x_{21} & x_{22} & \cdots & x_{2K} \\
\vdots & \ddots & \ddots & \vdots \\
x_{N1} & x_{N2} & \cdots & x_{NK}
\end{bmatrix}
*
\begin{bmatrix}
b_{1} \\
b_{2} \\
\vdots \\
b_{K}
\end{bmatrix}
+
\begin{bmatrix}
\epsilon_{1} \\
\epsilon_{2} \\
\vdots \\
\epsilon_{N}
\end{bmatrix} $
it all comes down to minimzing $e'e$:
$\epsilon'\epsilon = \begin{bmatrix}
e_{1} & e_{2} & \cdots & e_{N} \\
\end{bmatrix}
\begin{bmatrix}
e_{1} \\
e_{2} \\
\vdots \\
e_{N}
\end{bmatrix} = \sum_{i=1}^{N}e_{i}^{2}
$
So minimizing $e'e'$ gives us:
$min_{b}$ $e'e = (y-Xb)'(y-Xb)$
$min_{b}$ $e'e = y'y - 2b'X'y + b'X'Xb$
$\frac{\partial(e'e)}{\partial b} = -2X'y + 2X'Xb \stackrel{!}{=} 0$
$X'Xb=X'y$
$b=(X'X)^{-1}X'y$
One last mathematical thing, the second order condition for a minimum requires that the matrix $X'X$ is positive definite. This requirement is fulfilled in case $X$ has full rank.
The more accurate derivation which goes trough all the steps in greater dept can be found under http://economictheoryblog.com/2015/02/19/ols_estimator/
It general these regressions are not the same, but since you have simulated independence it works out.
Consider the full regression
\begin{align*}
Y = \alpha + X_1\beta_1 + X_2\beta_2 + X_3\beta_3 +\epsilon
\end{align*}
Let me define a $n\times 3$ matrix $Z = [1,X_1,X_2]$, a column of ones then the values of $X_1$ and $X_3$. Let $M_z$ be the residual making projection matrix. In other words this matrix when applied to a variable is the same as regression that variable on a column of ones, $X_1$ and $X_2$.
\begin{equation}
\hat{\beta_3} = \frac{X_3^TM_zY}{X_3^TM_zX_3}
\end{equation}
The above formula follows from the application of the FWL theorem. You can derive the result equally from minimizing the sum of squared residuals, but the matrix notation and FWL theorem make things much cleaner and in this case give us insight into the question you asked.
Recall that in simple regression, the general formula for the OLS estimate is $\hat{\beta} = \frac{X^TY}{X^TX}$. You can show for yourself using this formula that the $\beta_3$ that we derived above is the same as the following simple regression:
\begin{align*}
M_zY = M_zX_3\beta_3 + \epsilon
\end{align*}
(Note: $(M_z)^TM_z = M_z$ by properties of orthogonal projection matrices. Second note, I wrote $\epsilon$ again because these residuals are numerically the same by FWL. In other words this is the exact same regression).
So to answer the question, it would be same as residualizes Y by regression in a constant, $X_1$ and $X_2$ and then regressing these residuals on the residuals of $X_3$ regressed on a constant, $X_1$ and $X_2$.
In you simulation you have all your $Xs$ as independent standard normal variables. In expectation when you regress $X_3$ on the others, the result is the same so the residualizing of $X_3$ doesn't matter. In the real data, my guess is that $X_3$ is not mean 0 and has some real relationship with the other two variables so it will not work.
Best Answer
Although I cannot do justice to the question here--that would require a small monograph--it may be helpful to recapitulate some key ideas.
The question
Let's begin by restating the question and using unambiguous terminology. The data consist of a list of ordered pairs $(t_i, y_i)$ . Known constants $\alpha_1$ and $\alpha_2$ determine values $x_{1,i} = \exp(\alpha_1 t_i)$ and $x_{2,i} = \exp(\alpha_2 t_i)$. We posit a model in which
$$y_i = \beta_1 x_{1,i} + \beta_2 x_{2,i} + \varepsilon_i$$
for constants $\beta_1$ and $\beta_2$ to be estimated, $\varepsilon_i$ are random, and--to a good approximation anyway--independent and having a common variance (whose estimation is also of interest).
Background: linear "matching"
Mosteller and Tukey refer to the variables $x_1$ = $(x_{1,1}, x_{1,2}, \ldots)$ and $x_2$ as "matchers." They will be used to "match" the values of $y = (y_1, y_2, \ldots)$ in a specific way, which I will illustrate. More generally, let $y$ and $x$ be any two vectors in the same Euclidean vector space, with $y$ playing the role of "target" and $x$ that of "matcher". We contemplate systematically varying a coefficient $\lambda$ in order to approximate $y$ by the multiple $\lambda x$. The best approximation is obtained when $\lambda x$ is as close to $y$ as possible. Equivalently, the squared length of $y - \lambda x$ is minimized.
One way to visualize this matching process is to make a scatterplot of $x$ and $y$ on which is drawn the graph of $x \to \lambda x$. The vertical distances between the scatterplot points and this graph are the components of the residual vector $y - \lambda x$; the sum of their squares is to be made as small as possible. Up to a constant of proportionality, these squares are the areas of circles centered at the points $(x_i, y_i)$ with radii equal to the residuals: we wish to minimize the sum of areas of all these circles.
Here is an example showing the optimal value of $\lambda$ in the middle panel:
The points in the scatterplot are blue; the graph of $x \to \lambda x$ is a red line. This illustration emphasizes that the red line is constrained to pass through the origin $(0,0)$: it is a very special case of line fitting.
Multiple regression can be obtained by sequential matching
Returning to the setting of the question, we have one target $y$ and two matchers $x_1$ and $x_2$. We seek numbers $b_1$ and $b_2$ for which $y$ is approximated as closely as possible by $b_1 x_1 + b_2 x_2$, again in the least-distance sense. Arbitrarily beginning with $x_1$, Mosteller & Tukey match the remaining variables $x_2$ and $y$ to $x_1$. Write the residuals for these matches as $x_{2\cdot 1}$ and $y_{\cdot 1}$, respectively: the $_{\cdot 1}$ indicates that $x_1$ has been "taken out of" the variable.
We can write
$$y = \lambda_1 x_1 + y_{\cdot 1}\text{ and }x_2 = \lambda_2 x_1 + x_{2\cdot 1}.$$
Having taken $x_1$ out of $x_2$ and $y$, we proceed to match the target residuals $y_{\cdot 1}$ to the matcher residuals $x_{2\cdot 1}$. The final residuals are $y_{\cdot 12}$. Algebraically, we have written
$$\eqalign{ y_{\cdot 1} &= \lambda_3 x_{2\cdot 1} + y_{\cdot 12}; \text{ whence} \\ y &= \lambda_1 x_1 + y_{\cdot 1} = \lambda_1 x_1 + \lambda_3 x_{2\cdot 1} + y_{\cdot 12} =\lambda_1 x_1 + \lambda_3 \left(x_2 - \lambda_2 x_1\right) + y_{\cdot 12} \\ &=\left(\lambda_1 - \lambda_3 \lambda_2\right)x_1 + \lambda_3 x_2 + y_{\cdot 12}. }$$
This shows that the $\lambda_3$ in the last step is the coefficient of $x_2$ in a matching of $x_1$ and $x_2$ to $y$.
We could just as well have proceeded by first taking $x_2$ out of $x_1$ and $y$, producing $x_{1\cdot 2}$ and $y_{\cdot 2}$, and then taking $x_{1\cdot 2}$ out of $y_{\cdot 2}$, yielding a different set of residuals $y_{\cdot 21}$. This time, the coefficient of $x_1$ found in the last step--let's call it $\mu_3$--is the coefficient of $x_1$ in a matching of $x_1$ and $x_2$ to $y$.
Finally, for comparison, we might run a multiple (ordinary least squares regression) of $y$ against $x_1$ and $x_2$. Let those residuals be $y_{\cdot lm}$. It turns out that the coefficients in this multiple regression are precisely the coefficients $\mu_3$ and $\lambda_3$ found previously and that all three sets of residuals, $y_{\cdot 12}$, $y_{\cdot 21}$, and $y_{\cdot lm}$, are identical.
Depicting the process
None of this is new: it's all in the text. I would like to offer a pictorial analysis, using a scatterplot matrix of everything we have obtained so far.
Because these data are simulated, we have the luxury of showing the underlying "true" values of $y$ on the last row and column: these are the values $\beta_1 x_1 + \beta_2 x_2$ without the error added in.
The scatterplots below the diagonal have been decorated with the graphs of the matchers, exactly as in the first figure. Graphs with zero slopes are drawn in red: these indicate situations where the matcher gives us nothing new; the residuals are the same as the target. Also, for reference, the origin (wherever it appears within a plot) is shown as an open red circle: recall that all possible matching lines have to pass through this point.
Much can be learned about regression through studying this plot. Some of the highlights are:
The matching of $x_2$ to $x_1$ (row 2, column 1) is poor. This is a good thing: it indicates that $x_1$ and $x_2$ are providing very different information; using both together will likely be a much better fit to $y$ than using either one alone.
Once a variable has been taken out of a target, it does no good to try to take that variable out again: the best matching line will be zero. See the scatterplots for $x_{2\cdot 1}$ versus $x_1$ or $y_{\cdot 1}$ versus $x_1$, for instance.
The values $x_1$, $x_2$, $x_{1\cdot 2}$, and $x_{2\cdot 1}$ have all been taken out of $y_{\cdot lm}$.
Multiple regression of $y$ against $x_1$ and $x_2$ can be achieved first by computing $y_{\cdot 1}$ and $x_{2\cdot 1}$. These scatterplots appear at (row, column) = $(8,1)$ and $(2,1)$, respectively. With these residuals in hand, we look at their scatterplot at $(4,3)$. These three one-variable regressions do the trick. As Mosteller & Tukey explain, the standard errors of the coefficients can be obtained almost as easily from these regressions, too--but that's not the topic of this question, so I will stop here.
Code
These data were (reproducibly) created in
R
with a simulation. The analyses, checks, and plots were also produced withR
. This is the code.