Both of them should return the same results; the discrepancy you are seeing comes from how the matrix inverse is computed.
Set $A=X^T WX$.
When computing the WLS estimates $\hat{\beta}_{WLS}=(X^T WX)^{-1}X^T Wy$, the first factor of the expression is the inverse of $A$. However, as in your code above, literally raising $A$ to the power $(-1)$, i.e. writing A^(-1), simply computes $\frac{1}{A_{ij}}$ for every element $A_{ij}$ of the matrix; it does not give the matrix inverse.
Instead, to find the inverse of the matrix in R, simply type solve(A).
With this change, the manually computed weighted regression and R's built-in weighted regression will give matching results.
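For concreteness, here is a minimal R sketch with simulated data (the data and variable names are my own, just for illustration) contrasting the element-wise A^(-1) with solve(A) and comparing against lm() with weights:

```r
set.seed(1)
n <- 50
x <- rnorm(n)
w <- runif(n, 0.5, 2)                         # positive weights
y <- 1 + 2 * x + rnorm(n) / sqrt(w)

X <- cbind(1, x)                              # design matrix with a constant
W <- diag(w)                                  # diagonal weight matrix
A <- t(X) %*% W %*% X

beta_wrong  <- A^(-1)   %*% t(X) %*% W %*% y  # element-wise reciprocal: not the inverse
beta_manual <- solve(A) %*% t(X) %*% W %*% y  # proper matrix inverse

fit <- lm(y ~ x, weights = w)
cbind(beta_manual, coef(fit))                 # these two columns agree; beta_wrong does not
```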
Hope this helps.
As you suspect, the second version is indeed a special case of the more general first result. We obtain it when $X_2=x_2$ and $X_1=(\iota\;\;x_1)$ with $\iota$ a vector of ones for the constant.
What (maybe) confuses you is that Wooldridge's statement only focuses on the coefficient on $x_1$ and does not bother to discuss $\tilde{b}_0$, the coefficient on the constant, as it is often of secondary interest.
With a constant, $x_1$ and $x_2$, the short regression of $y$ on a constant and $x_1$ gives a $(2\times1)$ coefficient vector $\tilde{b}=(\tilde{b}_0,\tilde{b}_1)'$. Likewise, the regression of $x_2$ on an intercept and $x_1$ yields a coefficient vector, call it $\Delta=(\delta_0,\delta)'$.
In Goldberger's general result, $\Delta$ corresponds to $(X_1'X_1)^{-1} X_1'X_2$, the OLS estimates from a regression of $X_2$ on $X_1$. (When $X_2$ contains $k_2>1$ variables, we would actually obtain a $(k_1\times k_2)$ matrix of estimated coefficients here, with $k_1$ the number of variables in $X_1$.)
Finally, let $\hat{b}_{[0,1]}=(\hat{b}_0,\hat{b}_1)'$ collect the coefficients on the constant and on $x_1$ from the long regression of $y$ on a constant, $x_1$ and $x_2$.
So all in all, we may write
$$
\tilde{b}=\hat{b}_{[0,1]}+\Delta\cdot\hat{b}_2,
$$
which is now, I hope, clearly a special case of Goldberger's formulation. Wooldridge just picks the second element of that vector.
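If it helps, here is a quick numerical check of that identity in R (simulated data and variable names are my own):

```r
set.seed(42)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)
y  <- 1 + 2 * x1 + 3 * x2 + rnorm(n)

short <- lm(y ~ x1)        # gives tilde_b = (tilde_b0, tilde_b1)'
long  <- lm(y ~ x1 + x2)   # gives hat_b0, hat_b1, hat_b2
aux   <- lm(x2 ~ x1)       # gives Delta = (delta0, delta)'

b_tilde <- coef(short)
b_01    <- coef(long)[c("(Intercept)", "x1")]
b_2     <- coef(long)["x2"]
Delta   <- coef(aux)

cbind(b_tilde, b_01 + Delta * b_2)   # the two columns coincide (up to rounding)
```

The second element of each column is exactly Wooldridge's statement; the first element is the relation for the intercept that he does not spell out.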
You can obtain the solution for the intercept by setting the partial derivative of the squared loss with respect to the intercept $\beta_0$ to zero. Let $\beta_0 \in \mathbb{R}$ denote the intercept, $\beta \in \mathbb{R}^d$ the coefficients of the features, and $x_i \in \mathbb{R}^d$ the feature vector of the $i$-th sample. We have to solve
\begin{align} \arg\min_{\beta_0} \quad& \mathcal{L}(\beta_0, \beta) \\ \mathcal{L}(\beta_0, \beta) &= \frac{1}{2} \sum_{i=1}^n (y_i - \beta_0 - x_i^\top \beta)^2 \\ \frac{\partial}{\partial \beta_0} \mathcal{L}(\beta_0, \beta) &= -\sum_{i=1}^n (y_i - \beta_0 - x_i^\top \beta) = 0 \end{align}
All we have to do is to solve for $\beta_0$:
\begin{align} \sum_{i=1}^n \beta_0 &= \sum_{i=1}^n (y_i - x_i^\top \beta) \\ \beta_0 &= \frac{1}{n} \sum_{i=1}^n (y_i - x_i^\top \beta) \end{align}
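As a quick sanity check (a small simulated R example of my own), the intercept reported by lm indeed equals the average of $y_i - x_i^\top \hat{\beta}$:

```r
set.seed(7)
n <- 100
X <- matrix(rnorm(n * 3), n, 3)
y <- 1.5 + X %*% c(1, -2, 0.5) + rnorm(n)

fit  <- lm(y ~ X)
beta <- coef(fit)[-1]          # fitted slope coefficients
b0   <- coef(fit)[1]           # fitted intercept

c(b0, mean(y - X %*% beta))    # the two values match
```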
Usually, we assume that all features are centered, i.e., $$\frac{1}{n} \sum_{i=1}^n x_{ij} = 0 \qquad \forall j \in \{1,\ldots,d\},$$ which simplifies the solution for $\beta_0$ to be the average response: \begin{align} \beta_0 &= \frac{1}{n} \sum_{i=1}^n y_i - \frac{1}{n} \sum_{i=1}^n \sum_{j=1}^d x_{ij} \beta_j \\ &= \frac{1}{n} \sum_{i=1}^n y_i - \sum_{j=1}^d \beta_j \frac{1}{n} \sum_{i=1}^n x_{ij} \\ &= \frac{1}{n} \sum_{i=1}^n y_i \end{align}
If, in addition, we also assume that the response $y$ is centered, i.e., $\frac{1}{n} \sum_{i=1}^n y_i = 0$, the intercept is zero and thus drops out.
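Both special cases are easy to verify numerically; here is a small self-contained R sketch (again with simulated data of my own):

```r
set.seed(7)
n <- 100
X <- matrix(rnorm(n * 3), n, 3)
y <- 1.5 + X %*% c(1, -2, 0.5) + rnorm(n)

Xc <- scale(X, center = TRUE, scale = FALSE)   # centered features
c(coef(lm(y ~ Xc))[1], mean(y))                # intercept equals the average response

yc <- y - mean(y)                              # additionally center the response
coef(lm(yc ~ Xc))[1]                           # intercept is (numerically) zero
```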