The Wikipedia definition is fine to use for your paper if you need one, but I think you're missing something.
The $\epsilon$ is the random error, which is synonymous with noise. In practice the random error may be Gaussian distributed, in which case it is Gaussian noise, but it could follow other distributions. If the distribution of $\epsilon$ happens to be Gaussian, then you have met one of the theoretical assumptions of the model, and things like interval estimation are better justified. If it's not Gaussian then, as Glen_b said, you still have that the estimator is best linear unbiased.
Theoretically, the random error (noise) is supposed to be Gaussian distributed, but in reality it could follow any distribution. So, to answer your question, you'd need to state whether you want to know the distribution of your particular noise or the distribution the noise is supposed to have. For the former you'd need data.
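For the first case, a quick way to look at your particular noise is to fit the model and examine the residuals. Here is a minimal sketch with simulated data (the data and numbers are purely illustrative, not yours):

```python
# Minimal sketch: fit OLS and check whether the residuals look Gaussian.
# The data here are simulated purely for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x + rng.standard_normal(n)  # true noise is Gaussian here

X = np.column_stack([np.ones(n), x])        # design matrix with intercept
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat

# Shapiro-Wilk normality test on the residuals
stat, p_value = stats.shapiro(residuals)
print(f"Shapiro-Wilk W = {stat:.3f}, p = {p_value:.3f}")
# A large p-value is consistent with (though not proof of) Gaussian noise.
```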
I will give you an answer considering a regression of $Y$ on $X_1$ and $X_2$ only, since it is clearer and simplifies the algebra. It does not change anything in terms of intuition or results.
Note that I am assuming that the intercept is part of $X_1$, and that you include it only once (which is all that matters; you could instead assume that the vector of $1$'s is part of $X_2$).
First, we regress $Y = X_2 \beta_2 + \epsilon$ (so, without $X_1$).
The coefficient $\hat{\beta}_2 = (X_2'X_2)^{-1}X_2'Y$
Therefore the residual is $e = Y - X_2(X_2'X_2)^{-1}X_2'Y = [\mathbf{I} - X_2(X_2'X_2)^{-1}X_2']Y = M_2 Y$, where $M_2$ is the "residual maker" matrix, which projects onto the orthogonal complement of the column space of $X_2$.
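To make $M_2$ concrete, here is a tiny numerical check (made-up data) that it produces the residuals, annihilates $X_2$, and is idempotent:

```python
# Small numerical sketch of the residual-maker matrix M2 (made-up data).
import numpy as np

rng = np.random.default_rng(1)
n, k2 = 100, 2
X2 = rng.standard_normal((n, k2))
Y = rng.standard_normal(n)

# M2 = I - X2 (X2'X2)^{-1} X2'
M2 = np.eye(n) - X2 @ np.linalg.solve(X2.T @ X2, X2.T)

e = M2 @ Y  # residuals from regressing Y on X2

# M2 annihilates X2 and is idempotent:
assert np.allclose(M2 @ X2, 0)
assert np.allclose(M2 @ M2, M2)
```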
Second, we run the auxiliary regression $X_1 = X_2 \gamma_2 + \epsilon$.
Following the same steps you will find that $e_{aux} = M_2X_1$
As a final step, we regress $e = e_{aux}\delta$ and we want to show that $\hat{\delta} = \hat{\alpha}_1$, where $\alpha_1$ is the coefficient on $X_1$ in the "full" regression $Y = X_1\alpha_1 + X_2\alpha_2 + \epsilon$.
Substituting from the first two parts:
$M_2Y = M_2X_1\delta +\epsilon$
Therefore $\hat{\delta} = (X_1'M_2'M_2X_1)^{-1}X_1'M_2'M_2Y = (X_1'M_2X_1)^{-1}X_1'M_2Y$, since $M_2$ is symmetric and idempotent.
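If you want to convince yourself numerically before going through the algebra, here is a short simulation (all data made up) checking that the two routes give the same coefficients on $X_1$:

```python
# Sketch verifying the FWL result numerically (all data simulated).
import numpy as np

rng = np.random.default_rng(2)
n = 500
X1 = np.column_stack([np.ones(n), rng.standard_normal(n)])  # intercept in X1
X2 = rng.standard_normal((n, 2))
Y = X1 @ np.array([1.0, 2.0]) + X2 @ np.array([-1.0, 0.5]) + rng.standard_normal(n)

# Full regression: coefficients on [X1, X2]
X = np.column_stack([X1, X2])
alpha_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
alpha1_hat = alpha_hat[: X1.shape[1]]  # coefficients on X1

# Two-step (partialling-out) route
M2 = np.eye(n) - X2 @ np.linalg.solve(X2.T @ X2, X2.T)
e, e_aux = M2 @ Y, M2 @ X1
delta_hat = np.linalg.lstsq(e_aux, e, rcond=None)[0]

assert np.allclose(delta_hat, alpha1_hat)  # same coefficients on X1
```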
Finally, what is the FWL theorem telling you? That running the full regression or carrying out these steps gives the same coefficient, since we are removing from each variable the part already explained by the other regressors.
How to see that it is actually the same?
Set up the normal equations for the full regression, compute $\hat{\alpha}_1$ and $\hat{\alpha}_2$, and plug one of the two into the expression for the other. You will eventually obtain the same matrix result I show for $\hat{\delta}$.
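Sketching that algebra out: the normal equations of the full regression are
$$X_1'X_1\hat{\alpha}_1 + X_1'X_2\hat{\alpha}_2 = X_1'Y$$
$$X_2'X_1\hat{\alpha}_1 + X_2'X_2\hat{\alpha}_2 = X_2'Y$$
Solving the second for $\hat{\alpha}_2 = (X_2'X_2)^{-1}X_2'(Y - X_1\hat{\alpha}_1)$ and substituting it into the first,
$$X_1'X_1\hat{\alpha}_1 + X_1'X_2(X_2'X_2)^{-1}X_2'(Y - X_1\hat{\alpha}_1) = X_1'Y$$
$$\Rightarrow X_1'M_2X_1\hat{\alpha}_1 = X_1'M_2Y \Rightarrow \hat{\alpha}_1 = (X_1'M_2X_1)^{-1}X_1'M_2Y = \hat{\delta}.$$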
Best Answer
Park's original one-page paper (here) was more concerned with dealing with heteroskedasticity than with testing for its existence. So, given heteroskedasticity, Park assumes a specific form for it, namely a log-linear relationship between the variance of the error term and one regressor:
$$\sigma^2_{\epsilon _i} = \sigma^2X_i^{\gamma}e^{u_i}$$ $$\Rightarrow \ln \sigma^2_{\epsilon _i} = \ln\sigma^2 + \gamma \ln X_i + u_i$$
To estimate this relationship, one needs to obtain a data series for $\ln \sigma^2_{\epsilon _i}$. Park suggested using the residuals from the original regression as a substitute, i.e.
$$\ln \sigma^2_{\epsilon _i} \approx \ln (\hat \epsilon^2_{1i})$$
One then assumes $u_i$ is "nicely behaved" and estimates the regression
$$\ln (\hat \epsilon^2_{1i})= a + \gamma \ln X_i + u_i$$
Then, in order to deal with the heteroskedasticity, one would transform the original equation by dividing it through by $X_i^{\hat \gamma/2}$.
"Park's test" is to view instead the auxiliary regression as a test for heteroskedasticity, where if $\hat \gamma$ appears statistically significant, the null hypothesis of no-heteroskedasticity is rejected. In any case, I don't see where the second regression you mention in the question comes into play.