I do not have the time to give a detailed answer, but since no one has helped you so far, I will give some hints.
Is this correct? And is $s$ as I have defined it?
Your formulas seem to be fine, they just lack the summations, as noted by @Gilles.
So why is the standard error the standard deviation divided by the square root of $N$?
Standard error is useful for significance testing. It is used in $t$-tests. Without the division by $\sqrt{N}$ the $t$-tests would not work. You should be able to find more information in econometrics textbooks.
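To make the role of the $\sqrt{N}$ concrete, here is a minimal sketch in Python (numpy/scipy, used as a stand-in since no language is specified in this answer) computing the standard error of a sample mean and the resulting $t$-statistic, cross-checked against scipy's built-in one-sample $t$-test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=0.5, scale=2.0, size=50)  # sample with true mean 0.5

n = x.size
se = x.std(ddof=1) / np.sqrt(n)        # standard error of the sample mean
t_stat = x.mean() / se                 # t-statistic for H0: mean = 0
p_val = 2 * stats.t.sf(abs(t_stat), df=n - 1)  # two-sided p-value

# Cross-check against scipy's one-sample t-test
t_ref, p_ref = stats.ttest_1samp(x, popmean=0.0)
assert np.isclose(t_stat, t_ref) and np.isclose(p_val, p_ref)
```

Without the division by $\sqrt{n}$, the statistic would not have the $t$-distribution under the null, which is exactly why the test would break.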
I am guessing that it is using beta-hat as a substitute for all parameters in the model?
That could very well be the case. $\hat{\beta}$ would be a parameter vector rather than a single parameter. Its variance would be a matrix rather than a single number.
"variance of the unknown errors" <--- don't know what that means
In a linear regression model of the form $y=\beta_0+\beta_1 x_1+\dotsb+\beta_K x_K+\varepsilon$, variance of the unknown errors would be the variance of $\varepsilon$, denoted $\sigma^2$; it is a single number (not a matrix). It can be estimated as the sample variance of the model residuals, denoted $\hat{\sigma}^2$.
However, the variance of the linear regression parameter vector $\hat{\beta}$ is $\text{Var}(\hat{\beta}|X)=\sigma^2 (X^T X)^{-1}$, which is not the variance of the unknown errors multiplied by the identity matrix.
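A small numerical sketch of this formula (Python/numpy as an illustrative stand-in; the variable names are my own): estimate $\hat{\beta}$ by OLS, estimate $\sigma^2$ from the residuals, and form the covariance matrix $\hat{\sigma}^2 (X^T X)^{-1}$, whose diagonal gives the squared standard errors.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, K))])  # design matrix with intercept
beta = np.array([1.0, 2.0, -0.5])
sigma2 = 4.0
y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - X.shape[1])  # degrees-of-freedom corrected estimate of sigma^2

cov_beta_hat = sigma2_hat * XtX_inv            # Var(beta_hat | X): a (K+1) x (K+1) matrix
se_beta_hat = np.sqrt(np.diag(cov_beta_hat))   # standard errors on the diagonal
assert cov_beta_hat.shape == (K + 1, K + 1)
```

Note that $\hat{\sigma}^2$ is a single number, while $\text{Var}(\hat{\beta}|X)$ is a full matrix whose off-diagonal entries capture the covariances between coefficient estimates.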
A note on what you did not ask: if you want to really understand what is going on in OLS estimation applied to a linear regression model, you should take your time and study the subject carefully and patiently. Just getting individual questions answered, as here, may not be useful in the long run.
After scouring the web for information on how a system of nonlinear equations can be solved efficiently in R, I stumbled upon the nlsur command available in Stata. I know not everyone is able to use Stata, but for those who have the opportunity, it is helpful to know that the implementation in Stata is very easy. I reused all my data preparation work and exported only the transformed variables as a .csv file.
This is the code which solves the following equation system:
#delimit ; /*sets line end indicator to semicolon*/
nlsur (les = ({sigma}-1)/{sigma} * (ln({psi}) + {alphae}*t)+
(1- {sigma})/{sigma} * ly + (1+{sigma})*ln(ebar) + ln(1-gv))
(lks = ({sigma}-1)/{sigma} * (ln({psi}) + {alphal}*beta*t + beta * ln(lbar) + (1-beta) * ln(kbar)) +
(1- {sigma})/{sigma} * ly + ln(gv* (1-beta)))
(ly = ln({psi}) + {sigma}/({sigma} - 1) * ln(gv*(exp({alphal} * beta*t)*(lbar)^beta*
(kbar)^(1- beta))^(({sigma}-1)/{sigma}) + (1-gv)* (exp({alphae}*t)*ebar) ^(({sigma}-1)/{sigma}))),
initial(sigma 0.6 psi 1 alphae 0.05 alphal 0.05);
#delimit cr /*sets line end indicator back to carriage return*/
This uses a two-step FGLS to solve the following three equations (if an iterated FGLS is wanted, the flag ifgnls can be set after the initial values):
$$
\log S_E =\frac{\sigma -1}{\sigma}\left(\log{\psi} + \alpha_E \tilde{t}\right) + \frac{1-\sigma}{\sigma}\log{\tilde{Y}}+ (\sigma + 1)\log{\tilde{E}} + \log{\gamma_E}
$$
$$
\log{S_L} = \frac{\sigma -1}{\sigma}\left(\log{\psi} + \alpha_L \beta \tilde{t}+\beta
\log{\tilde{L}} + (1-\beta)\log{\tilde{K}}\right) + \frac{1-\sigma}{\sigma}\log{{\tilde{Y}}} + \log{\gamma_V \beta}
$$
$$
\log\tilde{Y} = \log\psi + \frac{\sigma}{\sigma - 1} \log ( \gamma_{V}\left(e^{\alpha_{L}\tilde{t}\beta}\left(\tilde{L}\right)^{\beta} \left(\tilde{K}\right)^{1-\beta}\right)^{\frac{\sigma-1}{\sigma}}
+\gamma_{E}\left(e^{\alpha_{E}\tilde{t}} \tilde{E}\right)^{\frac{\sigma-1}{\sigma}})
$$
The cross-equation restrictions on my parameters are automatically recognized, and the convergence behavior seems good. You have to use curly braces to declare the parameters explicitly. I am personally not a good enough programmer myself, but I hope that a package analogous to Stata's nlsur will become available in R.
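For readers without Stata, the core idea of stacking several equations with shared parameters can be sketched with a generic nonlinear least-squares solver. This is a deliberately simplified toy system (two equations sharing sigma and psi, my own made-up data-generating process, and plain stacked NLS rather than the FGLS weighting nlsur performs), shown only to illustrate how cross-equation restrictions are imposed by construction:

```python
import numpy as np
from scipy.optimize import least_squares

# Toy two-equation system sharing the parameters `sigma` and `psi` across
# equations, mimicking nlsur-style cross-equation restrictions. This is NOT
# the full three-equation model above, and it uses equal weights (plain NLS),
# not the FGLS weighting that Stata's nlsur applies.
rng = np.random.default_rng(2)
t = rng.uniform(0, 10, size=100)
sigma_true, psi_true = 0.6, 1.2
y1 = (sigma_true - 1) / sigma_true * (np.log(psi_true) + 0.05 * t) \
     + rng.normal(scale=0.01, size=100)
y2 = np.log(psi_true) + sigma_true * t / 100 + rng.normal(scale=0.01, size=100)

def residuals(theta):
    sigma, psi = theta
    r1 = y1 - (sigma - 1) / sigma * (np.log(psi) + 0.05 * t)
    r2 = y2 - (np.log(psi) + sigma * t / 100)
    return np.concatenate([r1, r2])  # stack both equations' residuals

fit = least_squares(residuals, x0=[0.5, 1.0])
sigma_hat, psi_hat = fit.x
assert abs(sigma_hat - sigma_true) < 0.05 and abs(psi_hat - psi_true) < 0.05
```

Because both residual vectors are built from the same parameter vector, the restriction that sigma and psi are identical across equations holds automatically, which is the property nlsur exploits.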
Edit:
One big drawback is that this method does NOT support Newey-West standard errors, which is a big issue for my estimation. When I wrote to Stata support, they answered that development of this feature is currently not planned. This is a shame, because HAC-robust estimation is available for ordinary nonlinear estimation; I guess I will have to find something more suitable after all.
Best Answer
There is no change in the interpretation of the parameters, since the parameters being estimated are algebraically identical between the linear regression model with heteroskedasticity and the transformed model, OLS on which gives the WLS estimator.
Let us take this at a leisurely pace.
Linear regression model
The linear regression model (potentially with heteroskedasticity) is the following
$$ \begin{align} Y_i &= \beta_0 + \beta_1 X_{1i} + \dots + \beta_K X_{Ki} + \varepsilon_i \\ \mathbb{E}(\varepsilon_i \mid X_{1i}, \ldots, X_{Ki}) &= 0 \end{align} $$
This is equivalent to the model for the conditional mean for $Y_i$, $$ \mathbb{E}(Y_i \mid X_{1i}, \ldots, X_{Ki}) = \beta_0 + \sum_{k=1}^K \beta_k X_{ki} $$
Interpretation of parameters
From here, we can get at the standard interpretation of the parameters of the linear regression model as marginal effects, that is $$ \dfrac{\partial \mathbb{E}(Y_i \mid X_{1i}, \ldots, X_{Ki})}{\partial X_{ki}} = \beta_k $$ This states that the regression coefficient of a regressor is the effect of a unit change in that regressor on the conditional mean of the outcome variable. Note that this interpretation does not depend on the heteroskedasticity assumption in the model. This is the interpretation that the estimated parameters retain, both for OLS and for WLS.
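A quick numerical check of the marginal-effect interpretation (Python/numpy as an illustrative stand-in, with made-up coefficients): in a fitted linear model, increasing one regressor by a unit shifts the fitted conditional mean by exactly that regressor's coefficient.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.1, size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# A unit increase in x1, holding x2 fixed, shifts the fitted conditional
# mean by exactly beta_hat[1] -- the marginal-effect interpretation.
x0 = np.array([1.0, 0.3, -0.8])       # an arbitrary evaluation point
x0_plus = x0 + np.array([0.0, 1.0, 0.0])
assert np.isclose(x0_plus @ beta_hat - x0 @ beta_hat, beta_hat[1])
```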
Transformed linear regression model
Now consider that we transform the original regression model, under the assumption that the error heteroskedasticity has the following form $$ \mathbb{E}(\varepsilon_i^2\mid X_{1i}, \ldots, X_{Ki}) = \sigma^2 X_{ki}. $$
The transformed model is $$ \frac{Y_i}{\sqrt{X_{ki}}} = \beta_0 \frac{1}{\sqrt{X_{ki}}} + \beta_1\frac{X_{1i}}{\sqrt{X_{ki}}} + \ldots+\beta_k \frac{X_{ki}}{\sqrt{X_{ki}}} +\ldots + \beta_K \frac{X_{Ki}}{\sqrt{X_{ki}}} + \underbrace{\frac{\varepsilon_i}{\sqrt{X_{ki}}}}_{\equiv \nu_i} $$
Aside: a more usual simple model for heteroskedasticity is $$ \mathbb{E}(\varepsilon_i^2\mid X_{1i}, \ldots, X_{Ki}) = \sigma^2 X_{ki}^2 $$ in order to preserve the positivity of the second moment.
Note that the model is now a classical linear regression model, since $$ \begin{align} \mathbb{E}(\nu_i\mid X_{1i}, \ldots, X_{Ki}) &= 0 \\ \mathbb{E}(\nu_i^2\mid X_{1i}, \ldots, X_{Ki}) &= \sigma^2 \end{align} $$ Therefore, OLS estimates of the parameters from this transformed model (that is, the WLS estimator) are BLUE, which is the whole point of the exercise. Note that a constant should not be included in the estimation of this model. Also note that I have used the original conditioning regressors as conditioning variables, rather than the transformed regressors, since it is easy to see that the same functions are measurable with respect to the two conditioning sets.
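The equivalence between OLS on the transformed model and the weighted normal equations can be verified numerically. A minimal sketch (Python/numpy, with a made-up data-generating process where $\text{Var}(\varepsilon_i \mid X) = \sigma^2 X_{ki}$):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
xk = rng.uniform(1.0, 5.0, size=n)             # the regressor driving the variance
eps = rng.normal(size=n) * np.sqrt(2.0 * xk)   # Var(eps | x) = sigma^2 * x_k with sigma^2 = 2
y = 1.0 + 3.0 * xk + eps

# OLS on the transformed model: divide every term, including the constant
# term's column of ones, by sqrt(x_k). No extra constant is added.
w = 1.0 / np.sqrt(xk)
Xt = np.column_stack([w, xk * w])   # transformed "intercept" column and regressor
yt = y * w
beta_wls, *_ = np.linalg.lstsq(Xt, yt, rcond=None)

# Equivalent to the weighted normal equations (X' W X)^{-1} X' W y, W = diag(1/x_k)
X = np.column_stack([np.ones(n), xk])
W = np.diag(1.0 / xk)
beta_check = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
assert np.allclose(beta_wls, beta_check)
```

The two solutions coincide because minimizing the sum of squared transformed residuals is algebraically the weighted least squares problem.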
Interpretation of parameters
The transformed model is equivalent to $$ \mathbb{E}\left(\frac{Y_i}{\sqrt{X_{ki}}}\mid X_{1i}, \ldots, X_{Ki}\right) = \beta_0 \frac{1}{\sqrt{X_{ki}}} + \sum_{l=1}^K\beta_l\frac{X_{li}}{\sqrt{X_{ki}}} $$
This is now the crucial part: consider the marginal effect, with respect to one of the original regressors, implied by the transformed model. Multiplying the transformed conditional mean by $\sqrt{X_{ki}}$ recovers $$ \mathbb{E}(Y_i \mid X_{1i}, \ldots, X_{Ki}) = \sqrt{X_{ki}}\,\mathbb{E}\left(\frac{Y_i}{\sqrt{X_{ki}}}\mid X_{1i}, \ldots, X_{Ki}\right) = \beta_0 + \sum_{l=1}^K \beta_l X_{li}, $$ so that $$ \dfrac{\partial \mathbb{E}(Y_i \mid X_{1i}, \ldots, X_{Ki})}{\partial X_{li}} = \beta_l. $$
The same as before! Here I have used the fact that $$ \mathbb{E}\left(\frac{Y_i}{\sqrt{X_{ki}}}\mid X_{1i}, \ldots, X_{Ki}\right) = \frac{1}{\sqrt{X_{ki}}}\mathbb{E}(Y_i \mid X_{1i}, \ldots, X_{Ki}) $$ since conditioning variables are treated as constants by the expectation operator.
Why this makes sense
Note that you formulate the model the way you do (in terms of the original outcomes and regressors) because you are interested in the parameters of that model (the original $\beta_k$s). Features such as heteroskedasticity reduce the efficiency of the OLS parameter estimates, and you might want to correct for that using WLS or (F)GLS. But it would be rather counterproductive if this changed the interpretation of the model parameters that you are interested in. The key is in the way you say it: OLS and WLS estimates of the model parameters, implying one set of population parameters being estimated by both estimators. This can be formalised by saying that the OLS and WLS estimators are consistent for the same population parameters; however, they differ in their asymptotic efficiency.
What most applied economists do
Most applied economists would rather their parameter estimates were close to the truth with high probability as the sample size grows, that is, that their estimates were consistent. A crucial aspect of WLS and FGLS is that they require the specification of an auxiliary model for the heteroskedasticity in order to obtain the extra efficiency afforded by those estimators. However, the price of getting this auxiliary model wrong is that the property of consistency is lost. Most applied economists therefore prefer to simply use White robust standard errors to correct the estimated standard errors of the OLS estimates, and live with the lower efficiency of their estimators.
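The White (HC0) correction mentioned above replaces the classical $\hat{\sigma}^2(X^TX)^{-1}$ with the sandwich $(X^TX)^{-1}\left(\sum_i \hat{u}_i^2 x_i x_i^T\right)(X^TX)^{-1}$. A minimal numpy sketch under a made-up heteroskedastic design (variable names are my own):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500
x = rng.uniform(1.0, 5.0, size=n)
y = 1.0 + 3.0 * x + rng.normal(size=n) * x   # error variance grows with x

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
u = y - X @ beta_hat                          # OLS residuals

# Classical standard errors (valid only under homoskedasticity)
s2 = u @ u / (n - 2)
se_classical = np.sqrt(np.diag(s2 * XtX_inv))

# White (HC0) sandwich: (X'X)^{-1} [X' diag(u_i^2) X] (X'X)^{-1}
meat = X.T @ (X * u[:, None] ** 2)
se_white = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
```

The point estimates beta_hat are unchanged; only the standard errors are corrected, which is exactly the trade-off described above: keep the (consistent but inefficient) OLS estimator and fix its inference.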