I do not have the time to give a detailed answer, but since no one has helped you so far, I will give some hints.
Is this correct? And is $s$ as I have defined it?
Your formulas seem to be fine, they just lack the summations, as noted by @Gilles.
So why is the standard error the standard deviation divided by the square root of $N$?
Standard error is useful for significance testing. It is used in $t$-tests. Without the division by $\sqrt{N}$ the $t$-tests would not work. You should be able to find more information in econometrics textbooks.
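To make the role of the $\sqrt{N}$ concrete, here is a minimal sketch in Python (numpy/scipy, used as a stand-in since no language is specified in this answer) computing the standard error of a sample mean and the resulting $t$-statistic, cross-checked against scipy's built-in one-sample $t$-test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=0.5, scale=2.0, size=50)  # sample with true mean 0.5

n = x.size
se = x.std(ddof=1) / np.sqrt(n)        # standard error of the sample mean
t_stat = x.mean() / se                 # t-statistic for H0: mean = 0
p_val = 2 * stats.t.sf(abs(t_stat), df=n - 1)  # two-sided p-value

# Cross-check against scipy's one-sample t-test
t_ref, p_ref = stats.ttest_1samp(x, popmean=0.0)
assert np.isclose(t_stat, t_ref) and np.isclose(p_val, p_ref)
```

Without the division by $\sqrt{n}$, the statistic would not have the $t$-distribution under the null, which is exactly why the test would break.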
I am guessing that it is using beta-hat as a substitute for all parameters in the model?
That could very well be the case. $\hat{\beta}$ would be a parameter vector rather than a single parameter. Its variance would be a matrix rather than a single number.
"variance of the unknown errors" <--- don't know what that means
In a linear regression model of the form $y=\beta_0+\beta_1 x_1+\dotsb+\beta_K x_K+\varepsilon$, variance of the unknown errors would be the variance of $\varepsilon$, denoted $\sigma^2$; it is a single number (not a matrix). It can be estimated as the sample variance of the model residuals, denoted $\hat{\sigma}^2$.
However, the variance of the linear regression parameter vector $\hat{\beta}$ is $\text{Var}(\hat{\beta}|X)=\sigma^2 (X^T X)^{-1}$, which is not the variance of the unknown errors multiplied by the identity matrix.
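A small numerical sketch of this formula (Python/numpy as an illustrative stand-in; the variable names are my own): estimate $\hat{\beta}$ by OLS, estimate $\sigma^2$ from the residuals, and form the covariance matrix $\hat{\sigma}^2 (X^T X)^{-1}$, whose diagonal gives the squared standard errors.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, K))])  # design matrix with intercept
beta = np.array([1.0, 2.0, -0.5])
sigma2 = 4.0
y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - X.shape[1])  # degrees-of-freedom corrected estimate of sigma^2

cov_beta_hat = sigma2_hat * XtX_inv            # Var(beta_hat | X): a (K+1) x (K+1) matrix
se_beta_hat = np.sqrt(np.diag(cov_beta_hat))   # standard errors on the diagonal
assert cov_beta_hat.shape == (K + 1, K + 1)
```

Note that $\hat{\sigma}^2$ is a single number, while $\text{Var}(\hat{\beta}|X)$ is a full matrix whose off-diagonal entries capture the covariances between coefficient estimates.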
A note on what you did not ask: if you want to really understand what is going on in OLS estimation applied to a linear regression model, you should take your time and study the subject carefully and patiently. Just getting individual questions answered, as here, may not be useful in the long run.
After scouring the web for information on how a system of nonlinear equations can be solved efficiently in R, I stumbled upon the nlsur command available in Stata. I know not everyone is able to use Stata, but for those who have the opportunity, it is helpful to know that the implementation in Stata is very easy. I reused all my data preparation work and exported only the transformed variables as a .csv file.
This is the code which solves the following equation system:
#delimit ; /*sets line end indicator to semicolon*/
nlsur (les = ({sigma}-1)/{sigma} * (ln({psi}) + {alphae}*t)+
(1- {sigma})/{sigma} * ly + (1+{sigma})*ln(ebar) + ln(1-gv))
(lks = ({sigma}-1)/{sigma} * (ln({psi}) + {alphal}*beta*t + beta * ln(lbar) + (1-beta) * ln(kbar)) +
(1- {sigma})/{sigma} * ly + ln(gv* (1-beta)))
(ly = ln({psi}) + {sigma}/({sigma} - 1) * ln(gv*(exp({alphal} * beta*t)*(lbar)^beta*
(kbar)^(1- beta))^(({sigma}-1)/{sigma}) + (1-gv)* (exp({alphae}*t)*ebar) ^(({sigma}-1)/{sigma}))),
initial(sigma 0.6 psi 1 alphae 0.05 alphal 0.05);
#delimit cr /*sets line end indicator back to carriage return*/
This uses a two-step FGLS to solve the following three equations (if an iterated FGLS is wanted, the flag ifgnls can be set after the initial values):
$$
\log S_E =\frac{\sigma -1}{\sigma}\left(\log{\psi} + \alpha_E \tilde{t}\right) + \frac{1-\sigma}{\sigma}\log{\tilde{Y}}+ (\sigma + 1)\log{\tilde{E}} + \log{\gamma_E}
$$
$$
\log{S_L} = \frac{\sigma -1}{\sigma}\left(\log{\psi} + \alpha_L \beta \tilde{t}+\beta
\log{\tilde{L}} + (1-\beta)\log{\tilde{K}}\right) + \frac{1-\sigma}{\sigma}\log{{\tilde{Y}}} + \log{\gamma_V \beta}
$$
$$
\log\tilde{Y} = \log\psi + \frac{\sigma}{\sigma - 1} \log ( \gamma_{V}\left(e^{\alpha_{L}\tilde{t}\beta}\left(\tilde{L}\right)^{\beta} \left(\tilde{K}\right)^{1-\beta}\right)^{\frac{\sigma-1}{\sigma}}
+\gamma_{E}\left(e^{\alpha_{E}\tilde{t}} \tilde{E}\right)^{\frac{\sigma-1}{\sigma}})
$$
The cross-equation restrictions on my parameters are automatically recognized, and the convergence behavior seems good. You have to use curly braces to declare the parameters explicitly. I am personally not a good enough programmer myself, but I hope that a package analogous to Stata's nlsur will become available in R.
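For readers without Stata, the core idea of stacking several equations with shared parameters can be sketched with a generic nonlinear least-squares solver. This is a deliberately simplified toy system (two equations sharing sigma and psi, my own made-up data-generating process, and plain stacked NLS rather than the FGLS weighting nlsur performs), shown only to illustrate how cross-equation restrictions are imposed by construction:

```python
import numpy as np
from scipy.optimize import least_squares

# Toy two-equation system sharing the parameters `sigma` and `psi` across
# equations, mimicking nlsur-style cross-equation restrictions. This is NOT
# the full three-equation model above, and it uses equal weights (plain NLS),
# not the FGLS weighting that Stata's nlsur applies.
rng = np.random.default_rng(2)
t = rng.uniform(0, 10, size=100)
sigma_true, psi_true = 0.6, 1.2
y1 = (sigma_true - 1) / sigma_true * (np.log(psi_true) + 0.05 * t) \
     + rng.normal(scale=0.01, size=100)
y2 = np.log(psi_true) + sigma_true * t / 100 + rng.normal(scale=0.01, size=100)

def residuals(theta):
    sigma, psi = theta
    r1 = y1 - (sigma - 1) / sigma * (np.log(psi) + 0.05 * t)
    r2 = y2 - (np.log(psi) + sigma * t / 100)
    return np.concatenate([r1, r2])  # stack both equations' residuals

fit = least_squares(residuals, x0=[0.5, 1.0])
sigma_hat, psi_hat = fit.x
assert abs(sigma_hat - sigma_true) < 0.05 and abs(psi_hat - psi_true) < 0.05
```

Because both residual vectors are built from the same parameter vector, the restriction that sigma and psi are identical across equations holds automatically, which is the property nlsur exploits.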
Edit:
One big drawback is that this method does NOT support Newey-West standard errors, which is a big issue for my estimation. When I wrote to Stata support, they answered that development of this feature is currently not planned. This is a shame, because HAC-robust estimation is available for ordinary nonlinear estimation; I guess I will have to find something more suitable after all.
Best Answer
There is no change in the interpretation of the parameters, since the parameters being estimated are algebraically identical between the linear regression model with heteroskedasticity and the transformed model, OLS on which gives the WLS estimator.
Let us take this at a leisurely pace.
Linear regression model
The linear regression model (potentially with heteroskedasticity) is the following
$$ \begin{align} Y_i &= \beta_0 + \beta_1 X_{1i} + \dots + \beta_K X_{Ki} + \varepsilon_i \\ \mathbb{E}(\varepsilon_i \mid X_{1i}, \ldots, X_{Ki}) &= 0 \end{align} $$
This is equivalent to the model for the conditional mean for $Y_i$, $$ \mathbb{E}(Y_i \mid X_{1i}, \ldots, X_{Ki}) = \beta_0 + \sum_{k=1}^K \beta_k X_{ki} $$
Interpretation of parameters
From here, we can get at the standard interpretation of the parameters of the linear regression model as marginal effects, that is $$ \dfrac{\partial \mathbb{E}(Y_i \mid X_{1i}, \ldots, X_{Ki})}{\partial X_{ki}} = \beta_k $$ This states that the regression coefficient of a regressor is the effect of a unit change in that regressor on the conditional mean of the outcome variable. Note that this interpretation does not depend on the heteroskedasticity assumption in the model. This is the interpretation that the estimated parameters retain, both for OLS and for WLS.
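A quick numerical check of the marginal-effect interpretation (Python/numpy as an illustrative stand-in, with made-up coefficients): in a fitted linear model, increasing one regressor by a unit shifts the fitted conditional mean by exactly that regressor's coefficient.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.1, size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# A unit increase in x1, holding x2 fixed, shifts the fitted conditional
# mean by exactly beta_hat[1] -- the marginal-effect interpretation.
x0 = np.array([1.0, 0.3, -0.8])       # an arbitrary evaluation point
x0_plus = x0 + np.array([0.0, 1.0, 0.0])
assert np.isclose(x0_plus @ beta_hat - x0 @ beta_hat, beta_hat[1])
```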
Transformed linear regression model
Now consider that we transform the original regression model, under the assumption that the error heteroskedasticity has the following form $$ \mathbb{E}(\varepsilon_i^2\mid X_{1i}, \ldots, X_{Ki}) = \sigma^2 X_{ki}. $$
The transformed model is $$ \frac{Y_i}{\sqrt{X_{ki}}} = \beta_0 \frac{1}{\sqrt{X_{ki}}} + \beta_1\frac{X_{1i}}{\sqrt{X_{ki}}} + \ldots+\beta_k \frac{X_{ki}}{\sqrt{X_{ki}}} +\ldots + \beta_K \frac{X_{Ki}}{\sqrt{X_{ki}}} + \underbrace{\frac{\varepsilon_i}{\sqrt{X_{ki}}}}_{\equiv \nu_i} $$
Aside: a more usual simple model for heteroskedasticity is $$ \mathbb{E}(\varepsilon_i^2\mid X_{1i}, \ldots, X_{Ki}) = \sigma^2 X_{ki}^2 $$ in order to preserve the positivity of the second moment.
Note that the model is now a classical linear regression model, since $$ \begin{align} \mathbb{E}(\nu_i\mid X_{1i}, \ldots, X_{Ki}) &= 0 \\ \mathbb{E}(\nu_i^2\mid X_{1i}, \ldots, X_{Ki}) &= \sigma^2 \end{align} $$ Therefore, OLS estimates of the parameters from this transformed model (that is, the WLS estimator) are BLUE, which is the whole point of the exercise. Note that a constant should not be included in the estimation of this model. Also note that I have used the original conditioning regressors as conditioning variables, rather than the transformed regressors, since it is easy to see that the same functions are measurable with respect to the two conditioning sets.
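The equivalence between OLS on the transformed model and the weighted normal equations can be verified numerically. A minimal sketch (Python/numpy, with a made-up data-generating process where $\text{Var}(\varepsilon_i \mid X) = \sigma^2 X_{ki}$):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
xk = rng.uniform(1.0, 5.0, size=n)             # the regressor driving the variance
eps = rng.normal(size=n) * np.sqrt(2.0 * xk)   # Var(eps | x) = sigma^2 * x_k with sigma^2 = 2
y = 1.0 + 3.0 * xk + eps

# OLS on the transformed model: divide every term, including the constant
# term's column of ones, by sqrt(x_k). No extra constant is added.
w = 1.0 / np.sqrt(xk)
Xt = np.column_stack([w, xk * w])   # transformed "intercept" column and regressor
yt = y * w
beta_wls, *_ = np.linalg.lstsq(Xt, yt, rcond=None)

# Equivalent to the weighted normal equations (X' W X)^{-1} X' W y, W = diag(1/x_k)
X = np.column_stack([np.ones(n), xk])
W = np.diag(1.0 / xk)
beta_check = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
assert np.allclose(beta_wls, beta_check)
```

The two solutions coincide because minimizing the sum of squared transformed residuals is algebraically the weighted least squares problem.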
Interpretation of parameters
The transformed model is equivalent to $$ \mathbb{E}\left(\frac{Y_i}{\sqrt{X_{ki}}}\mid X_{1i}, \ldots, X_{Ki}\right) = \beta_0 \frac{1}{\sqrt{X_{ki}}} + \sum_{l=1}^K\beta_l\frac{X_{li}}{\sqrt{X_{ki}}} $$
This is now the crucial part: consider the marginal effect, with respect to one of the original regressors, implied by the transformed model. Multiplying the transformed conditional mean by $\sqrt{X_{ki}}$ recovers $$ \mathbb{E}(Y_i \mid X_{1i}, \ldots, X_{Ki}) = \sqrt{X_{ki}}\,\mathbb{E}\left(\frac{Y_i}{\sqrt{X_{ki}}}\mid X_{1i}, \ldots, X_{Ki}\right) = \beta_0 + \sum_{l=1}^K \beta_l X_{li}, $$ so that $$ \dfrac{\partial \mathbb{E}(Y_i \mid X_{1i}, \ldots, X_{Ki})}{\partial X_{li}} = \beta_l. $$
The same as before! Here I have used the fact that $$ \mathbb{E}\left(\frac{Y_i}{\sqrt{X_{ki}}}\mid X_{1i}, \ldots, X_{Ki}\right) = \frac{1}{\sqrt{X_{ki}}}\mathbb{E}(Y_i \mid X_{1i}, \ldots, X_{Ki}) $$ since conditioning variables are treated as constants by the expectation operator.
Why this makes sense
Note that you formulate the model the way you do (in terms of the original outcomes and regressors) because you are interested in the parameters of that model (the original $\beta_k$s). Features such as heteroskedasticity reduce the efficiency of the OLS parameter estimates, and you might want to correct for that using WLS or (F)GLS. But it would be rather counterproductive if this changed the interpretation of the model parameters that you are interested in. The key is in the way you say it: OLS and WLS estimates of the model parameters, implying one set of population parameters being estimated by both estimators. This can be formalised by saying that the OLS and WLS estimators are consistent for the same population parameters; however, they differ in their asymptotic efficiency.
What most applied economists do
Most applied economists would rather their parameter estimates were close to the truth with high probability as the sample size grows, that is, that their estimates were consistent. A crucial aspect of WLS and FGLS is that they require the specification of an auxiliary model for the heteroskedasticity in order to obtain the extra efficiency afforded by those estimators. However, the price of getting this auxiliary model wrong is that the property of consistency is lost. Most applied economists therefore prefer to simply use White robust standard errors to correct the estimated standard errors of the OLS estimates, and live with the lower efficiency of their estimators.
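The White (HC0) correction mentioned above replaces the classical $\hat{\sigma}^2(X^TX)^{-1}$ with the sandwich $(X^TX)^{-1}\left(\sum_i \hat{u}_i^2 x_i x_i^T\right)(X^TX)^{-1}$. A minimal numpy sketch under a made-up heteroskedastic design (variable names are my own):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500
x = rng.uniform(1.0, 5.0, size=n)
y = 1.0 + 3.0 * x + rng.normal(size=n) * x   # error variance grows with x

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
u = y - X @ beta_hat                          # OLS residuals

# Classical standard errors (valid only under homoskedasticity)
s2 = u @ u / (n - 2)
se_classical = np.sqrt(np.diag(s2 * XtX_inv))

# White (HC0) sandwich: (X'X)^{-1} [X' diag(u_i^2) X] (X'X)^{-1}
meat = X.T @ (X * u[:, None] ** 2)
se_white = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
```

The point estimates beta_hat are unchanged; only the standard errors are corrected, which is exactly the trade-off described above: keep the (consistent but inefficient) OLS estimator and fix its inference.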