Solved – Feasible Generalized Least Square in R

generalized linear model, heteroscedasticity, multiple regression, r

I am studying the factors influencing the annual salary of employees at an undisclosed bank. The regression model that I have decided to employ is as follows:

\begin{equation}
Y_{k}=\beta_{1}+\beta_{2}E_{k}+\beta_{3}D_{gk}+\beta_{4}D_{mk}+\beta_{5}D_{2k}+\beta_{6}D_{3k}+\varepsilon_{k}
\end{equation}
where $Y_{k}$ is the logarithm of annual salary, $E$ is the number of years of education, $D_{g}$ is a gender dummy, $D_{m}$ is a minority dummy, and where

\begin{equation}
D_{2}=\begin{cases} 1 &\text{Custodial job} \\ 0 & \text{Otherwise} \end{cases}
\end{equation}
and
\begin{equation}
D_{3}=\begin{cases} 1 &\text{Management job} \\ 0 & \text{Otherwise} \end{cases}
\end{equation}
As you know, whenever one deals with GLS, $\Omega$ will almost surely be unknown and thus has to be estimated. In general there are $\frac{n(n+1)}{2}$ parameters to estimate, which makes it practically impossible to come up with a viable estimate from only $n$ observations. This is usually counteracted by imposing some structure on $\Omega$.

In my case, I would like to assume that the disturbance term $\varepsilon_{k}$ in the above regression model has variance $\sigma_{i}^{2}$ for $i=1,2,3$, according to whether the $k$-th employee has a job in category 1, 2, or 3, respectively. Now, we may introduce the transformations $\gamma_{1}=\log (\sigma_{1}^{2})$, $\gamma_{2}=\log(\sigma_{2}^{2}/\sigma_{1}^{2})$, and $\gamma_{3}=\log(\sigma_{3}^{2}/\sigma_{1}^{2})$ so as to formulate the following model for the variances:

\begin{equation}
\sigma_{k}^{2}= \exp \{ \gamma_{1}+\gamma_{2}D_{2k}+\gamma_{3}D_{3k} \}
\end{equation}
Since $\hat{\beta}_{\mathrm{OLS}}$ is a consistent estimator of $\beta$ even under heteroscedasticity, we have that $\hat{\beta}_{\mathrm{OLS}} \xrightarrow{p}\beta$ as the number of observations increases. We may therefore argue that $e_{k}^{2} \approx \sigma_{k}^{2}$, and so we can estimate the variance model from information that we already possess.
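To make steps (3) and (4) below concrete: taking logarithms of the variance model and replacing $\sigma_{k}^{2}$ with $e_{k}^{2}$ gives an auxiliary regression in quantities we can compute, and its fitted values then define the estimated covariance matrix,

\begin{equation}
\log e_{k}^{2} \approx \gamma_{1}+\gamma_{2}D_{2k}+\gamma_{3}D_{3k}+v_{k}, \qquad \hat{\Omega}=\operatorname{diag}\bigl(\exp\{\hat{\gamma}_{1}+\hat{\gamma}_{2}D_{2k}+\hat{\gamma}_{3}D_{3k}\}\bigr).
\end{equation}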

Summary of procedure

(1) Calculate the OLS estimate.

(2) Calculate the OLS residual $\textbf{e}=\textbf{Y}-\textbf{X}\hat{\beta}$

(3) Calculate the OLS estimate of $\gamma$ from $e_{k}^{2}=f_{\gamma}(Z_{k})+\overline{\varepsilon}_{k}$.

(4) Calculate the FGLS estimate as the GLS estimate with $\hat{\Omega}=\Omega(\hat{\gamma})$ in place of $\Omega$ (see the sketch just after this list).
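Here is a minimal sketch of steps (1)-(4) in base R. The data frame bank and its column names (logsalary, educ, gender, minority, and the dummies d2, d3) are made up for illustration.

#==========================================================
# FGLS by hand (sketch; data frame and column names assumed)
#==========================================================
# (1) OLS estimate
fitOLS  = lm(logsalary ~ educ + gender + minority + d2 + d3, data = bank)

# (2) OLS residuals
e       = resid(fitOLS)

# (3) estimate gamma by regressing log(e^2) on the job-category dummies
fitVar  = lm(log(e^2) ~ d2 + d3, data = bank)
s2hat   = exp(fitted(fitVar))

# (4) FGLS = WLS with inverse-variance weights
fitFGLS = lm(logsalary ~ educ + gender + minority + d2 + d3,
             data = bank, weights = 1/s2hat)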

What I would like to know is whether or not one can perform this estimation using a known function in R, say gls? If the answer is yes, then how exactly should I write the call so that my heteroscedasticity assumption is taken into account? Thanks for taking the time! Have a great day!

Best Answer

Estimating Regression Models with Multiplicative Heteroscedasticity

The model that you have described is discussed in Harvey (1976).

Let me rewrite the model $$ \begin{align} \mathbb{E}(Y_i \mid \mathbf{X}_i, \mathbf{Z}_i) &= \mathbf{X}_i'\boldsymbol{\beta} \\ \mathbb{V}(Y_i \mid \mathbf{X}_i, \mathbf{Z}_i) &\equiv \sigma^2_i \\ &= \exp(\mathbf{Z}_i'\boldsymbol{\alpha}) \\ \end{align} $$ Note that it is possible that $\mathbf{X}_i$ and $\mathbf{Z}_i$ have common elements.

Rewriting the conditional mean equation equivalently as $$ \begin{align} Y_i &= \mathbf{X}_i'\boldsymbol{\beta} + \varepsilon_i \\ \mathbb{E}(\varepsilon_i \mid \mathbf{X}_i, \mathbf{Z}_i) &= 0 \end{align} $$

Two-step estimation

The main aim of Harvey (1976) is to provide (efficient) estimates of $\boldsymbol{\alpha}$, rather than to use that estimate to compute a WLS estimator. However, once $\hat{\boldsymbol{\alpha}}$ has been computed, it can be used to form the weights for a WLS (i.e. FGLS) estimator.
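Concretely, with $\hat{\Omega} = \operatorname{diag}\left(\exp(\mathbf{Z}_i'\hat{\boldsymbol{\alpha}})\right)$, the feasible GLS estimator is the usual formula
$$
\hat{\boldsymbol{\beta}}_{\text{FGLS}} = \left(\mathbf{X}'\hat{\Omega}^{-1}\mathbf{X}\right)^{-1}\mathbf{X}'\hat{\Omega}^{-1}\mathbf{Y},
$$
which is the same as running OLS after dividing $Y_i$ and each element of $\mathbf{X}_i$ by $\hat{\sigma}_i = \exp(\mathbf{Z}_i'\hat{\boldsymbol{\alpha}}/2)$.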

The two-step estimator that you have described follows the procedure:
1. Compute the OLS estimator $\hat{\boldsymbol{\beta}}$ and the squared OLS residuals $\hat{\varepsilon}_i^2$,
2. Compute the estimates $\hat{\boldsymbol{\alpha}}$ from the regression
$$ \log \hat{\varepsilon}_i^2 = \mathbf{Z}_i'\boldsymbol{\alpha} + \nu_i $$
and then weight by $\hat{\sigma}_i^2 = \exp(\mathbf{Z}_i'\hat{\boldsymbol{\alpha}})$ in a final WLS step. (Under normality, $\mathbb{E}[\log \hat{\varepsilon}_i^2 \mid \mathbf{Z}_i]$ differs from $\mathbf{Z}_i'\boldsymbol{\alpha}$ by a constant of roughly $-1.27$, but since this only rescales all the weights by a common factor it does not affect the WLS estimate of $\boldsymbol{\beta}$.)

Maximum likelihood estimation

The MLE is claimed to be up to 60% more efficient than the two-step estimator above. There are some other apparent advantages: the MLE remains a pseudo-maximum-likelihood estimator under distributional misspecification, and there is no need to run a separate WLS step after computing $\hat{\boldsymbol{\alpha}}$. However, this does not seem to be borne out by my calculations below.

Under conditional normality of the response, the likelihood is very simple to write down and optimize

$$ \begin{align} \log L_i(\boldsymbol{\beta}, \boldsymbol{\alpha}) &= -\frac{1}{2}(\log2 \pi + \mathbf{Z}_i'\boldsymbol{\alpha})\\ &\qquad -\frac{1}{2}\left(\dfrac{(Y_i - \mathbf{X}_i'\boldsymbol{\beta})^2}{\exp(\mathbf{Z}_i'\boldsymbol{\alpha})}\right) \end{align} $$

Two-step estimation: R

To implement this, first let us simulate some heteroskedastic data using the model given, and estimate it using OLS.

#==========================================================
# simulate the heteroskedastic data
#==========================================================
iN = 1000
iK1 = 7
iK2 = 4

mX = cbind(1, matrix(rnorm(iN*iK1), nrow = iN, ncol = iK1))
mZ = cbind(1, matrix(rnorm(iN*iK2), nrow = iN, ncol = iK2))

vBeta = rnorm(1 + iK1)
vAlpha = rnorm(1 + iK2)

vY = rnorm(iN, mean = mX %*% vBeta, sd = sqrt(exp(mZ %*% vAlpha)))

#==========================================================
# fit the data using OLS
#==========================================================
vBetaOLS = coef(lmHetMean <- lm.fit(y = vY, x = mX))

Next, we can get the results using the two-step procedure:

#==========================================================
# two-step estimation
#==========================================================
residHet = resid(lmHetMean)
vVarEst = exp(fitted(lmHetVar <- lm.fit(y = log(residHet^2), x = mZ)))
vSdEst = sqrt(vVarEst)  # weight by standard deviations, not variances

vBetaTS = coef(lm.fit(y = vY/vSdEst, x = mX/vSdEst))
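As a quick cross-check, the same WLS step can also be done with lm() and its weights argument, which takes inverse-variance weights; this should reproduce vBetaTS up to numerical noise.

# the same FGLS/WLS estimate via lm(); weights are inverse variances
vBetaTS2 = coef(lm(vY ~ mX - 1, weights = 1/vVarEst))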

Maximum likelihood estimation: R

#==========================================================
# likelihood function
#==========================================================
fnLogLik = function(vParam, vY, mX, mZ) {
  # split the parameter vector into mean (beta) and variance (alpha) parts
  vBeta = vParam[1:ncol(mX)]
  vAlpha = vParam[(ncol(mX)+1):(ncol(mX)+ncol(mZ))]

  # negative Gaussian log-likelihood with variance exp(Z'alpha)
  negLogLik = sum(0.5*(log(2*pi) + mZ %*% vAlpha + 
                         (vY - mX %*% vBeta)^2/exp(mZ %*% vAlpha)))
  return(negLogLik)
}

# test the function
# debugonce(fnLogLik)
fnLogLik(c(vBeta, vAlpha), vY, mX, mZ)

#==========================================================
# MLE
#==========================================================
# random starting values; with 13 parameters the default Nelder-Mead
# search in optim() may not get close to the optimum from an arbitrary start
vParam0 = rnorm(ncol(mX) + ncol(mZ))
optimHet = optim(vParam0, fnLogLik, vY = vY, mX = mX, mZ = mZ)
vBetaML = optimHet$par

#==========================================================
# collect all the results
#==========================================================
cbind(vBeta, vBetaOLS, vBetaTS, vBetaML = vBetaML[1:ncol(mX)])

I am not entirely sure why the ML results are a bit farther off than even the OLS ones, and I am not ruling out a coding mistake; a likely culprit is the optimization rather than the estimator, since the default Nelder-Mead search in optim, started from random values for all 13 parameters, may simply not have converged.

But as you can see, the 2-step estimator seems to do better than the OLS estimator.
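One thing worth trying before blaming the estimator is to restart the optimizer from the two-step estimates (a consistent starting point) with a gradient-based method and a higher iteration limit. This is only a sketch; I have not verified that it rescues the ML numbers above.

#==========================================================
# retry the MLE from better starting values (sketch)
#==========================================================
vAlphaTS = coef(lmHetVar)                 # alpha estimates from the two-step fit
vParam1  = c(vBetaTS, vAlphaTS)           # start from the two-step estimates
optimHet2 = optim(vParam1, fnLogLik, vY = vY, mX = mX, mZ = mZ,
                  method = "BFGS", control = list(maxit = 1000))
vBetaML2 = optimHet2$par[1:ncol(mX)]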
