Let's start with the simpler case where there is no correlation structure for the residuals:
library(nlme)   # gnls() is in the nlme package
fit <- gnls(model=model,data=data,start=start)
logLik(fit)
The log likelihood can then be easily computed by hand with:
N <- fit$dims$N   # number of observations
p <- fit$dims$p   # number of model parameters
sigma <- fit$sigma * sqrt((N-p)/N)   # convert to the ML estimate of sigma
sum(dnorm(y, mean=fitted(fit), sd=sigma, log=TRUE))
Since the residuals are independent, we can just use dnorm(..., log=TRUE)
to get the individual log likelihood terms (and then sum them up). Alternatively, we could use:
sum(dnorm(resid(fit), mean=0, sd=sigma, log=TRUE))
Note that fit$sigma is the "less biased estimate of $\sigma$" (computed with $N-p$ rather than $N$ in the denominator), not the ML estimate that logLik() is based on -- so we need to make the correction manually first.
Now for the more complicated case where the residuals are correlated:
fit <- gnls(model=model,data=data,start=start,correlation=correlation)
logLik(fit)
Here, we need to use the multivariate normal distribution. I am sure there is a function for this somewhere, but let's just do this by hand:
N <- fit$dims$N
p <- fit$dims$p
yhat <- cbind(fitted(fit))   # fitted values as a column vector
R <- vcv(tree, cor=TRUE)     # correlation matrix implied by the tree (vcv() is in the ape package)
sigma <- fit$sigma * sqrt((N-p)/N)   # again convert to the ML estimate of sigma
S <- diag(sigma, nrow=nrow(R)) %*% R %*% diag(sigma, nrow=nrow(R))   # marginal variance-covariance matrix
-1/2 * log(det(S)) - 1/2 * t(y - yhat) %*% solve(S) %*% (y - yhat) - N/2 * log(2*pi)   # multivariate normal log density
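For reference, the same value can be obtained with a ready-made multivariate normal density function, for example dmvnorm() from the mvtnorm package (assuming that package is available):
library(mvtnorm)
dmvnorm(c(y), mean=c(fitted(fit)), sigma=S, log=TRUE)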
It's false. As you observe, if you read Stock and Watson closely, they don't actually endorse the claim that OLS is unbiased for $\beta$ under conditional mean independence. They endorse the much weaker claim that OLS is unbiased for $\beta$ if $E(u|x,z)=z\gamma$. Then, they say something vague about non-linear least squares.
Your equation (4) contains what you need to see that the claim is false. Estimating equation (4) by OLS while omitting the variable $E(u|x,z)$ leads to omitted variables bias. As you probably recall, the bias term from omitted variables (when the omitted variable has a coefficient of 1) is controlled by the coefficients from the following auxiliary regression:
\begin{align}
E(u|z) = x\alpha_1 + z\alpha_2 + \nu
\end{align}
The bias in the original regression for $\beta$ is $\alpha_1$ from this regression, and the bias on $\gamma$ is $\alpha_2$. If $x$ is correlated
with $E(u|z)$, after controlling linearly for $z$, then $\alpha_1$ will be non-zero and the OLS coefficient will be biased.
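To spell out the intermediate step: substituting the auxiliary regression into equation (4) (with $\nu$ uncorrelated with $x$ and $z$ by construction of the projection) gives
\begin{align}
y &= x\beta + z\gamma + (x\alpha_1 + z\alpha_2 + \nu) + v \\
&= x(\beta + \alpha_1) + z(\gamma + \alpha_2) + (\nu + v),
\end{align}
so the OLS regression of $y$ on $x$ and $z$ estimates $\beta + \alpha_1$ rather than $\beta$.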
Here is an example to prove the point:
\begin{align}
\xi &\sim F(), \; \zeta \sim G(), \; \nu \sim H()\quad \text{all independent}\\
z &=\xi\\
x &= z^2 + \zeta\\
u &= z+z^2-E(z+z^2)+\nu
\end{align}
Looking at the formula for $u$, it is clear that $E(u|x,z)=E(u|z)=z+z^2-E(z+z^2)$. Looking at the auxiliary regression, it is clear that (absent some fortuitous choice of $F,G,H$) $\alpha_1$ will not be zero.
Here is a very simple example in R
which demonstrates the point:
set.seed(12344321)
z <- runif(n=100000,min=0,max=10)
x <- z^2 + runif(n=100000,min=0,max=20)                    # x is correlated with z^2
u <- z + z^2 - mean(z+z^2) + rnorm(n=100000,mean=0,sd=20)  # E(u|x,z) = E(u|z)
y <- x + z + u                                             # true coefficients: beta = 1, gamma = 1
# main regression: the coefficient on x is biased
summary(lm(y~x+z))
# auxiliary regression: the coefficient on x estimates the bias alpha_1
summary(lm(z+z^2~x+z))
Notice that the first regression gives you a coefficient on $x$ which is biased up by 0.63, reflecting the fact that $x$ "has some $z^2$ in it" as does $E(u|z)$. Notice also that the auxiliary regression gives you a bias estimate of about 0.63.
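For what it's worth, the size of this bias can be worked out from the simulation design itself. Since $x = z^2 + \zeta$, the coefficient $\alpha_1$ is the share of the variation in $x$ (after partialling out $z$) that comes from $z^2$:
\begin{align}
\alpha_1 = \frac{\operatorname{Var}(\widetilde{z^2})}{\operatorname{Var}(\widetilde{z^2}) + \operatorname{Var}(\zeta)} = \frac{500/9}{500/9 + 100/3} = \frac{5}{8} = 0.625,
\end{align}
where $\widetilde{z^2}$ is the residual from a linear regression of $z^2$ on $z$ (for $z\sim U(0,10)$, $\operatorname{Var}(\widetilde{z^2})=500/9$; for $\zeta\sim U(0,20)$, $\operatorname{Var}(\zeta)=100/3$). This matches the roughly 0.63 seen in both regressions.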
So, what are Stock and Watson (and your lecturer) talking about? Let's go back to your equation (4):
\begin{align}
y = x\beta + z\gamma + E(u|z) + v
\end{align}
It's an important fact that the omitted variable is only a function of $z$. It seems like if we could control for $z$ really well, that would be enough to purge the bias from the regression, even though $x$ may be correlated with $u$.
Suppose we estimated the equation below using either a non-parametric method to estimate the function $f()$ or using the correct functional form $f(z)=z\gamma+E(u|z)$. If we were using the correct functional form, we would be estimating it by non-linear least squares (explaining the cryptic comment about NLS):
\begin{align}
y = x\beta + f(z) + v
\end{align}
That would give us a consistent estimator for $\beta$ because there is no longer an omitted variable problem.
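Continuing the simulated example from above, where $E(u|z)$ happens to be quadratic in $z$, controlling for $z$ and $z^2$ is exactly this kind of fix, and the coefficient on $x$ should come back close to the true value of 1:
# controlling flexibly for z: no omitted variable left that is correlated with x
summary(lm(y ~ x + z + I(z^2)))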
Alternatively, if we had enough data, we could go ``all the way'' in controlling for $z$. We could look at a subset of the data where $z=1$, and just run the regression:
\begin{align}
y = x\beta + v
\end{align}
This would give unbiased, consistent estimators for the coefficients (except for the intercept, of course, which would be polluted by $f(1)$). Obviously, you could also get a (different) consistent, unbiased estimator by running that regression only on data points for which $z=2$. And another one for the points where $z=3$. Etc. Then you'd have a bunch of good estimators from which you could make a great estimator by, say, averaging them all together somehow.
This latter thought is the inspiration for matching estimators. Since we don't usually have enough data to literally run the regression only for $z=1$ or even for pairs of points where $z$ is identical, we instead run the regression for points where $z$ is ``close enough'' to being identical.
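Here is a rough sketch of that idea using the simulated data from above (the bin width of 0.25 is an arbitrary choice): regress $y$ on $x$ separately within narrow bins of $z$ and average the resulting coefficients.
# "matching" on z: regress y on x within narrow bins of z, then average
bins <- cut(z, breaks=seq(0, 10, by=0.25))
beta_by_bin <- sapply(split(data.frame(y, x), bins),
                      function(d) coef(lm(y ~ x, data=d))["x"])
mean(beta_by_bin)   # should be much closer to 1 than the pooled OLS estimate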
After scouring the web for more information as to how a system of nonlinear equations can be solved efficiently in R, I stumbled upon the nlsur command available in Stata. I know not everyone is able to use Stata, but for those who have the opportunity, it's helpful to know that the implementation in Stata is very easy. I have reused all my data-preparation work and exported only the transformed variables as a .csv.
The nlsur call solves the following three equations by two-step FGLS (if an iterated FGLS is wanted, the ifgnls option can be added after the initial values):
$$ \log S_E = \frac{\sigma - 1}{\sigma}\left(\log{\psi} + \alpha_E \tilde{t}\right) + \frac{1-\sigma}{\sigma}\log{\tilde{Y}} + \frac{\sigma - 1}{\sigma}\log{\tilde{E}} + \log{\gamma_E} $$
$$ \log{S_L} = \frac{\sigma - 1}{\sigma}\left(\log{\psi} + \alpha_L \beta \tilde{t} + \beta \log{\tilde{L}} + (1-\beta)\log{\tilde{K}}\right) + \frac{1-\sigma}{\sigma}\log{\tilde{Y}} + \log{\gamma_V \beta} $$
$$ \log\tilde{Y} = \log\psi + \frac{\sigma}{\sigma - 1} \log\left( \gamma_{V}\left(e^{\alpha_{L}\tilde{t}\beta}\left(\tilde{L}\right)^{\beta} \left(\tilde{K}\right)^{1-\beta}\right)^{\frac{\sigma-1}{\sigma}} + \gamma_{E}\left(e^{\alpha_{E}\tilde{t}} \tilde{E}\right)^{\frac{\sigma-1}{\sigma}}\right) $$
The cross-equation restrictions on my parameters are recognized automatically and the convergence behavior seems good. You have to use curly braces to explicitly declare the parameters. I personally am not a good enough programmer, but I hope that a package analogous to Stata's nlsur will become available in R.
Edit: One big drawback is that this method does NOT support Newey-West standard errors, which is a big issue for my estimation. When I wrote to Stata support, they answered that development for this is currently not planned. This is a shame because HAC-robust estimation is available for ordinary nonlinear estimation; I guess I will have to find something more suitable after all.
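As a possible (untested) starting point in R: the systemfit package has an nlsystemfit() function for nonlinear SUR, which imposes cross-equation restrictions through shared parameter names. The sketch below is only an illustration under several assumptions of mine: the file name and column names (lnSE, lnSL, lnY for the logged shares and output, E, L, K, ttil for the normalized levels and time trend) and the starting values are made up, I have not checked convergence for this system, and Newey-West / HAC standard errors would still have to be handled separately.
library(systemfit)
dat <- read.csv("transformed_variables.csv")   # hypothetical file with the normalized variables
# the three equations; the parameters (sigma, beta, psi, gammaE, gammaV, alphaE, alphaL)
# are shared across equations, which imposes the cross-equation restrictions
eqSE <- lnSE ~ (sigma - 1)/sigma * (log(psi) + alphaE * ttil) +
        (1 - sigma)/sigma * lnY + (sigma - 1)/sigma * log(E) + log(gammaE)
eqSL <- lnSL ~ (sigma - 1)/sigma * (log(psi) + alphaL * beta * ttil +
        beta * log(L) + (1 - beta) * log(K)) +
        (1 - sigma)/sigma * lnY + log(gammaV * beta)
eqY  <- lnY ~ log(psi) + sigma/(sigma - 1) *
        log(gammaV * (exp(alphaL * ttil * beta) * L^beta * K^(1 - beta))^((sigma - 1)/sigma) +
            gammaE * (exp(alphaE * ttil) * E)^((sigma - 1)/sigma))
startvals <- c(sigma = 0.5, beta = 0.5, psi = 1,
               gammaE = 0.5, gammaV = 0.5, alphaE = 0.01, alphaL = 0.01)
fit <- nlsystemfit(method = "SUR", eqns = list(eqSE, eqSL, eqY),
                   startvals = startvals, data = dat)
summary(fit)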