Solved – Package plm random effect residuals

panel data, plm, r, random-effects-model, stata

Simply put, I'd like to know how the plm package in R calculates the residuals of a random-effect regression.

I ask this because I'm getting some "weird" outputs. Let me reproduce them here using the Grunfeld data for four firms, as Gujarati does in his Basic Econometrics:

require(plm)
require(foreign)

Grunfeld<-read.dta("Data.dta")
Grunfeld<-pdata.frame(Grunfeld,index = c("id","t"))

grun.re <- plm(Y~X2+X3,data=Grunfeld,model="random",index="id")

#Means by id
X2M<-tapply(Grunfeld$X2,Grunfeld$id,FUN = mean)
X3M<-tapply(Grunfeld$X3,Grunfeld$id,FUN = mean)
YM<-tapply(Grunfeld$Y,Grunfeld$id,FUN = mean)

#Random Effect: fit the model and then calculate the residuals "by hand"
fit.re<-grun.re$coefficients[1]+grun.re$coefficients[2]*Grunfeld$X2+grun.re$coefficients[3]*Grunfeld$X3
calcResid.re<-(Grunfeld$Y-fit.re)

#Random Effect: plm residuals next to the alphaRE, eRE and uRE columns stored in the data, and the manual residuals
head(cbind(grun.re$residuals,Grunfeld[,11:13],calcResid.re))

  grun.re$residuals   alphaRE       eRE        uRE calcResid.re
1         99.395803 -169.9282 116.23154  -53.69666    -53.69666
2         18.023715 -169.9282  34.85946 -135.06874   -135.06874
3        -39.256625 -169.9282 -22.42089 -192.34909   -192.34908
4         -2.857048 -169.9282  13.97869 -155.94951   -155.94951
5        -28.334107 -169.9282 -11.49837 -181.42656   -181.42656
6          6.475226 -169.9282  23.31096 -146.61723   -146.61723

In this table, uRE is the overall residual of the regression provided by Stata (which is identical to Gretl's) and calcResid.re is the residual calculated manually from the fitted model. So Stata, Gretl and I all did the same thing. But what does the plm package do?

We can see that calcResid.re and uRE are equal, but the residuals provided by the plm estimation (grun.re$residuals) are completely different.

Here is a link to the dataset and results: https://github.com/rrremedio/shared_folder/blob/master/Data.dta

Best Answer

Thank you, Helix. I hope I'm not breaking any code of politeness by answering my own question. In fact, this question is related to this. Still, I will try to give an answer from the econometrician's point of view now.

After a long time, I realized that a random effects estimate amounts to running a quasi-demeaned regression, as stated in equation 6 of the plm package paper here. However, I find their notation a little "unrelated" to the rest of the paper.

Following Cameron and Trivedi, Microeconometrics: Methods and Applications, the feasible GLS estimator can be implemented by running OLS on the quasi-demeaned equation. This is Cameron & Trivedi's equation 21.43 (the same as the one cited above):

$$ y_{it}-\widehat{\theta}{\overline{y}_{i}}=(1-\widehat{\theta})\mu+(x_{it}-\widehat{\theta}{\overline{x}_{i}})'\beta+\upsilon_{it}$$

Where:

$$\widehat{\theta}=1-\frac{\sigma_\epsilon}{(T\sigma_\alpha^2+\sigma_\epsilon^2)^{1/2}}$$

The plm package calculates theta and stores it in the regression object.
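As a rough illustration, the stored value can be pulled out of the fitted object and compared with the formula above. This is only a sketch under assumptions: it uses the Grunfeld data shipped with plm (10 firms, variables inv, value, capital) rather than the four-firm Data.dta from the question, and it assumes that the `ercomp` element exposes `theta` and a `sigma2` component named `idios` and `id`, as recent plm versions appear to do.

```r
library(plm)

# Assumption: plm's built-in Grunfeld data stands in for Data.dta
data("Grunfeld", package = "plm")
re <- plm(inv ~ value + capital, data = Grunfeld,
          index = c("firm", "year"), model = "random")

# theta as stored by plm (a single number for a balanced one-way model)
theta_plm <- as.numeric(re$ercomp$theta)

# theta rebuilt from the estimated variance components
s2 <- re$ercomp$sigma2                       # assumed names: "idios", "id"
n_periods <- length(unique(Grunfeld$year))   # T in the formula
theta_hand <- 1 - sqrt(s2[["idios"]] /
                       (n_periods * s2[["id"]] + s2[["idios"]]))

c(plm = theta_plm, by_hand = theta_hand)
```

If the two numbers agree, the stored theta is the one entering the transformed equation; in an unbalanced panel theta would vary by individual and this scalar comparison would not apply.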

And where $$\upsilon_{it}=(1-\widehat{\theta})\alpha_i+(\epsilon_{it}-\widehat{\theta}{\overline{\epsilon}_{i}})$$

In a random effects model, the residuals that plm reports are in fact the $\upsilon_{it}$ defined above.

However, if we calculate the residuals by hand, $u=Y-X\widehat{\beta}$, we obtain what Stata calls the overall error of the model. In a random effects model it is $$u_{it}=\alpha_{i}+\epsilon_{it}$$

where $\alpha_i$ is the individual-specific effect and $\epsilon_{it}$ is the idiosyncratic error. Once we obtain the random-effects alpha using the shrinkage factor (as the related question cited above does), we can recover the idiosyncratic error.
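A minimal sketch of that decomposition follows. It assumes the usual balanced-panel predictor, in which each individual's mean overall residual is multiplied by $T\sigma_\alpha^2/(T\sigma_\alpha^2+\sigma_\epsilon^2)$; the related question is not reproduced here, so whether it uses exactly this factor is an assumption. The built-in Grunfeld data from plm replaces Data.dta, and the `sigma2` component names are assumed as in recent plm versions.

```r
library(plm)

# Assumption: plm's built-in Grunfeld data stands in for Data.dta
data("Grunfeld", package = "plm")
re <- plm(inv ~ value + capital, data = Grunfeld,
          index = c("firm", "year"), model = "random")

s2 <- re$ercomp$sigma2                       # assumed names: "idios", "id"
n_periods <- length(unique(Grunfeld$year))   # T, balanced panel

# Overall residual u = y - Xb on the original scale
b <- coef(re)
u <- Grunfeld$inv - (b[["(Intercept)"]] +
                     b[["value"]] * Grunfeld$value +
                     b[["capital"]] * Grunfeld$capital)

# Shrink each firm's mean overall residual to predict alpha_i
shrink <- n_periods * s2[["id"]] / (n_periods * s2[["id"]] + s2[["idios"]])
alpha_hat <- shrink * ave(u, Grunfeld$firm)  # constant within a firm
eps_hat <- u - alpha_hat                     # idiosyncratic part

head(cbind(alpha_hat, eps_hat, u))
```

The three columns play the roles of alphaRE, eRE and uRE in the question's table, with the last one being the sum of the first two.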

In summary, what the plm package returns as the residuals from a random effects model are the residuals of the OLS regression on the quasi-demeaned data.
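This can be checked numerically. Substituting the overall residual $u_{it}$ into the transformed equation gives $\upsilon_{it}=u_{it}-\widehat{\theta}\,\overline{u}_{i}$, so quasi-demeaning the hand-made residuals should reproduce what plm stores. The sketch below assumes the Grunfeld data shipped with plm in place of Data.dta and a balanced panel, so that theta is a single number.

```r
library(plm)

# Assumption: plm's built-in Grunfeld data stands in for Data.dta
data("Grunfeld", package = "plm")
re <- plm(inv ~ value + capital, data = Grunfeld,
          index = c("firm", "year"), model = "random")
theta <- as.numeric(re$ercomp$theta)

# Overall residual u = y - Xb, as computed "by hand" in the question
b <- coef(re)
u <- Grunfeld$inv - (b[["(Intercept)"]] +
                     b[["value"]] * Grunfeld$value +
                     b[["capital"]] * Grunfeld$capital)

# Quasi-demean: subtract theta times the firm mean of u
upsilon <- u - theta * ave(u, Grunfeld$firm)

# Compare with the residuals stored in the plm object
all.equal(as.numeric(re$residuals), as.numeric(upsilon))
head(cbind(plm = as.numeric(re$residuals), upsilon, u))
```

The first two columns should coincide, while the third (the overall residual reported by Stata and Gretl) differs from them by theta times the firm mean.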

The panel data econometrics books by Wooldridge, Hsiao and Baltagi derive the same result for feasible GLS.

Related Solutions

R – How to Use the ‘Within’ Model with plm Package for Panel Data Analysis

The two estimators (the within estimator and OLS with a dummy for each group) are computed differently but are numerically identical, so essentially it doesn't matter. The within estimator is computationally easier since it keeps the size of the design matrix in check, and I would think that is how the within estimator is implemented. Here is some R code to demonstrate this:

library(plm)
data("Produc", package = "plm")
plmResults <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc, 
                  index = c("state","year"))
summary(plmResults)

regResults <- lm(log(gsp) ~ as.factor(state) + log(pcap) + log(pc) + log(emp) + unemp, 
                 data = Produc)
summary(regResults)

Or, if you prefer, some Stata code,

webuse nlswork
xtset idcode

xtreg ln_w grade c.age##c.age c.ttl_exp##c.ttl_exp c.tenure##c.tenure ///
 2.race not_smsa south, fe

areg ln_w grade c.age##c.age c.ttl_exp##c.ttl_exp c.tenure##c.tenure ///
 2.race not_smsa south, absorb(idcode)

A proof using the Frisch-Waugh-Lovell theorem can easily be given. Note one crucial point: when the number of groups grows, that is, $n\to \infty$ with the number of periods held fixed, the estimates of the coefficients on the group dummies are not consistent.
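The numerical equivalence can be confirmed by putting the slope coefficients of the two R fits side by side. This is a small sketch on the same Produc data; the object names `within_fit` and `lsdv_fit` are made up for the illustration.

```r
library(plm)
data("Produc", package = "plm")

fml <- log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp

# Within (fixed effects) estimator
within_fit <- plm(fml, data = Produc,
                  index = c("state", "year"), model = "within")

# Same model by OLS with one dummy per state
lsdv_fit <- lm(update(fml, . ~ . + factor(state)), data = Produc)

# Keep only the slopes that both fits share
slopes <- names(coef(within_fit))
cbind(within = coef(within_fit), lsdv = coef(lsdv_fit)[slopes])
all.equal(coef(within_fit), coef(lsdv_fit)[slopes])
```

The dummy-variable fit also reports one coefficient per state, which is exactly the part that is not consistently estimated when the number of groups grows.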

Solved – Random effects model with PLM: “System is computationally singular”-Error

Caveat: I haven't used the plm package.

It's impossible to recreate this error because we don't have access to your data. But if the matrix is computationally singular, then you have a problem of multicollinearity. As you've identified, there are essentially two reasons that can happen.

Multicollinearity. Even if the pairwise correlations are low, one IV column can still be highly collinear with a combination of two or more other IV columns. Another piece of evidence toward this conclusion is that excluding some variables removes the error: you're removing some of the collinear columns, so the estimators become uniquely identifiable given the available data. Removing highly collinear columns barely changes the amount of information available to the regression -- the removed variables are completely or mostly determined by the remaining ones -- so it's a perfectly valid solution. It's purely a question of which explanatory variables you are more interested in.
More features than observations. You speculate that this could be the problem with this remark:

Unbalanced panel / NAs? The data is unbalanced and there are NAs. Fixed effects output says: n=16, T=18-40, N=455. Probably the unbalanced data or the NAs are the reason for the error?

I don't know what "n=16, T=18-40, N=455" means to you (What are n, T, and N?). But a regression with more columns than observations is likewise not identified by the data.
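Both situations show up as a design matrix whose rank is smaller than its number of columns, which can be checked before fitting. The sketch below uses a made-up toy data frame (not the asker's data) in which `x3` is an exact linear combination of `x1` and `x2`, and relies only on base R.

```r
# Hypothetical toy data: x3 is an exact linear combination of x1 and x2
set.seed(1)
d <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
d$x3 <- d$x1 + 2 * d$x2
d$y <- rnorm(20)

# Design matrix for the regression, including the intercept column
X <- model.matrix(y ~ x1 + x2 + x3, data = d)

qr(X)$rank   # 3: one column is redundant
ncol(X)      # 4

# lm() copes by dropping the aliased column and reporting NA for it
fit <- lm(y ~ x1 + x2 + x3, data = d)
coef(fit)
alias(fit)   # lists which terms are linear combinations of the others
```

Applied to the real regressors, a rank below the column count points to the variables worth dropping or combining.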
