Solved – Random effects model with PLM: “System is computationally singular”-Error

Tags: panel data, plm, r, random-effects-model, regression

I am currently trying to estimate some panel data models in R using the plm package. This includes basic pooled, fixed effects and random effects models. I use the following code:

# read in data
mydata<- read.csv2("Panel.csv")
attach(mydata)

# define dependent variable
standarddeviation <- cbind(sd)
# define independent variables
x <- cbind(ratio1, ratio2, ratio3, ratio4, mean)

# Set data as panel data
pdata <- plm.data(mydata, index=c("id","t"))

# Pooled OLS estimator
pooling <- plm(standarddeviation ~ x, data=pdata, model= "pooling")
summary(pooling)

# Between estimator
between <- plm(standarddeviation ~ x, data=pdata, model= "between")
summary(between)

# First differences estimator
firstdiff <- plm(standarddeviation ~ x, data=pdata, model= "fd")
summary(firstdiff)

# Fixed effects or within estimator
fixed <- plm(standarddeviation ~ x, data=pdata, model= "within")
summary(fixed)

# Random effects estimator
random <- plm(standarddeviation ~ x, data=pdata, model= "random")
summary(random)

Now here's the problem: I can estimate all models without any problem except for the random effects model. After running the "random" call, R produces the following error:

"Error in solve.default(crossprod(X.m)) : 
  system is computationally singular: reciprocal condition number = 9.57127e-023"

First guesses:

Linear combinations in x?
A first guess would be that there are exact linear dependencies among the exogenous variables in x. The data is balance sheet data, and I would like to explain the standard deviation (y) of a specific balance sheet position by other balance sheet positions (or by the ratio of the position to the balance sheet total). Of course, the variables in x are related to each other. For example, some of the ratios are calculated by dividing by the mean, which is also a separate independent variable. The dependent variable, the standard deviation, is also calculated using this mean. But again: there should be no EXACT linear dependence. However, if I exclude some of my exogenous variables, the problem disappears, and I actually need to include them.
Unbalanced panel / NAs?
The data is unbalanced and there are NAs. Fixed effects output says: n=16, T=18-40, N=455. Probably the unbalanced data or the NAs are the reason for the error?
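One way to check the first guess is to look at the rank of the regressor matrix. The sketch below uses simulated data, not the original Panel.csv; the names x1, x2 and x3 are hypothetical, and x3 is deliberately built as an exact linear combination of the other two.

```r
# Simulated regressors (hypothetical names); x3 is an exact linear
# combination of x1 and x2, so the matrix cannot have full column rank.
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- 2 * x1 - x2
X  <- cbind(intercept = 1, x1, x2, x3)

# QR decomposition: a rank below ncol(X) signals linear dependence
qr_X   <- qr(X)
rank_X <- qr_X$rank
print(c(rank = rank_X, columns = ncol(X)))

# Columns that qr() pushed behind the numerical rank are the dependent ones
dependent <- colnames(X)[qr_X$pivot[seq_len(ncol(X)) > rank_X]]
print(dependent)
```

On the real data, the same check could be run on the matrix returned by model.matrix(~ ratio1 + ratio2 + ratio3 + ratio4 + mean, data = mydata), which by default keeps only the rows without NAs.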

Traceback-Code:

random <- plm(standarddeviation ~ x, data=pdata, model= "random")
Error in solve.default(crossprod(X.m)) : 
  system is computationally singular: reciprocal condition number = 9.57127e-023
traceback()
8: solve.default(crossprod(X.m))
7: solve(crossprod(X.m))
6: diag(solve(crossprod(X.m)) %*% crossprod(X.sum))
5: swar(object, data, effect)
4: ercomp.formula(formula, data, effect, method = random.method)
3: ercomp(formula, data, effect, method = random.method)
2: plm.fit(formula, data, model, effect, random.method, inst.method)
1: plm(standarddeviation ~ x, data = pdata, model = "random")

Can anybody give me a hint as to what this error actually means and, especially, how to solve the problem? How do I have to correct the code in order to get results? Thanks a lot!
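On what the number in the message measures: it is the reciprocal condition number of the matrix passed to solve(), which by default refuses to invert once that value falls below .Machine$double.eps. A minimal sketch on simulated data (all names are hypothetical and unrelated to the original data set) of two situations that push it toward zero, nearly collinear columns and columns on very different scales:

```r
# Simulated columns (hypothetical); nothing here comes from Panel.csv
set.seed(123)
n <- 100
a <- rnorm(n)
b <- rnorm(n)

X_ok        <- cbind(a, b)              # unrelated columns, similar scale
X_collinear <- cbind(a, a + 1e-10 * b)  # nearly identical columns
X_scaled    <- cbind(a, 1e8 * b)        # unrelated columns, very different scale

# Reciprocal condition number of X'X, the kind of matrix solve() is given
rc <- c(
  ok        = rcond(crossprod(X_ok)),
  collinear = rcond(crossprod(X_collinear)),
  scaled    = rcond(crossprod(X_scaled)),
  rescaled  = rcond(crossprod(scale(X_scaled)))  # centred, unit variance
)
print(rc)

# Values below this default tolerance make solve() stop with an error
print(rc < .Machine$double.eps)
```

Whether either situation applies to the balance sheet data cannot be told from the post; the sketch only illustrates what the reported value measures.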

Best Answer

Caveat: I haven't used the plm package.

It's impossible to recreate this error because we don't have access to your data. But if the matrix is computationally singular, then you have a problem of multicollinearity. As you've identified, there are essentially two reasons that can happen.

Multicollinearity. Even if the pairwise correlations are low, it's possible that a combination of two or more IV columns is highly collinear with another IV column. Another piece of evidence for this conclusion is that excluding some variables removes the error: you're removing some of the collinear columns, so the estimates are uniquely identified given the available data. Removing highly collinear columns doesn't change the amount of information available to the regression (the removed variables are completely or mostly determined by the remaining ones), so it's perfectly valid as a solution. It's purely a question of which explanatory variables you are more interested in.
More features than observations. You speculate that this could be the problem with this remark:

Unbalanced panel / NAs? The data is unbalanced and there are NAs. Fixed effects output says: n=16, T=18-40, N=455. Probably the unbalanced data or the NAs are the reason for the error?

I don't know what "n=16, T=18-40, N=455" means to you (What are n, T, and N?). But a regression with more columns than observations is likewise not identified by the data.
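On the notation: plm's summary prints a line such as "Unbalanced Panel: n = 16, T = 18-40, N = 455", where n is the number of individuals, T the number of time periods per individual and N the total number of observations. Under that reading, 455 observations and five regressors are not a case of more columns than rows. The sketch below, on simulated data with hypothetical dimensions, only illustrates the general point that a matrix cannot have rank above its number of rows:

```r
# Simulated matrices (hypothetical sizes): 6 columns stand for an
# intercept plus five regressors
set.seed(2)
p <- 6
X_few  <- matrix(rnorm(4 * p),   nrow = 4)    # fewer rows than columns
X_many <- matrix(rnorm(455 * p), nrow = 455)  # many more rows than columns

rank_few  <- qr(X_few)$rank   # bounded by the 4 rows, so below p
rank_many <- qr(X_many)$rank  # full column rank

print(c(few = rank_few, many = rank_many, columns = p))
```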

Related Solutions

Swamy-Arora Estimator – Understanding Error in PLM Random Effects Swamy–Arora (Swar) Estimator with Lagged Dependent Variable

The error message is correct; this is not an error with the plm package in particular. The default Swamy and Arora (1972) estimator (random.method="swar" is used unless the user explicitly specifies something else) is not guaranteed to yield positive estimates for the variance.

Wooldridge (2010), p. 296 has some advice:

"As a practical matter, equation (10.37) is not guaranteed to be positive, although it is in the vast majority of applications. A negative value for sigma_c^2 is indicative of negative serial correlation in u_it, probably a substantial amount, which means that Assumption RE.3a is violated. Alternatively, some other assumption in the model can be false. We should make sure that time dummies are included in the model if aggregate effects are important; omitting them can induce serial correlation in the implied u_it. When the intercepts are allowed to change freely over time, the effects of other aggregate variables will not be identified. If sigma_c^2 is negative, unrestricted FGLS may be called for; see Section 10.4.3."

Sometimes, just adding more control variables or transforming variables also helps to overcome this issue (from a technical viewpoint) if you want to stick with the swar estimator.
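To see how a variance estimate can come out negative, here is a simplified sketch: a balanced, intercept-only panel and a textbook ANOVA-style decomposition into within and between variation. It is not the code plm runs and not Wooldridge's equation (10.37); the simulated errors u_it = e_it - e_{i,t-1} are an assumption chosen to produce strong negative serial correlation with no individual effect at all.

```r
# Simulated balanced panel (hypothetical): no individual effect,
# errors with strong negative serial correlation
set.seed(42)
n_id <- 50   # individuals
n_t  <- 10   # periods per individual

e <- matrix(rnorm(n_id * (n_t + 1)), nrow = n_id)
u <- e[, -1] - e[, -(n_t + 1)]   # u_it = e_it - e_{i,t-1}
y <- 1 + u                       # intercept-only model

# Within variation: deviations from each individual's own mean
id_means      <- rowMeans(y)
within_resid  <- y - id_means    # subtracts id_means[i] from row i
sigma2_within <- sum(within_resid^2) / (n_id * (n_t - 1))

# Between variation: spread of the individual means, scaled by n_t
sigma2_between <- n_t * sum((id_means - mean(id_means))^2) / (n_id - 1)

# Implied variance of the individual effect; nothing forces it to be >= 0
sigma2_c <- (sigma2_between - sigma2_within) / n_t
print(c(within = sigma2_within, between = sigma2_between, sigma2_c = sigma2_c))
```

In plm itself, the random.method argument of plm() selects a different variance estimator (for example "amemiya", "walhus" or "nerlove"), which is one way to move away from the default.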

Solved – r plm time and individual fixed effects – “twoways” vs. factor(index) time

From what I understand about the plm package, those two approaches should be identical.

However, the fixed effects produced from this explicit specification are shown to be "reference dependent" [i.e. relative to the default reference in your factor(index)]

    tfe <- plm(y ~ x1 + x2 + factor(index), data, model = "within", index = c("id", "index"))

In contrast, fixef() returns the fixed effects in levels (by default). To get the same fixed effect estimates, specify the following:

    fixef(object = tfe, effect = "individual", type = "dfirst")

The equivalent for the time fixed effects would be:

    fixef(object = tfe, effect = "time", type = "dfirst")
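The claimed equivalence can be checked without plm. A base-R sketch on a simulated balanced panel (all variable names are hypothetical): the slope from a regression with explicit factor(id) and factor(time) dummies matches the slope after two-way demeaning, and the time dummy coefficients are reported relative to the first period, which is what "reference dependent" refers to.

```r
# Simulated balanced panel (hypothetical names and effect sizes)
set.seed(7)
n_id <- 8
n_t  <- 5
panel <- expand.grid(time = 1:n_t, id = 1:n_id)
alpha <- rnorm(n_id)[panel$id]     # individual effects
gamma <- rnorm(n_t)[panel$time]    # time effects
panel$x <- rnorm(nrow(panel)) + alpha
panel$y <- 2 * panel$x + alpha + gamma + rnorm(nrow(panel))

# Approach 1: explicit dummies for individuals and periods
lsdv <- lm(y ~ x + factor(id) + factor(time), data = panel)

# Approach 2: two-way demeaning (this formula is valid for balanced panels)
demean2 <- function(v, id, time) v - ave(v, id) - ave(v, time) + mean(v)
panel$y_dm <- demean2(panel$y, panel$id, panel$time)
panel$x_dm <- demean2(panel$x, panel$id, panel$time)
within2 <- lm(y_dm ~ x_dm - 1, data = panel)

beta_lsdv   <- unname(coef(lsdv)["x"])
beta_within <- unname(coef(within2)["x_dm"])
print(c(lsdv = beta_lsdv, within = beta_within))

# Time dummies: one coefficient per period except the reference period
time_effects <- coef(lsdv)[grep("factor\\(time\\)", names(coef(lsdv)))]
print(time_effects)
```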

Computing R-Squared
Also, please see this post for computing R^2 and Adjusted R^2 manually for the full model (i.e. including both the fixed and specified effects): http://karthur.org/2016/fixed-effects-panel-models-in-r.html
