Solved – Random effects model with PLM: “System is computationally singular”-Error

panel data, plm, r, random-effects-model, regression

I am currently trying to estimate some panel data models in R using the plm package, including basic pooled, fixed effects, and random effects models. I am using this code:

# read in data
mydata<- read.csv2("Panel.csv")
attach(mydata)

# define dependent variable
standarddeviation <- cbind(sd)
# define independent variables
x <- cbind(ratio1, ratio2, ratio3, ratio4, mean)

# Set data as panel data
pdata <- plm.data(mydata, index=c("id","t"))

# Pooled OLS estimator
pooling <- plm(standarddeviation ~ x, data=pdata, model= "pooling")
summary(pooling)

# Between estimator
between <- plm(standarddeviation ~ x, data=pdata, model= "between")
summary(between)

# First differences estimator
firstdiff <- plm(standarddeviation ~ x, data=pdata, model= "fd")
summary(firstdiff)

# Fixed effects or within estimator
fixed <- plm(standarddeviation ~ x, data=pdata, model= "within")
summary(fixed)

# Random effects estimator
random <- plm(standarddeviation ~ x, data=pdata, model= "random")
summary(random)

Now here's the problem: I can estimate all models without any problem except for the random effects model. After running the "random" call, R produces the following error:

"Error in solve.default(crossprod(X.m)) : 
  system is computationally singular: reciprocal condition number = 9.57127e-023"

First guesses:

  • Linear combinations in x?
    My first guess would be that there are exact linear dependencies among the exogenous variables in x. The data is balance sheet data, and I would like to explain the standard deviation (y) of a specific balance sheet position by other balance sheet positions (or by the ratio of the position to the balance sheet total). Of course, the variables in x are related to each other. For example, some of the ratios are calculated by dividing by the mean, which is also a separate independent variable. The dependent variable, the standard deviation, is also calculated using this mean. But again: there should be no EXACT linear dependence. If I exclude some of my exogenous variables, the problem disappears, but I actually need to include them.
  • Unbalanced panel / NAs?
    The data is unbalanced and there are NAs. Fixed effects output says: n=16, T=18-40, N=455. Probably the unbalanced data or the NAs are the reason for the error?
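
Both guesses can be checked directly on the regressor matrix. The following is a minimal sketch in base R on made-up data (the column names mirror the ones above, but the values are random, and ratio4 is deliberately constructed as an exact linear combination so that the check has something to find). It drops rows with NAs and then uses the rank reported by qr() to look for exact linear dependence:

```r
# Made-up data standing in for the regressors; NOT the data from the question.
set.seed(1)
n_obs <- 50
x_demo <- cbind(
  ratio1 = runif(n_obs),
  ratio2 = runif(n_obs),
  ratio3 = runif(n_obs)
)
# ratio4 is an exact linear combination of ratio1 and ratio2 (by construction).
x_demo <- cbind(x_demo, ratio4 = x_demo[, "ratio1"] + x_demo[, "ratio2"])
# One missing value, to mimic the NAs mentioned above.
x_demo[1, "ratio3"] <- NA

# Keep only complete rows, then add the intercept column a regression would use.
complete_rows <- complete.cases(x_demo)
design <- cbind(intercept = 1, x_demo[complete_rows, ])

# qr() reports the numerical rank; a rank below the number of columns
# means at least one column is a linear combination of the others.
qr_design <- qr(design)
rank_design <- qr_design$rank
has_exact_dependence <- rank_design < ncol(design)

# Columns placed after the first `rank` pivots are the ones flagged as redundant.
redundant <- colnames(design)[qr_design$pivot[-seq_len(rank_design)]]

print(sum(complete_rows))
print(rank_design)
print(redundant)
```

If the rank equals the number of columns on the real data, exact dependence is ruled out, and the cause is more likely near-dependence or scaling.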

Traceback:

random <- plm(standarddeviation ~ x, data=pdata, model= "random")
Error in solve.default(crossprod(X.m)) : 
  system is computationally singular: reciprocal condition number = 9.57127e-023
traceback()
8: solve.default(crossprod(X.m))
7: solve(crossprod(X.m))
6: diag(solve(crossprod(X.m)) %*% crossprod(X.sum))
5: swar(object, data, effect)
4: ercomp.formula(formula, data, effect, method = random.method)
3: ercomp(formula, data, effect, method = random.method)
2: plm.fit(formula, data, model, effect, random.method, inst.method)
1: plm(standarddeviation ~ x, data = pdata, model = "random")
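
The traceback shows the failure inside solve(crossprod(X.m)). As a hedged illustration of what that error means, the sketch below uses made-up data (not the data from the question) in which the two columns are not linearly dependent at all, but differ in magnitude by a factor of about 1e10, as a ratio and a balance-sheet mean might. Whether scaling plays any role in the actual data cannot be determined from the post; the point is only that solve() raises this error whenever the estimated reciprocal condition number falls below its tolerance.

```r
# Made-up regressors: a ratio of order 1 and a mean of order 1e10.
# The two columns are drawn independently, so there is no linear dependence.
set.seed(123)
n_obs <- 100
x_raw <- cbind(
  ratio1 = runif(n_obs),
  mean   = runif(n_obs, min = 1e10, max = 2e10)
)

# solve() stops when the estimated reciprocal condition number is below
# its tolerance, which defaults to .Machine$double.eps.
rcond_raw <- rcond(crossprod(x_raw))
result_raw <- tryCatch(
  solve(crossprod(x_raw)),
  error = function(e) conditionMessage(e)
)

# Expressing the large column in units of 1e10 leaves the information
# unchanged but makes the cross-product matrix well conditioned.
x_scaled <- x_raw
x_scaled[, "mean"] <- x_scaled[, "mean"] / 1e10
rcond_scaled <- rcond(crossprod(x_scaled))
result_scaled <- solve(crossprod(x_scaled))

print(rcond_raw)
print(result_raw)
print(rcond_scaled)
```

Here result_raw holds the error message instead of an inverse, while the rescaled matrix inverts without complaint. Comparing rcond() before and after rescaling the real regressors is a cheap way to see whether the same thing is happening.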

Can anybody give me a hint as to what this error actually means and, especially, how to solve the problem? How do I have to correct the code in order to get results? Thanks a lot!

Best Answer

Caveat: I haven't used the plm package.

It's impossible to recreate this error because we don't have access to your data. But if the matrix is computationally singular, then its columns are linearly dependent, or so close to it that the matrix cannot be inverted reliably in floating-point arithmetic. As you've identified, there are essentially two reasons that can happen.

  1. Multicollinearity. Even if the pairwise correlations are low, it's possible that a combination of two or more IV columns is highly collinear with another IV column. Further evidence for this conclusion is that excluding some variables removes the error: you're removing some of the collinear columns, so the estimates are uniquely identified given the available data. Removing highly collinear columns doesn't change the amount of information available to the regression (the removed variables are completely or mostly determined by the remaining ones), so it's perfectly valid as a solution. It's purely a question of which explanatory variables you are more interested in.

  2. More features than observations. You speculate that this could be the problem with this remark:

"Unbalanced panel / NAs? The data is unbalanced and there are NAs. Fixed effects output says: n=16, T=18-40, N=455. Probably the unbalanced data or the NAs are the reason for the error?"

I don't know what "n=16, T=18-40, N=455" means to you (What are n, T, and N?). But a regression with more columns than observations is likewise not identified by the data.
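
Both points can be checked numerically. The sketch below is base R on made-up data (the variables x1 to x4 are hypothetical and are not the asker's regressors): x4 is almost exactly the sum of the other three, so no single pairwise correlation looks alarming, yet each column is nearly determined by the rest. It computes variance inflation factors by regressing each column on the others, and compares the number of rows with the number of coefficients to be estimated.

```r
# Made-up regressors: x4 is nearly (not exactly) x1 + x2 + x3.
set.seed(42)
n_obs <- 200
x1 <- rnorm(n_obs)
x2 <- rnorm(n_obs)
x3 <- rnorm(n_obs)
x4 <- x1 + x2 + x3 + rnorm(n_obs, sd = 0.01)
regressors <- data.frame(x1, x2, x3, x4)

# Largest absolute pairwise correlation among the regressors.
cors <- cor(regressors)
max_pairwise <- max(abs(cors[upper.tri(cors)]))

# Variance inflation factor: regress each column on all the others and
# compute 1 / (1 - R^2). Large values indicate near-linear dependence.
vif <- sapply(names(regressors), function(v) {
  others <- setdiff(names(regressors), v)
  fit <- lm(reformulate(others, response = v), data = regressors)
  1 / (1 - summary(fit)$r.squared)
})

# Point 2: there must be more observations than coefficients
# (regressors plus intercept) for the model to be identified.
n_coefficients <- ncol(regressors) + 1
enough_rows <- nrow(regressors) > n_coefficients

print(max_pairwise)
print(vif)
print(enough_rows)
```

A common rule of thumb treats variance inflation factors above roughly 10 as a warning sign. Dropping the column with the largest value and recomputing is one way to decide which variables to exclude.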
