R – Resolving mlogit Error: ‘System is Exactly Singular’ in Logistic Regression

logistic, multinomial-distribution, r

So I have data from a randomized blind trial of 1 mg of nicotine gum on dual n-back working memory scores; I analyzed them as usual with a t-test and found a small increase in means but a large increase in standard deviations on an F-test! Strange. I also have data for each day on mood/productivity that day on a 1-5 scale.

I wondered: is nicotine following an inverse U-curve, where it causes higher scores on the worse days (1-3) and lower scores on the better days (3-5)? I looked around and it seems I want a multinomial logistic regression comparing the placebo & active days.

I enter the data & load mlogit:

nicotine <- read.table(stdin(),header=TRUE)
day      active mp score
20120824 1      3  35.2
20120827 0      5  37.2
20120828 0      3  37.6
20120830 1      3  37.75
20120831 1      2  37.75
20120902 0      2  36.0
20120905 0      5  36.0
20120906 1      5  37.25
20120910 0      5  49.2
20120911 1      3  36.8
20120912 0      3  44.6
20120913 0      5  38.4
20120915 0      5  43.8
20120916 0      2  39.6
20120918 0      3  49.6
20120919 0      4  38.4
20120923 0      5  36.2
20120924 0      5  45.4
20120925 1      3  43.8
20120926 0      4  36.4
20120929 1      3  43.8
20120930 1      3  36.0
20121001 1      3  46.0
20121002 0      4  45.0
20121008 0      2  34.6
20121009 1      3  45.2
20121012 0      5  37.8
20121013 0      4  37.2
20121016 0      4  40.2
20121020 1      3  39.0
20121021 0      3  41.2
20121022 0      3  42.2
20121024 0      5  40.4
20121029 1      2  41.4
20121031 1      3  38.4
20121101 1      5  43.8
20121102 0      3  48.2
20121103 1      5  40.6

library(mlogit)
Nicotine <- mlogit.data(nicotine,shape="wide", choice="mp")
mlogit(score ~ (active + mp)^2, Nicotine)
Error in solve.default(H, g[!fixed]) : 
  Lapack routine dgesv: system is exactly singular
Calls: mlogit ... mlogit.optim -> as.vector -> solve -> solve.default

The error also happens even with the simplest call I can think of:

mlogit(score ~ active, Nicotine)
Error in solve.default(H, g[!fixed]) : 
  Lapack routine dgesv: system is exactly singular
Calls: mlogit ... mlogit.optim -> as.vector -> solve -> solve.default

Reading the documentation for mlogit didn't help much, and although I looked at the other questions about the same error, they're different enough that I can't tell whether they apply.

Thank you for your assistance.

Best Answer

You don't want multinomial logit, as your dependent variable is a score that is nearly continuous. I would start by plotting the data, e.g. with

with(nicotine, stripchart(score ~ active, method = "jitter", vertical = TRUE))

which doesn't reveal any obvious pattern.

Then you could look at a linear model:

m1 <- with(nicotine, lm(score~as.factor(active)))
summary(m1)

(this is equivalent to a two-sample t-test with equal variances assumed), which shows a small and nonsignificant difference. Plotting m1 doesn't reveal anything particularly interesting to my eyes, either. You say you found large differences in variances, but

with(nicotine, sd(score[active == 1]))
with(nicotine, sd(score[active == 0]))

shows the difference to be not all that large (and the stripchart shown above agrees).
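To check the variance claim more formally, one option is an F-test of the two variances. The sketch below re-enters the active and score columns from the question as plain vectors so that it runs on its own; var.test comes with base R's stats package, and the choice of nicotine days as the numerator is arbitrary.

```r
# active and score columns copied from the question's table, in row order
active <- c(1,0,0,1,1,0,0,1,0,1,0,0,0,0,0,0,0,0,1,
            0,1,1,1,0,0,1,0,0,0,1,0,0,0,1,1,1,0,1)
score  <- c(35.2, 37.2, 37.6, 37.75, 37.75, 36.0, 36.0, 37.25, 49.2, 36.8,
            44.6, 38.4, 43.8, 39.6, 49.6, 38.4, 36.2, 45.4, 43.8, 36.4,
            43.8, 36.0, 46.0, 45.0, 34.6, 45.2, 37.8, 37.2, 40.2, 39.0,
            41.2, 42.2, 40.4, 41.4, 38.4, 43.8, 48.2, 40.6)
stopifnot(length(active) == length(score))

# F-test for equality of variances: nicotine days (x) vs placebo days (y)
ft <- var.test(score[active == 1], score[active == 0])
ft$estimate   # ratio of sample variances, var(x) / var(y)
ft$p.value    # two-sided p-value for the null of equal variances
```

The F-test is sensitive to non-normality, so with samples this small it is best read alongside the plot and not as a verdict on its own.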

Then you could add mp to the model:

m2 <- with(nicotine, lm(score~as.factor(active) + mp))
summary(m2)

which also shows only very small differences and a minuscule $R^2$.

There are probably other things you could do, but it looks like there is not much to find here.
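One such thing, closer to the interaction the question tried to fit with (active + mp)^2, is a linear model in which the effect of nicotine is allowed to change with mp. This is only a sketch: the data frame is rebuilt from the question's table so the code runs on its own, treating mp as numeric is an assumption, and m3 is just an illustrative name. A linear interaction is also a crude stand-in for an inverse U-curve.

```r
# Columns copied from the question's table, in row order
nicotine <- data.frame(
  active = c(1,0,0,1,1,0,0,1,0,1,0,0,0,0,0,0,0,0,1,
             0,1,1,1,0,0,1,0,0,0,1,0,0,0,1,1,1,0,1),
  mp     = c(3,5,3,3,2,2,5,5,5,3,3,5,5,2,3,4,5,5,3,
             4,3,3,3,4,2,3,5,4,4,3,3,3,5,2,3,5,3,5),
  score  = c(35.2, 37.2, 37.6, 37.75, 37.75, 36.0, 36.0, 37.25, 49.2, 36.8,
             44.6, 38.4, 43.8, 39.6, 49.6, 38.4, 36.2, 45.4, 43.8, 36.4,
             43.8, 36.0, 46.0, 45.0, 34.6, 45.2, 37.8, 37.2, 40.2, 39.0,
             41.2, 42.2, 40.4, 41.4, 38.4, 43.8, 48.2, 40.6)
)

# score ~ active * mp expands to active + mp + active:mp
m3 <- lm(score ~ active * mp, data = nicotine)
summary(m3)$coefficients

# Compare against the additive model to see whether the interaction adds anything
anova(lm(score ~ active + mp, data = nicotine), m3)
```

If nicotine helped on low-mp days and hurt on high-mp days, the active:mp coefficient would be negative; with 38 observations it will be estimated imprecisely either way.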

Related Solutions

Logistic – Why Exact Singularity When Adding Dummy Variable to Multinomial Logistic Using R mlogit?

I think I have discovered the source of the error. In my model, as well as in the example above, the dummy is an individual-specific attribute, but I had included it in the model statement as an alternative-specific attribute.

Properly specifying the dummy as an individual-specific attribute (i.e. after the '|') produces a sensible result:

summary(mlogit(choice ~ pf + cl + loc + wk + tod + seas | odd.dummy-1 , data=Electr)) # Dummy as individual-specific attribute

Coefficients :
             Estimate Std. Error  t-value Pr(>|t|)    
pf          -0.624054   0.023369 -26.7048   <2e-16 ***
cl          -0.108130   0.008281 -13.0577   <2e-16 ***
loc          1.442730   0.050706  28.4527   <2e-16 ***
wk           0.997726   0.044911  22.2157   <2e-16 ***
tod         -5.454041   0.185186 -29.4517   <2e-16 ***
seas        -5.831122   0.188030 -31.0116   <2e-16 ***
odd.dummy:2  0.057435   0.067904   0.8458   0.3976    
odd.dummy:3  0.064776   0.068510   0.9455   0.3444    
odd.dummy:4  0.061038   0.067753   0.9009   0.3676

I'll leave it to someone else to try to explain the mathematics of what was happening.
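One way to see the mathematics, offered as a sketch and not as a description of mlogit's internals: in a conditional logit, choice probabilities depend only on how the regressors differ across the alternatives within a choice set. A variable that is constant across alternatives and is given a single generic coefficient cancels from the numerator and denominator of every probability, so the likelihood is flat in that coefficient and the Hessian has a zero row and column. The toy data below are hypothetical (price, dummy and centre are illustrative names), and only base R is used.

```r
# Toy long-format choice data: 4 individuals x 3 alternatives (hypothetical values)
d <- data.frame(
  id    = rep(1:4, each = 3),
  price = c(2, 3, 5,  1, 4, 2,  3, 3, 6,  2, 5, 4),  # varies across alternatives
  dummy = rep(c(0, 1, 1, 0), each = 3)               # constant within each individual
)

# Only within-choice-set differences matter, so centre each regressor by individual.
# At equal choice probabilities the information matrix is proportional to crossprod(Xc).
centre <- function(x, g) x - ave(x, g)
Xc <- cbind(price = centre(d$price, d$id), dummy = centre(d$dummy, d$id))

all(Xc[, "dummy"] == 0)   # the individual-specific column is wiped out
qr(Xc)$rank               # lower than the number of coefficients

# The zero row and column make the linear system unsolvable
msg <- tryCatch({ solve(crossprod(Xc)); "no error" },
                error = function(e) conditionMessage(e))
msg
```

With alternative-specific coefficients (the part after '|'), the dummy's contribution differs across alternatives and no longer cancels, which is consistent with the fix described above.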

Solved – Error “system is computationally singular” when running a glm

It means the cross-product of your design matrix is not invertible, so the regression coefficients cannot be estimated uniquely. This results from linearly dependent (or nearly dependent) columns, which often show up as very strongly correlated variables. Examine the pairwise correlations of your variables to see whether any can be removed; you're looking for correlations close to +1 or -1. Bear in mind that a dependence involving three or more variables need not show up in any single pairwise correlation. Alternatively, you can probably automate this variable selection by using forward stepwise regression.

This can also result from having more variables than observations, in which case your design matrix cannot have full column rank. This is a bit trickier to fix, but there are ways. I believe lasso regression is supposed to work well when the data are "wider" than they are "long."

Keep in mind: if you decide to try lasso or stepwise selection, you're doing much more (in terms of variable selection) than just handling multicollinearity.
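Before reaching for variable selection, it can help to confirm which of the two causes applies. A minimal sketch in base R, using simulated data (all variable names here are made up for illustration): compare the rank of the design matrix with its number of columns, and let lm point out which terms are aliased.

```r
set.seed(1)  # simulated data, for illustration only

# Cause 1: an exact linear dependence among three predictors
n  <- 20
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + x2
y  <- rnorm(n)
X  <- model.matrix(~ x1 + x2 + x3)
c(columns = ncol(X), rank = qr(X)$rank)  # rank is below the column count
round(cor(cbind(x1, x2, x3)), 2)         # pairwise correlations need not be close to 1
fit <- lm(y ~ x1 + x2 + x3)
coef(fit)                                # lm reports the aliased term as NA

# Cause 2: more predictors than observations
W <- cbind(1, matrix(rnorm(5 * 10), nrow = 5))  # 5 rows, 11 columns
qr(W)$rank                                       # can never exceed the 5 rows
```

The rank check catches dependencies among several variables that a table of pairwise correlations can miss.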
