Solved – Multi-state numerical overflow using the msm package in R

multinomial-distribution, r, survival

I've been trying to use the msm package in R to fit a 6-state multi-state model of a disease. My data set contains about 22,000 subjects and slightly over 81,000 observations in total.

I had a lot of trouble prepping my data because it was messy. However, now that the data are ready, running the model produces an overflow error:

MD.msm <- msm( PatientState_num ~ Years, subject=IDNumber, data = MD_Death2, qmatrix = Q2.crude, deathexact = 6)

Error in Ccall.msm(params, do.what = "lik", ...) : 
  numerical overflow in calculating likelihood

I cannot seem to find any resources that would help me diagnose what is causing the overflow or how I might correct it.

The closest advice I can find, from page 8 of Multi-State Models for Panel Data: The msm Package for R (https://www.jstatsoft.org/article/view/v038i08), states:

To facilitate convergence, the "BFGS" quasi-Newton optimization
algorithm is used (see the documentation for the R function optim()),
and the maximum number of iterations is increased to 10000. The -2
log-likelihood is also divided by 4000, since it takes values around
4000 for plausible ranges of the parameters. This ensures that
optimization takes place on an approximate unit scale, to avoid
numerical overflow or underflow

However, if I just use those options I still get the overflow error.

Any suggestions?

In case you need to look at the crude initial values for transition intensities they are:

Q2.crude <- crudeinits.msm(PatientState_num ~ Years, IDNumber, data=MD_Death2, qmatrix=transitions_allowed)

> Q2.crude
           1          2           3            4          5          6
1 -5.1078374  2.8763872  1.35671107  0.130212308  0.5666499 0.17787694
2  0.7587895 -2.4108875  0.05652063  0.583858120  0.9748396 0.03687971
3  1.0173350  1.2362397 -3.07569242  0.255661062  0.3328495 0.23360723
4  0.3807051  0.7235918  0.16135846 -1.265655417  0.0000000 0.00000000
5  0.7331575  0.4638941  0.21511807  0.004390165 -1.4941194 0.07755958
6  0.0000000  0.0000000  0.00000000  0.000000000  0.0000000 0.00000000

Also if you need to see the state table:

statetable.msm(PatientState_num, IDNumber, data=MD_Death2)
    to
from     1     2     3     4     5     6
   1    73 10802  5095   489  2128   668
   2  5370 12681   400  4132  6899   261
   3  2491  3027  1331   626   815   572
   4   151   287    64     1     0     8
   5  1002   634   294     6   101   106

Best Answer

I have had the same problem. In my case I solved it by first differencing the time variable within each subject and taking the minimum value. That can be done using aggregate():

    diffs <- aggregate(cbind(minDiff = Years) ~ IDNumber, data = MD_Death2,
                       FUN = function(x) min(diff(x)))

If you then merge this with your data, you can easily exclude subjects with very small time differences between consecutive observations.

    MD_Death2 <- merge(MD_Death2, diffs, by = 'IDNumber', all.x = TRUE)
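As a self-contained sketch of the whole procedure on toy data (the column names IDNumber/Years and the 0.01-year cutoff mirror this answer; the data are assumed sorted by time within each subject):

```r
# Toy data: subject 1 has two observations only 0.001 years apart,
# which is the kind of near-duplicate record that caused the overflow.
sample_df <- data.frame(
  IDNumber = c(1, 1, 1, 2, 2),
  Years    = c(0, 0.001, 1.0, 0, 0.5)
)

# Smallest within-subject gap between consecutive observation times
diffs <- aggregate(cbind(minDiff = Years) ~ IDNumber, data = sample_df,
                   FUN = function(x) min(diff(x)))

# Attach the minimum gap to every row of the original data
merged <- merge(sample_df, diffs, by = "IDNumber", all.x = TRUE)

# Keep only subjects whose observations are at least 0.01 years apart
clean <- merged[merged$minDiff >= 0.01, ]
```

Here subject 1 (minimum gap 0.001 years) is dropped entirely, while subject 2's rows are retained.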

In my case, excluding subjects that had less than 0.01 years between two observations (which were likely data errors) solved the problem. If that still does not work, you might also try increasing the fnscale option further (and the maximum number of iterations if necessary), e.g. pass

    control=list(fnscale=5000,maxit=500)

to msm().
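For intuition on what fnscale does, here is a toy base-R sketch (deliberately not the author's model, and no msm involved): optim() internally minimises fn(par)/fnscale, so choosing fnscale near the typical objective value keeps the optimisation on an approximate unit scale, as the msm paper describes.

```r
# A quadratic whose values sit around 4000, roughly like the
# -2 log-likelihood scale mentioned in the msm paper.
f <- function(p) 4000 + (p - 3)^2

# Without scaling, and with the paper's suggested fnscale/maxit settings.
fit_raw    <- optim(par = 0, fn = f, method = "BFGS")
fit_scaled <- optim(par = 0, fn = f, method = "BFGS",
                    control = list(fnscale = 4000, maxit = 10000))

# Both runs should locate the minimum near p = 3; fnscale changes only
# the scale the optimiser works on, not the location of the optimum.
```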