I've been trying to use the msm package in R to fit a 6-state multi-state model of a disease. My data set contains about 22,000 subjects, with slightly over 81k observations in total.
I had a lot of trouble getting my data prepped because my data was pretty messy. However, now that my data is all ready, when I run the model I get an overflow error:
MD.msm <- msm( PatientState_num ~ Years, subject=IDNumber, data = MD_Death2, qmatrix = Q2.crude, deathexact = 6)
Error in Ccall.msm(params, do.what = "lik", ...) :
numerical overflow in calculating likelihood
I cannot seem to find any resources that would help me diagnose what is causing the overflow or how I might correct it.
The closest bit of advice that I can find is on page 8 of Multi-State Models for Panel Data: The msm Package for R (https://www.jstatsoft.org/article/view/v038i08), which states:
To facilitate convergence, the "BFGS" quasi-Newton optimization
algorithm is used (see the documentation for the R function optim()),
and the maximum number of iterations is increased to 10000. The -2
log-likelihood is also divided by 4000, since it takes values around
4000 for plausible ranges of the parameters. This ensures that
optimization takes place on an approximate unit scale, to avoid
numerical overflow or underflow
However, if I just use those options I still get the overflow error.
Any suggestions?
In case you need to look at the crude initial values for the transition intensities, they are:
Q2.crude <- crudeinits.msm(PatientState_num ~ Years, IDNumber, data=MD_Death2, qmatrix=transitions_allowed)
> Q2.crude
           1          2           3            4          5          6
1 -5.1078374  2.8763872  1.35671107  0.130212308  0.5666499 0.17787694
2  0.7587895 -2.4108875  0.05652063  0.583858120  0.9748396 0.03687971
3  1.0173350  1.2362397 -3.07569242  0.255661062  0.3328495 0.23360723
4  0.3807051  0.7235918  0.16135846 -1.265655417  0.0000000 0.00000000
5  0.7331575  0.4638941  0.21511807  0.004390165 -1.4941194 0.07755958
6  0.0000000  0.0000000  0.00000000  0.000000000  0.0000000 0.00000000
Also if you need to see the state table:
statetable.msm(PatientState_num, IDNumber, data=MD_Death2)
    to
from     1     2     3     4     5     6
   1    73 10802  5095   489  2128   668
   2  5370 12681   400  4132  6899   261
   3  2491  3027  1331   626   815   572
   4   151   287    64     1     0     8
   5  1002   634   294     6   101   106
Best Answer
I have had the same problem. In my case I solved it by first taking the differences in time between consecutive observations within each subject and then finding the minimum difference per subject. That can be done using aggregate().
If you then merge this with your data, you can easily exclude subjects that have very small differences in time between consecutive observations.
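The answer's original code is not shown, so the following is only a minimal sketch of the idea using base R. The toy data frame `obs` and the names `min_gap`, `MinGap`, `threshold` and `obs_clean` are hypothetical; the column names follow the question, and 0.01 is the cut-off the answer reports using.

```r
# Toy data standing in for MD_Death2 (hypothetical values; column names follow the question).
obs <- data.frame(
  IDNumber         = c(1, 1, 1, 2, 2, 2, 3, 3),
  Years            = c(0.0, 1.2, 2.5, 0.0, 0.004, 1.1, 0.0, 0.9),
  PatientState_num = c(1, 2, 1, 1, 2, 5, 2, 6)
)

# Sort so that diff() compares consecutive observations within each subject.
obs <- obs[order(obs$IDNumber, obs$Years), ]

# Smallest gap between consecutive observation times, per subject.
# Subjects with a single observation get Inf so they are never flagged.
min_gap <- aggregate(
  Years ~ IDNumber, data = obs,
  FUN = function(t) if (length(t) > 1) min(diff(t)) else Inf
)
names(min_gap)[2] <- "MinGap"

# Attach the per-subject minimum gap to every observation.
obs <- merge(obs, min_gap, by = "IDNumber")

# Drop every subject whose smallest gap is below the threshold (in years).
threshold <- 0.01
obs_clean <- obs[obs$MinGap >= threshold, ]
obs_clean <- obs_clean[order(obs_clean$IDNumber, obs_clean$Years), ]
```

In this toy example subject 2 has two observations 0.004 years apart, so all of that subject's rows are removed. It is probably worth inspecting the flagged subjects before dropping them, since whether such small gaps are errors depends on the data.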
In my case, excluding subjects that had less than 0.01 years between two observations (which were likely errors) solved the problem. If it still does not work, you might also try increasing the fnscale option further (and the maximum number of iterations if necessary), by passing them through the control argument to msm().
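The exact arguments from the answer are not shown, so this is a sketch of how such options are usually passed: msm() forwards a `control` list to optim(), where `fnscale` and `maxit` are standard optim() control settings. The values below are illustrative only. To keep the example runnable it uses the `cav` data set shipped with msm and an initial intensity matrix (`q_init`, a hypothetical name); the commented line shows how the same list would be supplied in the question's call.

```r
# Illustrative values: fnscale should be of the same order as the -2 log-likelihood.
# For a data set with ~81k observations a considerably larger value may be needed.
ctrl <- list(fnscale = 4000, maxit = 10000)

# In the question's call this would be supplied as:
# msm(PatientState_num ~ Years, subject = IDNumber, data = MD_Death2,
#     qmatrix = Q2.crude, deathexact = 6, control = ctrl)

# Runnable demonstration on msm's bundled 'cav' data (skipped if msm is not installed).
if (requireNamespace("msm", quietly = TRUE)) {
  library(msm)

  # Initial transition intensities for a 4-state model; state 4 is absorbing (death).
  q_init <- rbind(
    c(0,     0.25, 0,     0.25),
    c(0.166, 0,    0.166, 0.166),
    c(0,     0.25, 0,     0.25),
    c(0,     0,    0,     0)
  )

  fit <- msm(state ~ years, subject = PTNUM, data = cav,
             qmatrix = q_init, deathexact = 4, control = ctrl)
}
```

If the overflow persists at several different fnscale values, the cause is more likely to lie in the data or the initial values than in the optimiser settings.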