I am trying to do multiple imputation using the mice package in R, and the imputation keeps stopping with the following error:
Error in mice.impute.logreg(c(1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, :
dims [product 145] do not match the length of object [146]
In addition: There were 50 or more warnings (use warnings() to see the first 50)
> warnings()
Warning messages:
1: In rchisq(1, sum(ry) - ncol(x)) : NAs produced
2: In runif(length(d), 0, a1/10^10) : NAs produced
3: In runif(length(d), 0, a1/10^10) : NAs produced
4: In runif(length(d), 0, a1/10^10) : NAs produced
5: In runif(length(d), 0, a1/10^10) : NAs produced
6: In runif(length(d), 0, a1/10^10) : NAs produced
7: In runif(length(d), 0, a1/10^10) : NAs produced
8: In rchisq(1, sum(ry) - ncol(x)) : NAs produced
9: In runif(length(d), 0, a1/10^10) : NAs produced
10: In runif(length(d), 0, a1/10^10) : NAs produced
11: In runif(length(d), 0, a1/10^10) : NAs produced
12: In runif(length(d), 0, a1/10^10) : NAs produced
13: In runif(length(d), 0, a1/10^10) : NAs produced
14: In runif(length(d), 0, a1/10^10) : NAs produced
15: In runif(length(d), 0, a1/10^10) : NAs produced
16: In runif(length(d), 0, a1/10^10) : NAs produced
17: In runif(length(d), 0, a1/10^10) : NAs produced
18: In runif(length(d), 0, a1/10^10) : NAs produced
19: In rchisq(1, sum(ry) - ncol(x)) : NAs produced
20: In runif(length(d), 0, a1/10^10) : NAs produced
21: In runif(length(d), 0, a1/10^10) : NAs produced
22: In runif(length(d), 0, a1/10^10) : NAs produced
23: In runif(length(d), 0, a1/10^10) : NAs produced
24: In runif(length(d), 0, a1/10^10) : NAs produced
25: In runif(length(d), 0, a1/10^10) : NAs produced
26: In runif(length(d), 0, a1/10^10) : NAs produced
27: In runif(length(d), 0, a1/10^10) : NAs produced
28: In runif(length(d), 0, a1/10^10) : NAs produced
29: In runif(length(d), 0, a1/10^10) : NAs produced
30: In rchisq(1, sum(ry) - ncol(x)) : NAs produced
31: In runif(length(d), 0, a1/10^10) : NAs produced
32: In runif(length(d), 0, a1/10^10) : NAs produced
33: In runif(length(d), 0, a1/10^10) : NAs produced
34: In runif(length(d), 0, a1/10^10) : NAs produced
35: In runif(length(d), 0, a1/10^10) : NAs produced
36: In rchisq(1, sum(ry) - ncol(x)) : NAs produced
37: In runif(length(d), 0, a1/10^10) : NAs produced
38: In runif(length(d), 0, a1/10^10) : NAs produced
39: In runif(length(d), 0, a1/10^10) : NAs produced
40: In runif(length(d), 0, a1/10^10) : NAs produced
41: In rchisq(1, sum(ry) - ncol(x)) : NAs produced
42: In runif(length(d), 0, a1/10^10) : NAs produced
43: In runif(length(d), 0, a1/10^10) : NAs produced
44: In runif(length(d), 0, a1/10^10) : NAs produced
45: In runif(length(d), 0, a1/10^10) : NAs produced
46: In rchisq(1, sum(ry) - ncol(x)) : NAs produced
47: In runif(length(d), 0, a1/10^10) : NAs produced
48: In runif(length(d), 0, a1/10^10) : NAs produced
49: In runif(length(d), 0, a1/10^10) : NAs produced
50: In runif(length(d), 0, a1/10^10) : NAs produced
If I set the seed to a fixed value, the error occurs at the same point every time. Without a fixed seed, it occurs at a different iteration, imputation and variable on each run.
Here is the data structure. The data were imported from Stata using read.dta. The str() listing below is truncated and does not show the variables that trigger the problem. Those are all binary, each with a single missing value, and the missing value is in the same record for all of them.
> str(peimp.in)
'data.frame': 146 obs. of 164 variables:
$ id : chr "195" "128" "218" "1106" ...
$ distress0 : num 5 6 0 2 1 0 5 1 1 5 ...
$ pcs0 : num 47.1 46.9 60.8 44 61.5 ...
$ mcs0 : num 33.4 43.8 52.4 38.6 53.1 ...
$ pwb0 : num 24 21 28 17 27 NA 8 16 24 8 ...
$ swb0 : num 24 25 28 24 28 NA 17 28 28 24 ...
$ ewb0 : num 18 14 20 15 20 ...
$ fwb0 : num 14 21 28 18.7 26 ...
$ ccs0 : num 14 22 25 25 26 NA 16 9 23 14 ...
$ pn.sf70 : int 2 2 0 4 0 3 1 3 3 4 ...
$ pn.sf80 : int 2 1 0 2 0 2 3 3 1 3 ...
$ pn.fcgp40 : int 1 1 0 4 0 NA 1 3 3 3 ...
$ pn.aqol0 : int 1 1 0 1 0 1 0 1 1 2 ...
$ pf.nbs0 : num 51.8 55.6 55.6 40.3 57.5 ...
$ rp.nbs0 : num 21.2 45.9 54.9 34.7 57.2 ...
$ bp.nbs0 : num 42.6 46.7 62 34.6 62 ...
$ gh.nbs0 : num 54.6 28.5 66.5 61.7 62.7 ...
$ vt.nbs0 : num 49.6 49.6 70.4 40.7 58.5 ...
$ sf.nbs0 : num 37.3 47.3 37.3 42.3 57.3 ...
$ re.nbs0 : num 24.8 45.7 56.2 45.7 56.2 ...
$ mh.nbs0 : num 37.8 43 53.5 28 50.9 ...
$ distress1 : int 8 7 7 5 0 5 NA 8 0 2 ...
$ pcs1 : num 36.4 23.3 43.9 27.5 33.9 ...
$ mcs1 : num 32.1 32.2 45.2 32.5 51 ...
$ pwb1 : num 17 3 13 13 25 19 14 0 24 21 ...
$ swb1 : num 24 24.5 27 26.8 28 ...
$ ewb1 : num 17 13 4 18 18 22 20 17 24 22 ...
$ fwb1 : num 15 3.5 18.7 6 15 ...
$ ccs1 : num 13 16 21 11 24 ...
$ pn.sf71 : int 2 4 4 3 2 3 5 4 1 3 ...
$ pn.sf81 : int 3 4 3 4 1 3 3 4 2 3 ...
$ pn.fcgp41 : int 1 4 3 1 1 2 2 4 0 3 ...
$ pn.aqol1 : int 1 1 2 1 1 0 1 2 0 1 ...
$ pf.nbs1 : num 40.3 19.3 38.4 19.3 42.2 ...
$ rp.nbs1 : num 28 21.2 57.2 25.7 21.2 ...
$ bp.nbs1 : num 38.6 26.5 30.5 30.1 46.7 ...
$ gh.nbs1 : num 34.2 28.5 59.4 46 42.7 ...
$ vt.nbs1 : num 31.8 38.7 46.7 25.9 37.7 ...
$ sf.nbs1 : num 27.3 22.2 22.2 17.2 47.3 ...
$ re.nbs1 : num 31.8 21.4 56.2 31.8 49.2 ...
$ mh.nbs1 : num 37.8 32.6 44.3 35.2 48.2 ...
$ distress2 : int 7 7 NA 8 0 0 10 1 0 0 ...
$ pcs2 : num 24 29.9 NA 37.7 51.7 ...
$ mcs2 : num 35.6 32.5 NA 29.8 59.9 ...
$ pwb2 : num 12 9 NA 11 28 22 17 20 26 24 ...
$ swb2 : num 24 23.3 NA 8 28 ...
$ ewb2 : int 22 13 NA 12 19 24 10 18 22 23 ...
$ fwb2 : num 8 13 NA 2 28 10.5 7 4 22 18 ...
$ ccs2 : num 10 10 NA 12 26 21 14 12 27 23 ...
$ pn.sf72 : int 5 4 NA 2 0 3 3 3 1 1 ...
$ pn.sf82 : int 4 3 NA 2 0 NA 4 3 1 0 ...
$ pn.fcgp42 : int 4 4 NA 1 0 1 3 2 0 1 ...
$ pn.aqol2 : int 3 2 NA 1 0 1 1 1 0 0 ...
$ pf.nbs2 : num 21.2 28.8 NA 30.8 49.9 ...
$ rp.nbs2 : num 21.2 30.2 NA 30.2 54.9 ...
$ bp.nbs2 : num 21.7 30.5 NA 42.6 62 ...
$ gh.nbs2 : num 38 26.1 NA 38 46 ...
$ vt.nbs2 : num 43.7 37.7 NA 31.8 61.5 ...
$ sf.nbs2 : num 22.2 27.3 NA 32.3 57.3 ...
$ re.nbs2 : num 17.9 24.8 NA 31.8 56.2 ...
$ mh.nbs2 : num 40.4 35.2 NA 27.3 58.7 ...
$ seegp2 : Factor w/ 2 levels "No","Yes": NA 2 1 2 1 1 2 2 2 2 ...
$ admed2 : Factor w/ 2 levels "No","Yes": NA 1 2 1 1 1 1 1 1 1 ...
$ admhsp2 : Factor w/ 2 levels "No","Yes": NA 2 2 2 1 2 1 2 1 1 ...
$ hlp.paid2 : Factor w/ 2 levels "No","Yes": NA 1 1 1 1 1 2 1 1 1 ...
$ hlp.unpaid2 : Factor w/ 2 levels "No","Yes": NA 1 1 1 1 1 1 1 1 1 ...
$ hlp.ff2 : Factor w/ 2 levels "No","Yes": NA 1 1 1 1 1 2 2 2 1 ...
$ inhosp2 : Factor w/ 2 levels "_","Unwell/in hospital": 1 1 1 1 1 1 1 1 1 1 ...
$ wthdrw2 : Factor w/ 2 levels "_","Withdrawn": 1 1 1 1 1 1 1 1 1 1 ...
$ dth2 : Factor w/ 2 levels "_","Deceased": 1 1 1 1 1 1 1 1 1 1 ...
$ distress3 : num 3 4 0 5 0 4 10 0 0 1 ...
$ pcs3 : num 42.1 24.9 46.9 32.3 55.3 ...
$ mcs3 : num 43.1 44.2 69.2 30.3 58.5 ...
$ pwb3 : num 17 10 28 10 28 ...
$ swb3 : num 18 21 24 16 28 ...
$ ewb3 : int 20 15 24 13 21 21 18 19 23 23 ...
$ fwb3 : num 14 11 28 7 25 NA 7 11 28 16 ...
$ ccs3 : num 20 13 28 10 25 ...
$ pn.sf73 : int 3 4 0 3 0 2 4 1 1 3 ...
$ pn.sf83 : int 2 4 0 3 0 0 3 0 0 2 ...
$ pn.fcgp43 : int 2 4 0 3 0 1 4 1 0 2 ...
$ pn.aqol3 : int 1 2 0 1 0 1 1 0 0 1 ...
$ pf.nbs3 : num 46.1 26.9 26.9 32.7 55.6 ...
$ rp.nbs3 : num 34.7 23.5 57.2 30.2 54.9 ...
$ bp.nbs3 : num 38.2 26.5 62 34.2 62 ...
$ gh.nbs3 : num 49.9 30.8 66.5 26.1 50.8 ...
$ vt.nbs3 : num 37.7 46.7 70.4 31.8 61.5 ...
$ sf.nbs3 : num 47.3 42.3 57.3 27.3 57.3 ...
$ re.nbs3 : num 35.3 28.3 56.2 31.8 56.2 ...
$ mh.nbs3 : num 48.2 40.4 64 29.9 58.7 ...
$ seegp3 : Factor w/ 2 levels "No","Yes": 1 NA 1 2 2 2 1 2 2 2 ...
$ admed3 : Factor w/ 2 levels "No","Yes": 1 NA 2 2 1 1 1 1 1 1 ...
$ admhsp3 : Factor w/ 2 levels "No","Yes": 2 NA 1 2 1 2 2 1 1 1 ...
$ hlp.paid3 : Factor w/ 2 levels "No","Yes": 1 NA 2 2 1 1 2 2 1 1 ...
$ hlp.unpaid3 : Factor w/ 2 levels "No","Yes": 1 NA 1 1 1 1 1 1 1 1 ...
$ hlp.ff3 : Factor w/ 2 levels "No","Yes": 1 NA 2 1 1 1 2 2 1 1 ...
$ inhosp3 : Factor w/ 2 levels "_","Unwell/in hospital": 1 1 1 1 1 2 1 1 1 1 ...
$ wthdrw3 : Factor w/ 2 levels "_","Withdrawn": 1 1 1 1 1 1 1 1 1 1 ...
$ dth3 : Factor w/ 2 levels "_","Deceased": 1 1 1 1 1 1 1 1 1 1 ...
[list output truncated]
- attr(*, "datalabel")= chr "Dataset for imputation"
- attr(*, "time.stamp")= chr "23 Jul 2012 09:19"
- attr(*, "formats")= chr "%5s" "%9.0g" "%9.0g" "%9.0g" ...
- attr(*, "types")= int 5 254 254 254 254 254 254 254 254 251 ...
- attr(*, "val.labels")= chr "" "" "" "" ...
- attr(*, "var.labels")= chr "patient id" "0 distress" "0 pcs" "0 mcs" ...
- attr(*, "version")= int 12
- attr(*, "label.table")=List of 18
..$ wthdrw : Named num 0 1
.. ..- attr(*, "names")= chr "_" "Withdrawn"
..$ dth : Named num 0 1
.. ..- attr(*, "names")= chr "_" "Deceased"
..$ inhosp : Named num 0 1
.. ..- attr(*, "names")= chr "_" "Unwell/in hospital"
..$ noyes : Named num 0 1
.. ..- attr(*, "names")= chr "No" "Yes"
..$ hosp : Named num 0 1
.. ..- attr(*, "names")= chr "Sydney" "Other"
..$ TYPCANCE: Named num 1 2 3 4
.. ..- attr(*, "names")= chr "primary rectal" "recurrent rectal" "primary other" "recurrent other"
..$ TYPEEXEN: Named num 1 2 9
.. ..- attr(*, "names")= chr "curative exenteration" "palliative exenteration" "not applicable"
..$ GENDER : Named num 1 2
.. ..- attr(*, "names")= chr "male" "female"
..$ MARITALS: Named num 1 2 3 4
.. ..- attr(*, "names")= chr "single" "married / living with partner" "divorced" "widowed"
..$ EMPLOYME: Named num 1 2 3 4 5
.. ..- attr(*, "names")= chr "full time" "part time" "retired" "unemployed" ...
..$ LABB : Named num 1 2
.. ..- attr(*, "names")= chr "yes" "no"
..$ LABC : Named num 1 2
.. ..- attr(*, "names")= chr "yes" "no"
..$ CLEAROPM: Named num 1 2
.. ..- attr(*, "names")= chr "r0 / r1" "r2"
..$ RECADPRE: Named num 1 2
.. ..- attr(*, "names")= chr "recurrent" "advanced primary"
..$ RECADNEO: Named num 1 2 3
.. ..- attr(*, "names")= chr "yes, short course" "yes, long course" "no"
..$ V132_A : Named num 1 2
.. ..- attr(*, "names")= chr "yes" "no"
..$ RECADEXE: Named num 1 2 8
.. ..- attr(*, "names")= chr "curative" "palliative" "no surgery"
..$ OPASASCO: Named num 1 2 3 4 5
.. ..- attr(*, "names")= chr "healthy patient" "mild systemic disease - no functional limitation" "severe systemic disease - definite functional limitation" "severe systemic disease - constant threat to life" ...
Best Answer
I had already modified the predictorMatrix, but I went through it again and excluded a few more variables I was doubtful about from being used as predictors. The imputation then ran and gave reasonable results.
Then I added another variable that I needed imputed, and the same problem came back.
After further work on the predictorMatrix (I stopped the binary variables with the single missing value from being used to predict each other), the imputation ran again, and the results look reasonable.
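For anyone wanting to try the same change, here is a minimal sketch of that kind of predictorMatrix edit. It uses a small made-up data frame (the names toy, score, seegp, admed and admhsp are hypothetical stand-ins, not my real variables) and builds the matrix in base R, following the mice convention that rows are the variables being imputed and columns are the candidate predictors.

```r
# Toy stand-in for the real data; all names and values here are hypothetical.
set.seed(1)
n <- 20
toy <- data.frame(
  score  = rnorm(n),
  seegp  = factor(sample(c("No", "Yes"), n, replace = TRUE)),
  admed  = factor(sample(c("No", "Yes"), n, replace = TRUE)),
  admhsp = factor(sample(c("No", "Yes"), n, replace = TRUE))
)
# One missing value per binary variable, all in the same record.
toy[1, c("seegp", "admed", "admhsp")] <- NA

# Full predictor matrix: rows are targets, columns are predictors,
# 1 means "use the column variable to predict the row variable".
pred <- matrix(1, nrow = ncol(toy), ncol = ncol(toy),
               dimnames = list(names(toy), names(toy)))
diag(pred) <- 0  # a variable never predicts itself

# Stop the binary variables that share the missing record
# from being used to predict each other.
bin <- c("seegp", "admed", "admhsp")
pred[bin, bin] <- 0

print(pred)
```

The resulting matrix would then be passed to mice through its predictorMatrix argument, e.g. mice(toy, predictorMatrix = pred). The binary variables are still imputed, just from the remaining columns.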
So I guess the error is related to collinearity, and specifically to collinearity of the missingness. But why does it not always stop the first time it hits one of these variables? Sometimes I got 3 or 4 imputations before the error came up, so something that depends on the random draws must be involved.
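A rough way to look for this kind of shared missingness before running the imputation is sketched below, again on a hypothetical toy data frame (toy, x1, b1, b2 and b3 are made-up names). It groups variables whose missing values fall in exactly the same records, and compares the number of observed cases per variable with the number of other columns. That second check is only a guess at what the rchisq(1, sum(ry) - ncol(x)) warnings point to: rchisq returns NaN with an "NAs produced" warning when its degrees of freedom are negative, which may happen when a variable has more predictors than observed cases.

```r
# Hypothetical toy data; replace `toy` with the data frame being imputed.
toy <- data.frame(
  x1 = c(1.2, 0.4, NA, 2.2, 1.9, 0.3),
  b1 = factor(c(NA, "No", "Yes", "No", "Yes", "No")),
  b2 = factor(c(NA, "Yes", "Yes", "No", "No", "No")),
  b3 = factor(c(NA, "No", "No", "Yes", "Yes", "No"))
)

# Missingness indicators: TRUE where a value is missing.
miss <- is.na(toy)

# Number of missing values per variable.
n_miss <- colSums(miss)

# Describe each variable's missingness by the row numbers that are missing,
# then group the variables that share an identical pattern.
pattern <- apply(miss, 2, function(col) paste(which(col), collapse = ","))
shared <- split(names(pattern), pattern)

# Keep only non-empty patterns shared by more than one variable.
shared <- shared[names(shared) != "" & lengths(shared) > 1]
print(shared)

# Observed cases per variable minus the number of other columns.
# A negative value means more candidate predictors than observed cases.
n_obs <- colSums(!miss)
print(n_obs - (ncol(toy) - 1))
```

Variables listed together in shared are candidates for being switched off as predictors of each other in the predictorMatrix.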