Solved – How to fix this error in mice multiple imputation

multiple-imputation, r

I am trying to do multiple imputation using the mice package in R, and the imputation keeps stopping with the following error:

Error in mice.impute.logreg(c(1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L,  : 
  dims [product 145] do not match the length of object [146]
In addition: There were 50 or more warnings (use warnings() to see the first 50)
> warnings()
Warning messages:
1: In rchisq(1, sum(ry) - ncol(x)) : NAs produced
2: In runif(length(d), 0, a1/10^10) : NAs produced
3: In runif(length(d), 0, a1/10^10) : NAs produced
4: In runif(length(d), 0, a1/10^10) : NAs produced
5: In runif(length(d), 0, a1/10^10) : NAs produced
6: In runif(length(d), 0, a1/10^10) : NAs produced
7: In runif(length(d), 0, a1/10^10) : NAs produced
8: In rchisq(1, sum(ry) - ncol(x)) : NAs produced
9: In runif(length(d), 0, a1/10^10) : NAs produced
10: In runif(length(d), 0, a1/10^10) : NAs produced
11: In runif(length(d), 0, a1/10^10) : NAs produced
12: In runif(length(d), 0, a1/10^10) : NAs produced
13: In runif(length(d), 0, a1/10^10) : NAs produced
14: In runif(length(d), 0, a1/10^10) : NAs produced
15: In runif(length(d), 0, a1/10^10) : NAs produced
16: In runif(length(d), 0, a1/10^10) : NAs produced
17: In runif(length(d), 0, a1/10^10) : NAs produced
18: In runif(length(d), 0, a1/10^10) : NAs produced
19: In rchisq(1, sum(ry) - ncol(x)) : NAs produced
20: In runif(length(d), 0, a1/10^10) : NAs produced
21: In runif(length(d), 0, a1/10^10) : NAs produced
22: In runif(length(d), 0, a1/10^10) : NAs produced
23: In runif(length(d), 0, a1/10^10) : NAs produced
24: In runif(length(d), 0, a1/10^10) : NAs produced
25: In runif(length(d), 0, a1/10^10) : NAs produced
26: In runif(length(d), 0, a1/10^10) : NAs produced
27: In runif(length(d), 0, a1/10^10) : NAs produced
28: In runif(length(d), 0, a1/10^10) : NAs produced
29: In runif(length(d), 0, a1/10^10) : NAs produced
30: In rchisq(1, sum(ry) - ncol(x)) : NAs produced
31: In runif(length(d), 0, a1/10^10) : NAs produced
32: In runif(length(d), 0, a1/10^10) : NAs produced
33: In runif(length(d), 0, a1/10^10) : NAs produced
34: In runif(length(d), 0, a1/10^10) : NAs produced
35: In runif(length(d), 0, a1/10^10) : NAs produced
36: In rchisq(1, sum(ry) - ncol(x)) : NAs produced
37: In runif(length(d), 0, a1/10^10) : NAs produced
38: In runif(length(d), 0, a1/10^10) : NAs produced
39: In runif(length(d), 0, a1/10^10) : NAs produced
40: In runif(length(d), 0, a1/10^10) : NAs produced
41: In rchisq(1, sum(ry) - ncol(x)) : NAs produced
42: In runif(length(d), 0, a1/10^10) : NAs produced
43: In runif(length(d), 0, a1/10^10) : NAs produced
44: In runif(length(d), 0, a1/10^10) : NAs produced
45: In runif(length(d), 0, a1/10^10) : NAs produced
46: In rchisq(1, sum(ry) - ncol(x)) : NAs produced
47: In runif(length(d), 0, a1/10^10) : NAs produced
48: In runif(length(d), 0, a1/10^10) : NAs produced
49: In runif(length(d), 0, a1/10^10) : NAs produced
50: In runif(length(d), 0, a1/10^10) : NAs produced
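One possible reading of the repeated `rchisq(1, sum(ry) - ncol(x))` warning is a degrees-of-freedom problem. This reading assumes, from the names in the call and not from the mice source, that `sum(ry)` counts the observed values of the variable being imputed and `ncol(x)` counts the predictor columns. Under that assumption, the degrees of freedom become negative when there are more predictors than observed cases. That could happen with the 146 records and 164 variables described below if most columns are used as predictors. For negative degrees of freedom, `rchisq()` returns `NaN` and warns "NAs produced". A sketch of that arithmetic with hypothetical counts:

```r
# Hypothetical helper mirroring the expression quoted in the warning:
# observed cases minus predictor columns (an assumed interpretation).
residual_df <- function(n_observed, n_predictors) {
  n_observed - n_predictors
}

# Hypothetical counts: 145 observed values (146 records, one missing)
df_few_predictors  <- residual_df(145, 20)   # 125: positive degrees of freedom
df_many_predictors <- residual_df(145, 163)  # -18: more predictors than cases

set.seed(1)
draw_ok <- rchisq(1, df_few_predictors)
# Negative df: rchisq() returns NaN and warns "NAs produced"
draw_bad <- suppressWarnings(rchisq(1, df_many_predictors))

print(c(df_few_predictors, df_many_predictors))
print(is.nan(draw_bad))
```

If this reading is right, trimming the predictor set would keep the degrees of freedom positive, for example with `quickpred()` or by editing the predictorMatrix.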

If the seed is set to a particular value, the error occurs at the same point every time; otherwise, it occurs at different iterations, imputations and variables in different runs.
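Because the failure point depends on the seed, one way to study it is to run the same call under a range of seeds and record which ones fail. The sketch below uses base R only. `try_seeds()` and `flaky()` are hypothetical names, and `flaky()` is just a toy stand-in for the imputation call:

```r
# Hypothetical helper: run `f` once per seed and record "ok" or the error message.
try_seeds <- function(f, seeds) {
  vapply(seeds, function(s) {
    set.seed(s)
    tryCatch({
      f()
      "ok"
    }, error = function(e) conditionMessage(e))
  }, character(1))
}

# Toy stand-in for a stochastic fit that sometimes fails
flaky <- function() {
  if (runif(1) < 0.5) stop("draw failed")
  invisible(NULL)
}

out1 <- try_seeds(flaky, 1:5)
out2 <- try_seeds(flaky, 1:5)  # same seeds, so the same outcomes
print(out1)

# With mice (not run here; `pred` would be your edited predictor matrix):
# try_seeds(function() mice(peimp.in, predictorMatrix = pred, printFlag = FALSE), 1:20)
```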

OK. Here is the data structure. The data were imported from Stata using read.dta (from the foreign package). The variable list is truncated and does not show the variables that cause the problem. Those variables are all binary, each with a single missing value, and the missing values all fall in the same record.

> str(peimp.in)
'data.frame':   146 obs. of  164 variables:
 $ id                        : chr  "195" "128" "218" "1106" ...
 $ distress0                 : num  5 6 0 2 1 0 5 1 1 5 ...
 $ pcs0                      : num  47.1 46.9 60.8 44 61.5 ...
 $ mcs0                      : num  33.4 43.8 52.4 38.6 53.1 ...
 $ pwb0                      : num  24 21 28 17 27 NA 8 16 24 8 ...
 $ swb0                      : num  24 25 28 24 28 NA 17 28 28 24 ...
 $ ewb0                      : num  18 14 20 15 20 ...
 $ fwb0                      : num  14 21 28 18.7 26 ...
 $ ccs0                      : num  14 22 25 25 26 NA 16 9 23 14 ...
 $ pn.sf70                   : int  2 2 0 4 0 3 1 3 3 4 ...
 $ pn.sf80                   : int  2 1 0 2 0 2 3 3 1 3 ...
 $ pn.fcgp40                 : int  1 1 0 4 0 NA 1 3 3 3 ...
 $ pn.aqol0                  : int  1 1 0 1 0 1 0 1 1 2 ...
 $ pf.nbs0                   : num  51.8 55.6 55.6 40.3 57.5 ...
 $ rp.nbs0                   : num  21.2 45.9 54.9 34.7 57.2 ...
 $ bp.nbs0                   : num  42.6 46.7 62 34.6 62 ...
 $ gh.nbs0                   : num  54.6 28.5 66.5 61.7 62.7 ...
 $ vt.nbs0                   : num  49.6 49.6 70.4 40.7 58.5 ...
 $ sf.nbs0                   : num  37.3 47.3 37.3 42.3 57.3 ...
 $ re.nbs0                   : num  24.8 45.7 56.2 45.7 56.2 ...
 $ mh.nbs0                   : num  37.8 43 53.5 28 50.9 ...
 $ distress1                 : int  8 7 7 5 0 5 NA 8 0 2 ...
 $ pcs1                      : num  36.4 23.3 43.9 27.5 33.9 ...
 $ mcs1                      : num  32.1 32.2 45.2 32.5 51 ...
 $ pwb1                      : num  17 3 13 13 25 19 14 0 24 21 ...
 $ swb1                      : num  24 24.5 27 26.8 28 ...
 $ ewb1                      : num  17 13 4 18 18 22 20 17 24 22 ...
 $ fwb1                      : num  15 3.5 18.7 6 15 ...
 $ ccs1                      : num  13 16 21 11 24 ...
 $ pn.sf71                   : int  2 4 4 3 2 3 5 4 1 3 ...
 $ pn.sf81                   : int  3 4 3 4 1 3 3 4 2 3 ...
 $ pn.fcgp41                 : int  1 4 3 1 1 2 2 4 0 3 ...
 $ pn.aqol1                  : int  1 1 2 1 1 0 1 2 0 1 ...
 $ pf.nbs1                   : num  40.3 19.3 38.4 19.3 42.2 ...
 $ rp.nbs1                   : num  28 21.2 57.2 25.7 21.2 ...
 $ bp.nbs1                   : num  38.6 26.5 30.5 30.1 46.7 ...
 $ gh.nbs1                   : num  34.2 28.5 59.4 46 42.7 ...
 $ vt.nbs1                   : num  31.8 38.7 46.7 25.9 37.7 ...
 $ sf.nbs1                   : num  27.3 22.2 22.2 17.2 47.3 ...
 $ re.nbs1                   : num  31.8 21.4 56.2 31.8 49.2 ...
 $ mh.nbs1                   : num  37.8 32.6 44.3 35.2 48.2 ...
 $ distress2                 : int  7 7 NA 8 0 0 10 1 0 0 ...
 $ pcs2                      : num  24 29.9 NA 37.7 51.7 ...
 $ mcs2                      : num  35.6 32.5 NA 29.8 59.9 ...
 $ pwb2                      : num  12 9 NA 11 28 22 17 20 26 24 ...
 $ swb2                      : num  24 23.3 NA 8 28 ...
 $ ewb2                      : int  22 13 NA 12 19 24 10 18 22 23 ...
 $ fwb2                      : num  8 13 NA 2 28 10.5 7 4 22 18 ...
 $ ccs2                      : num  10 10 NA 12 26 21 14 12 27 23 ...
 $ pn.sf72                   : int  5 4 NA 2 0 3 3 3 1 1 ...
 $ pn.sf82                   : int  4 3 NA 2 0 NA 4 3 1 0 ...
 $ pn.fcgp42                 : int  4 4 NA 1 0 1 3 2 0 1 ...
 $ pn.aqol2                  : int  3 2 NA 1 0 1 1 1 0 0 ...
 $ pf.nbs2                   : num  21.2 28.8 NA 30.8 49.9 ...
 $ rp.nbs2                   : num  21.2 30.2 NA 30.2 54.9 ...
 $ bp.nbs2                   : num  21.7 30.5 NA 42.6 62 ...
 $ gh.nbs2                   : num  38 26.1 NA 38 46 ...
 $ vt.nbs2                   : num  43.7 37.7 NA 31.8 61.5 ...
 $ sf.nbs2                   : num  22.2 27.3 NA 32.3 57.3 ...
 $ re.nbs2                   : num  17.9 24.8 NA 31.8 56.2 ...
 $ mh.nbs2                   : num  40.4 35.2 NA 27.3 58.7 ...
 $ seegp2                    : Factor w/ 2 levels "No","Yes": NA 2 1 2 1 1 2 2 2 2 ...
 $ admed2                    : Factor w/ 2 levels "No","Yes": NA 1 2 1 1 1 1 1 1 1 ...
 $ admhsp2                   : Factor w/ 2 levels "No","Yes": NA 2 2 2 1 2 1 2 1 1 ...
 $ hlp.paid2                 : Factor w/ 2 levels "No","Yes": NA 1 1 1 1 1 2 1 1 1 ...
 $ hlp.unpaid2               : Factor w/ 2 levels "No","Yes": NA 1 1 1 1 1 1 1 1 1 ...
 $ hlp.ff2                   : Factor w/ 2 levels "No","Yes": NA 1 1 1 1 1 2 2 2 1 ...
 $ inhosp2                   : Factor w/ 2 levels "_","Unwell/in hospital": 1 1 1 1 1 1 1 1 1 1 ...
 $ wthdrw2                   : Factor w/ 2 levels "_","Withdrawn": 1 1 1 1 1 1 1 1 1 1 ...
 $ dth2                      : Factor w/ 2 levels "_","Deceased": 1 1 1 1 1 1 1 1 1 1 ...
 $ distress3                 : num  3 4 0 5 0 4 10 0 0 1 ...
 $ pcs3                      : num  42.1 24.9 46.9 32.3 55.3 ...
 $ mcs3                      : num  43.1 44.2 69.2 30.3 58.5 ...
 $ pwb3                      : num  17 10 28 10 28 ...
 $ swb3                      : num  18 21 24 16 28 ...
 $ ewb3                      : int  20 15 24 13 21 21 18 19 23 23 ...
 $ fwb3                      : num  14 11 28 7 25 NA 7 11 28 16 ...
 $ ccs3                      : num  20 13 28 10 25 ...
 $ pn.sf73                   : int  3 4 0 3 0 2 4 1 1 3 ...
 $ pn.sf83                   : int  2 4 0 3 0 0 3 0 0 2 ...
 $ pn.fcgp43                 : int  2 4 0 3 0 1 4 1 0 2 ...
 $ pn.aqol3                  : int  1 2 0 1 0 1 1 0 0 1 ...
 $ pf.nbs3                   : num  46.1 26.9 26.9 32.7 55.6 ...
 $ rp.nbs3                   : num  34.7 23.5 57.2 30.2 54.9 ...
 $ bp.nbs3                   : num  38.2 26.5 62 34.2 62 ...
 $ gh.nbs3                   : num  49.9 30.8 66.5 26.1 50.8 ...
 $ vt.nbs3                   : num  37.7 46.7 70.4 31.8 61.5 ...
 $ sf.nbs3                   : num  47.3 42.3 57.3 27.3 57.3 ...
 $ re.nbs3                   : num  35.3 28.3 56.2 31.8 56.2 ...
 $ mh.nbs3                   : num  48.2 40.4 64 29.9 58.7 ...
 $ seegp3                    : Factor w/ 2 levels "No","Yes": 1 NA 1 2 2 2 1 2 2 2 ...
 $ admed3                    : Factor w/ 2 levels "No","Yes": 1 NA 2 2 1 1 1 1 1 1 ...
 $ admhsp3                   : Factor w/ 2 levels "No","Yes": 2 NA 1 2 1 2 2 1 1 1 ...
 $ hlp.paid3                 : Factor w/ 2 levels "No","Yes": 1 NA 2 2 1 1 2 2 1 1 ...
 $ hlp.unpaid3               : Factor w/ 2 levels "No","Yes": 1 NA 1 1 1 1 1 1 1 1 ...
 $ hlp.ff3                   : Factor w/ 2 levels "No","Yes": 1 NA 2 1 1 1 2 2 1 1 ...
 $ inhosp3                   : Factor w/ 2 levels "_","Unwell/in hospital": 1 1 1 1 1 2 1 1 1 1 ...
 $ wthdrw3                   : Factor w/ 2 levels "_","Withdrawn": 1 1 1 1 1 1 1 1 1 1 ...
 $ dth3                      : Factor w/ 2 levels "_","Deceased": 1 1 1 1 1 1 1 1 1 1 ...
  [list output truncated]
 - attr(*, "datalabel")= chr "Dataset for imputation"
 - attr(*, "time.stamp")= chr "23 Jul 2012 09:19"
 - attr(*, "formats")= chr  "%5s" "%9.0g" "%9.0g" "%9.0g" ...
 - attr(*, "types")= int  5 254 254 254 254 254 254 254 254 251 ...
 - attr(*, "val.labels")= chr  "" "" "" "" ...
 - attr(*, "var.labels")= chr  "patient id" "0 distress" "0 pcs" "0 mcs" ...
 - attr(*, "version")= int 12
 - attr(*, "label.table")=List of 18
  ..$ wthdrw  : Named num  0 1
  .. ..- attr(*, "names")= chr  "_" "Withdrawn"
  ..$ dth     : Named num  0 1
  .. ..- attr(*, "names")= chr  "_" "Deceased"
  ..$ inhosp  : Named num  0 1
  .. ..- attr(*, "names")= chr  "_" "Unwell/in hospital"
  ..$ noyes   : Named num  0 1
  .. ..- attr(*, "names")= chr  "No" "Yes"
  ..$ hosp    : Named num  0 1
  .. ..- attr(*, "names")= chr  "Sydney" "Other"
  ..$ TYPCANCE: Named num  1 2 3 4
  .. ..- attr(*, "names")= chr  "primary rectal" "recurrent rectal" "primary other" "recurrent other"
  ..$ TYPEEXEN: Named num  1 2 9
  .. ..- attr(*, "names")= chr  "curative exenteration" "palliative exenteration" "not applicable"
  ..$ GENDER  : Named num  1 2
  .. ..- attr(*, "names")= chr  "male" "female"
  ..$ MARITALS: Named num  1 2 3 4
  .. ..- attr(*, "names")= chr  "single" "married / living with partner" "divorced" "widowed"
  ..$ EMPLOYME: Named num  1 2 3 4 5
  .. ..- attr(*, "names")= chr  "full time" "part time" "retired" "unemployed" ...
  ..$ LABB    : Named num  1 2
  .. ..- attr(*, "names")= chr  "yes" "no"
  ..$ LABC    : Named num  1 2
  .. ..- attr(*, "names")= chr  "yes" "no"
  ..$ CLEAROPM: Named num  1 2
  .. ..- attr(*, "names")= chr  "r0 / r1" "r2"
  ..$ RECADPRE: Named num  1 2
  .. ..- attr(*, "names")= chr  "recurrent" "advanced primary"
  ..$ RECADNEO: Named num  1 2 3
  .. ..- attr(*, "names")= chr  "yes, short course" "yes, long course" "no"
  ..$ V132_A  : Named num  1 2
  .. ..- attr(*, "names")= chr  "yes" "no"
  ..$ RECADEXE: Named num  1 2 8
  .. ..- attr(*, "names")= chr  "curative" "palliative" "no surgery"
  ..$ OPASASCO: Named num  1 2 3 4 5
  .. ..- attr(*, "names")= chr  "healthy patient" "mild systemic disease - no functional limitation" "severe systemic disease - definite functional limitation" "severe systemic disease - constant threat to life" ...
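Because the listing is truncated, one way to confirm which variables share a missingness pattern is to group columns by the rows in which they are missing. Here, that pattern would be a single missing value in the same record. The sketch below uses base R with a hypothetical helper and toy data, not the poster's data. `mice::md.pattern()` gives a fuller overview of all patterns.

```r
# Hypothetical helper: group variables whose missing values fall in exactly
# the same rows. Variables with no missing values are ignored.
same_missingness_groups <- function(df) {
  miss <- is.na(df)                          # logical matrix, one column per variable
  cols <- colnames(miss)[colSums(miss) > 0]
  # Key each variable by the row numbers of its missing values
  keys <- vapply(cols, function(v) paste(which(miss[, v]), collapse = ","),
                 character(1))
  groups <- split(cols, keys)
  groups[lengths(groups) > 1]                # keep only patterns shared by 2+ variables
}

# Toy data: a and b are binary and both missing only in row 3
toy <- data.frame(
  a = factor(c("No", "Yes", NA, "No")),
  b = factor(c("Yes", "No", NA, "No")),
  c = c(1.5, NA, 2.0, NA),
  d = c(10, NA, 12, NA),
  e = c(1, 2, 3, 4)
)
groups <- same_missingness_groups(toy)
print(groups)
```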

Best Answer

I had already modified the predictorMatrix, but I went through it again and excluded a few more variables I was doubtful about as predictors. The imputation then ran and gave reasonable results.

Then I added another variable I needed imputed, and started getting the same problem.

With further work on the predictorMatrix (I stopped the binary variables that shared the single missing value from being used to predict each other), the imputation ran again, and the results look reasonable.
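The change described above amounts to zeroing the block of the predictor matrix where those binary variables predict each other. In mice, rows are the variables being imputed and columns are their predictors. The sketch below uses a hypothetical helper and toy variable names. The commented lines show how it might be applied with `mice()`, using a dry run (`maxit = 0`) to get the default matrix.

```r
# Hypothetical helper: stop each variable in `vars` from predicting the
# others in `vars`; every other entry of the matrix is left unchanged.
block_mutual_prediction <- function(pred, vars) {
  pred[vars, vars] <- 0
  pred
}

# Toy predictor matrix (rows = imputed variables, columns = predictors)
v <- c("bin_a", "bin_b", "bin_c", "score")
pred <- matrix(1, nrow = 4, ncol = 4, dimnames = list(v, v))
diag(pred) <- 0                       # a variable never predicts itself
bins <- c("bin_a", "bin_b", "bin_c")  # hypothetical binary variables
pred2 <- block_mutual_prediction(pred, bins)
print(pred2)

# Applying it with mice (not run here; `bins` would be your own variables):
# library(mice)
# ini  <- mice(peimp.in, maxit = 0)
# pred <- block_mutual_prediction(ini$predictorMatrix, bins)
# imp  <- mice(peimp.in, predictorMatrix = pred)
```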

So I suspect the error is related to collinearity, specifically collinearity in the missingness pattern. But why did it not always stop the first time it reached one of these variables? Sometimes I got 3 or 4 imputations before the error appeared, so something related to the random draws must be involved.
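If collinearity among the binary variables is the cause, a quick check is whether any pair carries the same information on the rows where both are observed, either identical or exactly reversed. Such a pair is perfectly collinear when one is used to predict the other. The sketch below uses base R with a hypothetical helper and toy data:

```r
# Hypothetical helper: report pairs of two-valued variables that are
# identical or exact complements on the rows where both are observed.
redundant_binary_pairs <- function(df) {
  is_binary <- vapply(df, function(x) length(unique(x[!is.na(x)])) == 2,
                      logical(1))
  bins <- names(df)[is_binary]
  if (length(bins) < 2) return(character(0))
  found <- character(0)
  for (pair in combn(bins, 2, simplify = FALSE)) {
    # Recode each variable to 1/2 so factors and 0/1 numerics compare alike
    x <- as.integer(factor(df[[pair[1]]]))
    y <- as.integer(factor(df[[pair[2]]]))
    ok <- !is.na(x) & !is.na(y)
    if (any(ok) && (all(x[ok] == y[ok]) || all(x[ok] != y[ok]))) {
      found <- c(found, paste(pair, collapse = " ~ "))
    }
  }
  found
}

# Toy data: b duplicates a, c reverses a, d is unrelated
toy <- data.frame(
  a = factor(c("No", "Yes", "Yes", NA, "No")),
  b = c(0, 1, 1, NA, 0),
  c = factor(c("Yes", "No", "No", NA, "Yes")),
  d = factor(c("No", "No", "Yes", NA, "Yes"))
)
pairs <- redundant_binary_pairs(toy)
print(pairs)
```

Flagged pairs are candidates for zeroing in each other's rows of the predictorMatrix. This check only catches exact pairwise redundancy, not collinearity involving three or more variables.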