Solved – How to impute an ordinal variable with MICE but prevent it from taking one value

data-imputation, mice, missing-data, multiple-imputation, r

I have an ordinal variable, overall_tumor_grade, that can take on values of 1, 2, 3, or X if the measurement is indeterminable. There are some NAs that I want to impute using the mice package in R, but I know that the missing values cannot be X because their tumor sizes are greater than 0. I want to impute overall_tumor_grade but force mice to only choose from 1, 2, or 3.

Here is sample code for you to use:

library(mice)

df=data.frame(age=c(24,37,58,65,70,84),
    overall_tumor_grade=factor(c(1,1,2,3,'X',NA)),  #factor, so mice treats it as categorical
    tumor_size=c(1.5,2.0,4.2,5.6,0,0.1))
imp=mice(df)
na_index=which(is.na(df$overall_tumor_grade))
complete(imp)$overall_tumor_grade[na_index]  #This should never be 'X'

Thank you for your help and please let me know if you need more information.


New addition

@longrob suggested I temporarily remove the patients with X observations, impute, then add them back in to the full dataset. Would the imputation lose power by removing all of those observations? Along @longrob's suggestion, here is the work-around I have right now using a sample dataframe that is closer to what I'm really working with (several columns with missing values):

df=data.frame(age=rnorm(25,mean=45,sd=10),
    overall_tumor_grade=factor(sample(c(1,2,3,'X'),25,replace=TRUE)),
    tumor_size=runif(25)*10)
df[df$overall_tumor_grade=='X','tumor_size']=0  #Patients with grade 'X' have tumor_size=0
df[sample(1:25,3),'age']=NA  #Setting some observations to NA
df[sample(1:25,5),'overall_tumor_grade']=NA
df[sample(1:25,1),'tumor_size']=NA
################ Imputation
imp=mice(df,meth=c('pmm',"",'pmm'))  #Suppress imputation of overall_tumor_grade
dfimp=complete(imp)
dfimp2=dfimp[dfimp$overall_tumor_grade!='X',]  #I don't want to impute grade 'X' tumors,
#so I am trying to remove those observations here and then use droplevels(), but for
#a reason I can't figure out, that statement sets all rows where grade='X' to NA

Any suggestions on how I accomplish what @longrob suggested?

Best Answer

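The following code defines and calls a dedicated imputation function that handles the two groups separately: a missing grade is set to 'X' when tumor_size == 0, and is imputed with polytomous regression (mice.impute.polyreg), fitted only on the cases with tumor_size > 0, otherwise. Because the 'X' level is dropped before that model is fitted, a case with tumor_size > 0 can only receive 1, 2, or 3.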

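## How to impute an ordinal variable with MICE but prevent it from taking one value?
library(mice)

# factor() is needed in R >= 4.0, where data.frame() no longer converts
# strings to factors by default
df <- data.frame(age = c(24, 37, 58, 65, 70, 84),
                 overall_tumor_grade = factor(c(1, 1, 2, 3, 'X', NA)),
                 tumor_size = c(1.5, 2.0, 4.2, 5.6, 0, 0.1))

# Custom method, called by mice() as method "tumor".
# y: target variable, ry: TRUE where y is observed, x: predictors,
# wy: TRUE where an imputation is requested (supplied by recent mice versions).
mice.impute.tumor <- function(y, ry, x, wy = NULL, ...) {
    if (is.null(wy)) wy <- !ry
    pos <- x[, "tumor_size"] > 0   # cases with a measurable tumor
    ymis <- y[wy]
    ymis[!pos[wy]] <- "X"          # tumor_size == 0: grade is indeterminable
    # tumor_size > 0: fit polyreg on these cases only, with the "X" level dropped
    ymis[pos[wy]] <- mice.impute.polyreg(y[pos, drop = TRUE], ry[pos],
                                         x[pos, , drop = FALSE],
                                         wy = wy[pos], ...)
    ymis
}

ini <- mice(df, maxit = 0)
meth <- ini$method
meth["overall_tumor_grade"] <- "tumor"
imp <- mice(df, method = meth, maxit = 1, m = 2)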
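
On the subsetting problem in the question's work-around: a likely cause, assuming overall_tumor_grade still contains NA because its imputation was suppressed, is that the comparison != 'X' returns NA for those rows, and indexing a data frame with an NA produces a row filled with NA. So the NA rows probably come from the missing grades rather than from the 'X' grades. A minimal base-R sketch on a made-up toy data frame (the names d, bad, good and keep_na are illustrative only, not part of the original code):

```r
# Toy data: one "X" row and one row with a missing grade
d <- data.frame(id = 1:4, grade = factor(c("1", "X", NA, "3")))

# NA in the logical index produces an all-NA row instead of dropping the row
bad <- d[d$grade != "X", ]

# which() keeps only the positions that are TRUE, so NA rows are dropped too
good <- droplevels(d[which(d$grade != "X"), ])

# To keep the rows with a missing grade (e.g. to impute them later),
# test for NA explicitly
keep_na <- droplevels(d[is.na(d$grade) | d$grade != "X", ])
```

Which variant fits depends on whether the rows with a missing grade should stay in the data passed to mice; keep_na is probably the one wanted when the grade is to be imputed afterwards.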