Solved – Applying Rubin’s rule for combining multiply imputed datasets

missing data, multiple-imputation, pooling, spss

I am hoping to pool the results of a fairly basic set of analyses (e.g. multiple regression, ANOVA) performed on multiply imputed data. The multiple imputation and the analyses were done in SPSS, but SPSS does not provide pooled results for some statistics, including the F-value, the covariance matrix, and R-squared.

I have made a few attempts to address this by venturing into R or trying the macros that are available, but have not managed to resolve the problem (for example, I ran into issues pooling the statistics for more than 5 imputations in mice).

At this point, I would like to try computing these by hand, applying Rubin's rule, using the output that SPSS generates. However, I am not sure how I can derive the within-imputation variance ($\bar U = \frac 1 m\sum_{j=1}^mU_j$) based on the output SPSS generates.

I would really appreciate detailed instructions on this.

Best Answer

Rubin's rules can only be applied to estimates whose sampling distribution is (approximately) normal, such as regression coefficients. For statistics that follow an F or chi-square distribution, a different set of formulas is needed:

  • Allison, P. D. (2002). Missing data. Newbury Park, CA: Sage.
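For the normally distributed case (say, a single regression coefficient), the pooling can be done by hand from what SPSS prints for each imputation. The within-imputation variance $\bar U$ is just the mean of the squared standard errors ($U_j = SE_j^2$), the between-imputation variance $B$ is the variance of the $m$ estimates, and the total variance is $T = \bar U + (1 + \frac 1 m)B$. Below is a minimal sketch in base R; the numbers and the function name pool_rubin are made up for illustration and are not taken from any SPSS output.

```r
# Hypothetical per-imputation results for ONE coefficient, copied by hand
# from the per-imputation output (m = 5). These numbers are made up.
est <- c(0.52, 0.48, 0.55, 0.50, 0.47)  # coefficient estimates Q_j
se  <- c(0.11, 0.12, 0.10, 0.11, 0.12)  # their standard errors

# pool_rubin is a hypothetical helper name, not part of any package.
pool_rubin <- function(est, se) {
  m     <- length(est)
  q_bar <- mean(est)                 # pooled estimate
  u_bar <- mean(se^2)                # within-imputation variance (U_j = SE_j^2)
  b     <- var(est)                  # between-imputation variance (divides by m - 1)
  t_var <- u_bar + (1 + 1 / m) * b   # total variance
  r     <- (1 + 1 / m) * b / u_bar   # relative increase in variance
  df    <- (m - 1) * (1 + 1 / r)^2   # classic large-sample degrees of freedom
  t_val <- q_bar / sqrt(t_var)
  c(estimate = q_bar, se = sqrt(t_var), df = df,
    t = t_val, p = 2 * pt(-abs(t_val), df))
}

pooled <- pool_rubin(est, se)
print(round(pooled, 4))
```

This only covers a scalar estimate with the classic degrees of freedom; small-sample corrections are not included.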

For performing an ANOVA on multiply imputed datasets, you could use the R package miceadds (miceadds::mi.anova).

Update 1

Here is a complete example:

  1. Export your data from SPSS to R. In SPSS, save your dataset as a .csv file.

  2. Read in your dataset:

    library(miceadds)   
    dat <- read.csv(file='your-dataset.csv')
    

    Let's assume that $reading$ is your dependent variable and that you have two factors:

    • gender, with male = 0 and female = 1
    • treatment, with control = 0 and 'received treatment' = 1

    Now let's convert them to factors:

    dat$gender    <- factor(dat$gender)
    dat$treatment <- factor(dat$treatment)
    
  3. Convert your dataset to a mids object, where we assume that the first variable holds the imputation number (Imputation_ in SPSS):

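    # Depending on the mice version, the imputation column may have to be
    # named explicitly, e.g. as.mids(dat, .imp = "Imputation_").
    dat.mids <- as.mids(dat)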
    
  4. Now you can perform an ANOVA:

    fit <- mi.anova(mi.res=dat.mids, formula="reading~gender*treatment", type=3)
    summary(fit)
    

Update 2 This is a reply to your second comment:

What you describe is a data import/export problem between SPSS and R. You could try importing the .sav file directly into R; there are several packages for that: foreign, rio, gdata, Hmisc, etc. I prefer the csv route, but that's a matter of taste and depends on the nature of your problem. It may also help to check some tutorials on YouTube or elsewhere on the internet.

library(foreign)
dat <- read.spss(file='path-to-sav', use.value.labels=FALSE, to.data.frame=TRUE)
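Whichever import route you use, it is worth checking that the stacked file arrived intact before converting it. The sketch below uses a small made-up data frame in place of your import; it assumes the SPSS layout where Imputation_ = 0 holds the original data with its missing values and 1..m hold the completed datasets.

```r
# Toy stand-in for a stacked SPSS export (values are made up):
# Imputation_ = 0 is the original data, 1 and 2 are completed datasets.
dat <- data.frame(
  Imputation_ = rep(0:2, each = 4),
  reading     = c(10, NA, 12, NA,  10, 11, 12, 9,  10, 13, 12, 8)
)

rows_per_imp <- table(dat$Imputation_)   # number of rows per imputation
na_per_imp   <- tapply(is.na(dat$reading), dat$Imputation_, sum)

print(rows_per_imp)
print(na_per_imp)

# Every imputation should have the same number of rows, and only the
# original data (Imputation_ == 0) should still contain missing values.
same_size     <- length(unique(as.vector(rows_per_imp))) == 1
only_orig_nas <- all(na_per_imp[names(na_per_imp) != "0"] == 0)
```

If either check fails on your real data, the problem is most likely in the export step rather than in the pooling.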

Update 3 This is a reply to your first comment:

Yes, you can do your analysis in SPSS and pool the F values in miceadds (please note this example is taken from the miceadds::micombine.F help page):

library(miceadds)
Fvalues <- c(6.76 , 4.54 , 4.23 , 5.45 , 4.78, 6.76 , 4.54 , 4.23 , 5.45 , 4.78, 
             6.76 , 4.54 , 4.23 , 5.45 , 4.78, 6.76 , 4.54 , 4.23 , 5.45 , 4.78 )
micombine.F(Fvalues, df1=4)
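If you would rather see what such a pooling does, or do it by hand, here is a base R sketch of the rule usually called D2 for combining chi-square or F statistics across imputations. I believe this is close to what miceadds computes, but treat that as an assumption; the function name pool_f_d2 is made up for this example.

```r
# Pool F statistics from m imputed datasets with the D2 rule.
# A sketch under stated assumptions, not the miceadds source code.
pool_f_d2 <- function(fvalues, df1) {
  m   <- length(fvalues)
  chi <- fvalues * df1                   # move F values to the chi-square scale
  r   <- (1 + 1 / m) * var(sqrt(chi))    # relative increase in variance
  d2  <- (mean(chi) / df1 - (m + 1) / (m - 1) * r) / (1 + r)
  df2 <- df1^(-3 / m) * (m - 1) * (1 + 1 / r)^2
  c(D2 = d2, df1 = df1, df2 = df2,
    p = pf(d2, df1, df2, lower.tail = FALSE))
}

# Same twenty F values as in the example above
Fvalues <- rep(c(6.76, 4.54, 4.23, 5.45, 4.78), times = 4)
pooled_f <- pool_f_d2(Fvalues, df1 = 4)
print(round(pooled_f, 4))
```

The pooled statistic is referred to an F distribution with df1 and df2 degrees of freedom. It only needs the F values and the numerator degrees of freedom, which is why it works with output copied from SPSS.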