Rubin's rules can only be applied to parameters that follow a normal distribution. For statistics with an F or chi-square distribution, a different set of formulas is needed:
- Allison, P. D. (2002). Missing data. Newbury Park, CA: Sage.
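For contrast, here is a minimal base-R sketch of the normal-theory pooling that Rubin's rules prescribe for a single coefficient. The estimates and standard errors are made-up numbers for illustration only, not results from any real analysis.

```r
# Hypothetical estimates and standard errors of one coefficient
# from m = 5 imputed datasets (illustrative values only)
est <- c(1.20, 1.35, 1.10, 1.28, 1.22)
se  <- c(0.40, 0.42, 0.39, 0.41, 0.40)
m   <- length(est)

qbar <- mean(est)                 # pooled point estimate
ubar <- mean(se^2)                # within-imputation variance
b    <- var(est)                  # between-imputation variance
tvar <- ubar + (1 + 1/m) * b      # total variance

# Rubin's (1987) degrees of freedom for the t reference distribution
r  <- (1 + 1/m) * b / ubar
df <- (m - 1) * (1 + 1/r)^2

ci <- qbar + c(-1, 1) * qt(0.975, df) * sqrt(tvar)
pooled <- c(estimate = qbar, se = sqrt(tvar), df = df,
            lower = ci[1], upper = ci[2])
print(pooled)
```

Averaging works here because the coefficient is approximately normal. An F or chi-square statistic cannot simply be averaged this way, which is why a separate procedure is needed.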
To perform an ANOVA on multiply imputed datasets, you could use the R package miceadds (see the package manual, function miceadds::mi.anova).
Update 1
Here is a complete example:
Export your data from SPSS to R. In SPSS, save your dataset as a .csv file.
Read in your dataset:
library(miceadds)
dat <- read.csv(file='your-dataset.csv')
Let's assume that $reading$ is your dependent variable and that you have two factors:
- gender, with male = 0 and female = 1
- treatment, with control = 0 and 'received treatment' = 1
Now let's convert them to factors:
dat$gender <- factor(dat$gender)
dat$treatment <- factor(dat$treatment)
Convert your dataset to a mids object, where we assume that the first variable holds the imputation number (Imputation_ in SPSS):
dat.mids <- as.mids(dat)
Now you can perform an ANOVA:
fit <- mi.anova(mi.res=dat.mids, formula="reading~gender*treatment", type=3)
summary(fit)
Update 2
This is a reply to your second comment:
What you describe here is a data import/export problem between SPSS and R. You could try to import the .sav file directly into R; there are a number of dedicated packages for that: foreign, rio, gdata, Hmisc, etc. I prefer the csv way, but that's a matter of taste and/or depends on the nature of your problem. Maybe you should also check some tutorials on YouTube or other sources on the internet.
library(foreign)
dat <- read.spss(file='path-to-sav', use.value.labels=FALSE, to.data.frame=TRUE)
Update 3
This is a reply to your first comment:
Yes, you can do your analysis in SPSS and pool the F values in miceadds (please note that this example is taken from the miceadds::micombine.F help page):
library(miceadds)
Fvalues <- c(6.76, 4.54, 4.23, 5.45, 4.78, 6.76, 4.54, 4.23, 5.45, 4.78,
             6.76, 4.54, 4.23, 5.45, 4.78, 6.76, 4.54, 4.23, 5.45, 4.78)
micombine.F(Fvalues, df1=4)
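As far as I understand it, micombine.F converts the F values to the chi-square scale and applies the D2 pooling procedure described in Allison (2002). Under that assumption, the computation can be sketched in base R as follows. Note that pool_f_d2 is a hypothetical helper written for illustration; it is not part of miceadds.

```r
# Hypothetical helper (not part of miceadds): pool F statistics via D2
pool_f_d2 <- function(Fvalues, df1) {
  m   <- length(Fvalues)
  d   <- Fvalues * df1                 # F values on the chi-square scale
  r   <- (1 + 1/m) * var(sqrt(d))      # relative increase in variance
  D2  <- (mean(d) / df1 - (m + 1) / (m - 1) * r) / (1 + r)
  df2 <- df1^(-3 / m) * (m - 1) * (1 + 1/r)^2   # denominator df
  p   <- pf(D2, df1, df2, lower.tail = FALSE)
  c(D2 = D2, df1 = df1, df2 = df2, p = p)
}

# Same F values as in the example above
Fvalues <- rep(c(6.76, 4.54, 4.23, 5.45, 4.78), times = 4)
res <- pool_f_d2(Fvalues, df1 = 4)
print(res)
```

The pooled statistic is always smaller than the plain mean of the F values, because the variability between imputations is subtracted in the numerator and inflates the denominator.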
Best Answer
Yes, it is possible and, yes, there are R functions that do it. Instead of computing the p-values of the repeated analyses by hand, you can use the package Zelig, which is also referred to in the vignette of the Amelia package (for a more informative method see my update below). I'll use an example from the Amelia vignette to demonstrate this:
This is the corresponding output including $p$-values:
zelig can fit a host of models other than least squares.
To get confidence intervals and degrees of freedom for your estimates you can use mitools:
This will give you confidence intervals and the proportion of the total variance that is attributable to the missing data:
Of course you can just combine the interesting results into one object:
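The Amelia and mitools steps described above might look roughly like this. It is a sketch under assumptions: it uses the freetrade data that ships with Amelia, and the model formula is an illustrative choice that is simpler than the one in the vignette.

```r
library(Amelia)   # provides amelia() and the freetrade data
library(mitools)  # provides imputationList() and MIcombine()

data(freetrade)
set.seed(1)
a.out <- amelia(freetrade, m = 5, ts = "year", cs = "country", p2s = 0)

# Fit the same linear model in every imputed dataset
imps <- imputationList(a.out$imputations)
fits <- with(imps, lm(tariff ~ polity + pop + gdp.pc))

# Pool with Rubin's rules; summary() reports CIs and missing information
pooled <- MIcombine(fits)
summary(pooled)

# Collect the quantities of interest in one object
est <- coef(pooled)
se  <- sqrt(diag(vcov(pooled)))
res <- data.frame(estimate = est, se = se, t = est / se,
                  p = 2 * pt(abs(est / se), df = pooled$df,
                             lower.tail = FALSE))
print(res)
```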
Update
After some playing around, I have found a more flexible way to get all necessary information using the mice package. For this to work, you'll need to modify the package's as.mids() function. Use Gerko's version posted in my follow-up question:
With this defined, you can go on to analyze the imputed data sets:
This will give you all results you get using Zelig and mitools and more:
Note, using pool() you can also calculate $p$-values with $df$ adjusted for small samples by omitting the method parameter. What is even better, you can now also calculate $R^2$ and compare nested models:
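With a current version of mice, a comparable workflow might look roughly like the sketch below. It imputes the nhanes example data bundled with mice instead of converting Amelia output, and it uses D1() for the Wald-type comparison of nested models, which I believe recent mice releases provide in place of the older pool.compare(). Treat the model formulas as illustrative.

```r
library(mice)

# Impute the small nhanes example data shipped with mice
imp <- mice(nhanes, m = 5, seed = 123, printFlag = FALSE)

# Fit a full and a nested (restricted) model in each imputed dataset
fit1 <- with(imp, lm(bmi ~ age + hyp))
fit0 <- with(imp, lm(bmi ~ age))

# Pooled coefficients with standard errors, df and p-values
pooled <- pool(fit1)
print(summary(pooled))

# Pooled R^2 for the full model
print(pool.r.squared(fit1))

# Wald-type test of the nested model against the full model
cmp <- D1(fit1, fit0)
print(summary(cmp))
```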