Pooled typically refers to a "weighted" average. If you have two samples and the estimates of each sample's variance are $s_1^2$ and $s_2^2$, you might consider the pooled estimate:
$$
s^2 = \dfrac{ (n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$$
Note that this is not a simple average, which would be $$\dfrac{s_1^2+s_2^2}{2}$$
The idea is that each sample might be based on a different sample size and you want to account for that in your estimate (the estimate that comes from the larger sample size should have more of an impact on your final estimate than the estimate from the smaller sample size).
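This weighting can be sketched in a few lines of Python (the function and variable names are illustrative, not from any package):

```python
def pooled_variance(s1_sq, s2_sq, n1, n2):
    """Weighted pooled estimate of a common variance from two samples."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

# With very unequal sample sizes, the pooled estimate is pulled toward the
# variance from the larger sample, while the simple average ignores n1, n2:
pooled = pooled_variance(4.0, 9.0, n1=101, n2=11)  # close to 4.0
simple = (4.0 + 9.0) / 2                           # exactly 6.5
```

With $n_1 = 101$ and $n_2 = 11$, the pooled estimate sits near the first sample's variance, whereas the simple average is halfway between the two.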
Rubin's rules can only be applied to parameters following a normal distribution. For parameters with an F or chi-square distribution, a different set of formulas is needed:
- Allison, P. D. (2002). Missing data. Newbury Park, CA: Sage.
For performing an ANOVA on multiply imputed datasets you could use the R package miceadds (see miceadds::mi.anova).
Update 1
Here is a complete example:
Export your data from SPSS to R. In SPSS, save your dataset as .csv.
Read in your dataset:
library(miceadds)
dat <- read.csv(file='your-dataset.csv')
Let's assume that reading is your dependent variable and that you have two factors:
- gender, with male = 0 and female = 1
- treatment, with control = 0 and 'received treatment' = 1
Now let's convert them to factors:
dat$gender <- factor(dat$gender)
dat$treatment <- factor(dat$treatment)
Convert your dataset to a mids object, where we assume that the first variable holds the imputation number (Imputation_ in SPSS):
dat.mids <- as.mids(dat)
Now you can perform an ANOVA:
fit <- mi.anova(mi.res=dat.mids, formula="reading~gender*treatment", type=3)
summary(fit)
Update 2
This is a reply to your second comment:
What you describe here is a data import/export problem between SPSS and R. You could try to import the .sav file directly into R; there are a bunch of dedicated packages for that: foreign, rio, gdata, Hmisc, etc. I prefer the csv way, but that's a matter of taste and/or depends on the nature of your problem. Maybe you should also check some tutorials on YouTube or other sources on the internet.
library(foreign)
dat <- read.spss(file='path-to-sav', use.value.labels=F, to.data.frame=T)
Update 3
This is a reply to your first comment:
Yes, you can do your analysis in SPSS and pool the F values in miceadds (please note this example is taken from the miceadds::micombine.F help page):
library(miceadds)
Fvalues <- c(6.76 , 4.54 , 4.23 , 5.45 , 4.78, 6.76 , 4.54 , 4.23 , 5.45 , 4.78,
6.76 , 4.54 , 4.23 , 5.45 , 4.78, 6.76 , 4.54 , 4.23 , 5.45 , 4.78 )
micombine.F(Fvalues, df1=4)
Best Answer
What you write might come close in some circumstances but can't be counted on in general. For example, putting multiple imputation aside for a moment, an average of hazard-ratio confidence intervals from Cox survival regression models among bootstrapped samples from a complete data set will tend to be very poorly behaved.
For multiple imputation, Section 2.3 of Stef van Buuren's Flexible Imputation of Missing Data explains that Rubin's Rules take not only within-imputation and between-imputation variances into account but also a further variance due to a finite number of imputations. The variance of an averaged statistic $\bar Q$ among $m$ imputations thus has three sources:
$$T = \bar U + B + \frac{B}{m},$$
where $\bar U$ is the average within-imputation variance, $B$ is the between-imputation variance, and $B/m$ is the extra variance from using only a finite number $m$ of imputations.
What you write seems to be most closely related to the variance contributed by $\bar U$, although it might include some contribution from $B$ insofar as the mean-value estimates change among imputation sets and thus shift the CI. It doesn't seem, however, to include the extra variance due to a finite value of $m$. If you had a large number $m$ of imputations that might not be a big problem.
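To make the three variance sources concrete, Rubin's combination of within- and between-imputation variance can be sketched in a few lines of Python (the per-imputation estimates and squared standard errors below are made up for illustration):

```python
import statistics

# Hypothetical point estimates and squared standard errors
# from m = 5 imputed datasets (illustrative numbers only).
estimates = [2.1, 2.4, 2.0, 2.3, 2.2]
variances = [0.25, 0.30, 0.28, 0.26, 0.27]

m = len(estimates)
q_bar = sum(estimates) / m           # pooled point estimate
u_bar = sum(variances) / m           # within-imputation variance
b = statistics.variance(estimates)   # between-imputation variance
t = u_bar + b + b / m                # total variance, incl. finite-m term
se_pooled = t ** 0.5                 # the pooled SE to report
```

Averaging the per-imputation confidence intervals would capture roughly $\bar U$ (and indirectly some of $B$), but never the $B/m$ term, which is why the pooled SE from $T$ is the safe choice.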
So it's safest to follow Rubin's Rules and stick with the pooled SE.