Yes, it is possible and, yes, there are R
functions that do it. Instead of computing the p-values of the repeated analyses by hand, you can use the package Zelig
, which is also referred to in the vignette of the Amelia
-package (for a more informative method see my update below). I'll use an example from the Amelia
-vignette to demonstrate this:
library("Amelia")
data(freetrade)
amelia.out <- amelia(freetrade, m = 15, ts = "year", cs = "country")
library("Zelig")
zelig.fit <- zelig(tariff ~ pop + gdp.pc + year + polity, data = amelia.out$imputations, model = "ls", cite = FALSE)
summary(zelig.fit)
This is the corresponding output including $p$-values:
Model: ls
Number of multiply imputed data sets: 15
Combined results:
Call:
lm(formula = formula, weights = weights, model = F, data = data)
Coefficients:
Value Std. Error t-stat p-value
(Intercept) 3.18e+03 7.22e+02 4.41 6.20e-05
pop 3.13e-08 5.59e-09 5.59 4.21e-08
gdp.pc -2.11e-03 5.53e-04 -3.81 1.64e-04
year -1.58e+00 3.63e-01 -4.37 7.11e-05
polity 5.52e-01 3.16e-01 1.75 8.41e-02
For combined results from datasets i to j, use summary(x, subset = i:j).
For separate results, use print(summary(x), subset = i:j).
zelig
can fit a host of models other than least squares.
To get confidence intervals and degrees of freedom for your estimates you can use mitools
:
library("mitools")
imp.data <- imputationList(amelia.out$imputations)
mitools.fit <- MIcombine(with(imp.data, lm(tariff ~ polity + pop + gdp.pc + year)))
mitools.res <- summary(mitools.fit)
mitools.res <- cbind(mitools.res, df = mitools.fit$df)
mitools.res
This will give you confidence intervals and proportion of the total variance that is attributable to the missing data:
results se (lower upper) missInfo df
(Intercept) 3.18e+03 7.22e+02 1.73e+03 4.63e+03 57 % 45.9
pop 3.13e-08 5.59e-09 2.03e-08 4.23e-08 19 % 392.1
gdp.pc -2.11e-03 5.53e-04 -3.20e-03 -1.02e-03 21 % 329.4
year -1.58e+00 3.63e-01 -2.31e+00 -8.54e-01 57 % 45.9
polity 5.52e-01 3.16e-01 -7.58e-02 1.18e+00 41 % 90.8
Of course you can just combine the interesting results into one object:
combined.results <- merge(mitools.res, zelig.res$coefficients[, c("t-stat", "p-value")], by = "row.names", all.x = TRUE)
Update
After some playing around, I have found a more flexible way to get all necessary information using the mice
-package. For this to work, you'll need to modify the package's as.mids()
-function. Use Gerko's version posted in my follow-up question:
as.mids2 <- function(data2, .imp=1, .id=2){
ini <- mice(data2[data2[, .imp] == 0, -c(.imp, .id)], m = max(as.numeric(data2[, .imp])), maxit=0)
names <- names(ini$imp)
if (!is.null(.id)){
rownames(ini$data) <- data2[data2[, .imp] == 0, .id]
}
for (i in 1:length(names)){
for(m in 1:(max(as.numeric(data2[, .imp])))){
if(!is.null(ini$imp[[i]])){
indic <- data2[, .imp] == m & is.na(data2[data2[, .imp]==0, names[i]])
ini$imp[[names[i]]][m] <- data2[indic, names[i]]
}
}
}
return(ini)
}
With this defined, you can go on to analyze the imputed data sets:
library("mice")
imp.data <- do.call("rbind", amelia.out$imputations)
imp.data <- rbind(freetrade, imp.data)
imp.data$.imp <- as.numeric(rep(c(0:15), each = nrow(freetrade)))
mice.data <- as.mids2(imp.data, .imp = ncol(imp.data), .id = NULL)
mice.fit <- with(mice.data, lm(tariff ~ polity + pop + gdp.pc + year))
mice.res <- summary(pool(mice.fit, method = "rubin1987"))
This will give you all results you get using Zelig
and mitools
and more:
est se t df Pr(>|t|) lo 95 hi 95 nmis fmi lambda
(Intercept) 3.18e+03 7.22e+02 4.41 45.9 6.20e-05 1.73e+03 4.63e+03 NA 0.571 0.552
pop 3.13e-08 5.59e-09 5.59 392.1 4.21e-08 2.03e-08 4.23e-08 0 0.193 0.189
gdp.pc -2.11e-03 5.53e-04 -3.81 329.4 1.64e-04 -3.20e-03 -1.02e-03 0 0.211 0.206
year -1.58e+00 3.63e-01 -4.37 45.9 7.11e-05 -2.31e+00 -8.54e-01 0 0.570 0.552
polity 5.52e-01 3.16e-01 1.75 90.8 8.41e-02 -7.58e-02 1.18e+00 2 0.406 0.393
Note, using pool()
you can also calculate $p$-values with $df$ adjusted for small samples by omitting the method
-parameter. What is even better, you can now also calculate $R^2$ and compare nested models:
pool.r.squared(mice.fit)
mice.fit2 <- with(mice.data, lm(tariff ~ polity + pop + gdp.pc))
pool.compare(mice.fit, mice.fit2, method = "Wald")$pvalue
Best Answer
It seems that you want to stack the imputed datasets. As noted by those who have commented previously, this is not the best way to analyse the data (point estimates tend to be accurate, but the variability accounted for by the imputation process is no longer present and error will be reduced). Nevertheless, stacking the data is achieved by using the
complete
function in themice
package. Once stacked, the data can be exported easily to other software programs.It is also possible to export the
mids
object (imp
) directly to SPSS (if that is your other software) using themids2spss
function inmice
.