When working with a dataset created via multiple imputation, SPSS pools some values but not others. For example, in multiple regression, I can get coefficients, t-tests for the coefficients, t-values and p-values for those t-tests. However, the ANOVA output testing model fit does not give me pooled data for the F-test and its p-value (nor pooled R2). What is the proper formula or procedure to calculate these values based on the information provided in the SPSS output?
Multiple Regression – How to Pool F-Values in a Multiply Imputed Database
data-imputation, multiple-regression, multiple-imputation
Related Solutions
You can do this somewhat by hand by taking advantage of the lapply functionality in R and the list structure returned by the Amelia multiple imputation package. Here's a quick example script.
library(Amelia)
library(lme4)
library(merTools)
library(plyr) # for collapsing estimates
Amelia is similar to mice, so you can just substitute your variables in from the mice call here -- this example is from a project I was working on.
a.out <- amelia(dat[sub1, varIndex], idvars = "SCH_ID",
noms = varIndex[!varIndex %in% c("SCH_ID", "math12")],
m = 10)
a.out is the imputation object; now we need to run the model on each imputed dataset. To do this, we use the lapply function in R to repeat a function over list elements. Here, lapply applies the model specification to each dataset (d) in the list and returns the results in a list of models.
mods <- lapply(a.out$imputations,
               function(d) lmer(log(wage) ~ gender + age + age_sqr +
                                  occupation + degree + private_sector + overtime +
                                  (1 + gender | faculty), data = d))
Now we create a data.frame from that list by simulating values for the fixed and random effects using the functions FEsim and REsim from the merTools package:
imputeFEs <- ldply(mods, FEsim, nsims = 1000)
imputeREs <- ldply(mods, REsim, nsims = 1000)
The data.frames above include separate estimates for each dataset; now we need to collapse them across imputations:
imputeREs <- ddply(imputeREs, .(X1, X2), summarize, mean = mean(mean),
median = mean(median), sd = mean(sd),
level = level[1])
imputeFEs <- ddply(imputeFEs, .(var), summarize, meanEff = mean(meanEff),
medEff = mean(medEff), sdEff = mean(sdEff))
Now we can also extract some statistics on the variance/covariance for the random effects across the imputed values. Here I have written two simple extractor functions to do this.
REsdExtract <- function(model){
out <- unlist(lapply(VarCorr(model), attr, "stddev"))
return(out)
}
REcorrExtract <- function(model){
out <- unlist(lapply(VarCorr(model), attr, "correlation"))
return(min(unique(out)))
}
And now we can apply them to the models and store them as a vector:
modStats <- cbind(ldply(mods, REsdExtract), ldply(mods, REcorrExtract))
Update
The functions below will get you much closer to the output provided by arm::display by operating on the list of lmer or glmer objects. Hopefully this will be incorporated into the merTools package in the near future:
# Functions to extract standard deviation of random effects from model
REsdExtract <- function(model){
out <- unlist(lapply(VarCorr(model), attr, "stddev"))
return(out)
}
#slope intercept correlation from model
REcorrExtract <- function(model){
out <- unlist(lapply(VarCorr(model), attr, "correlation"))
return(min(unique(out)))
}
modelRandEffStats <- function(modList){
SDs <- ldply(modList, REsdExtract)
corrs <- ldply(modList, REcorrExtract)
tmp <- cbind(SDs, corrs)
names(tmp) <- c("Imp", "Int", "Slope", "id", "Corr")
out <- data.frame(IntSD_mean = mean(tmp$Int),
SlopeSD_mean = mean(tmp$Slope),
Corr_mean = mean(tmp$Corr),
IntSD_sd = sd(tmp$Int),
SlopeSD_sd = sd(tmp$Slope),
Corr_sd = sd(tmp$Corr))
return(out)
}
modelFixedEff <- function(modList){
require(broom)
fixEst <- ldply(modList, tidy, effects = "fixed")
# Collapse
out <- ddply(fixEst, .(term), summarize,
estimate = mean(estimate),
std.error = mean(std.error))
out$statistic <- out$estimate / out$std.error
return(out)
}
print.merModList <- function(modList, digits = 3){
  require(arm) # for pfround() and fround()
  len <- length(modList)
form <- modList[[1]]@call
print(form)
cat("\nFixed Effects:\n")
fedat <- modelFixedEff(modList)
dimnames(fedat)[[1]] <- fedat$term
pfround(fedat[-1, -1], digits)
cat("\nError Terms Random Effect Std. Devs\n")
cat("and covariances:\n")
cat("\n")
ngrps <- length(VarCorr(modList[[1]]))
errorList <- vector(mode = 'list', length = ngrps)
corrList <- vector(mode = 'list', length = ngrps)
for(i in 1:ngrps){
subList <- lapply(modList, function(x) VarCorr(x)[[i]])
subList <- apply(simplify2array(subList), 1:2, mean)
errorList[[i]] <- subList
subList <- lapply(modList, function(x) attr(VarCorr(x)[[i]], "correlation"))
subList <- min(unique(apply(simplify2array(subList), 1:2, function(x) mean(x))))
corrList[[i]] <- subList
}
errorList <- lapply(errorList, function(x) {
diag(x) <- sqrt(diag(x))
return(x)
})
lapply(errorList, pfround, digits)
cat("\nError Term Correlations:\n")
lapply(corrList, pfround, digits)
residError <- mean(unlist(lapply(modList, function(x) attr(VarCorr(x), "sc"))))
cat("\nResidual Error =", fround(residError,
digits), "\n")
cat("\n---Groups\n")
ngrps <- lapply(modList[[1]]@flist, function(x) length(levels(x)))
modn <- getME(modList[[1]], "devcomp")$dims["n"]
cat(sprintf("number of obs: %d, groups: ", modn))
cat(paste(paste(names(ngrps), ngrps, sep = ", "),
collapse = "; "))
cat("\n")
cat("\nModel Fit Stats")
mAIC <- mean(unlist(lapply(modList, AIC)))
cat(sprintf("\nAIC = %g", round(mAIC, 1)))
moDsigma.hat <- mean(unlist(lapply(modList, sigma)))
cat("\nOverdispersion parameter =", fround(moDsigma.hat,
digits), "\n")
}
Rubin's rules can only be applied to parameters that follow a normal distribution. For parameters with an F or chi-square distribution, a different set of formulas is needed:
- Allison, P. D. (2002). Missing data. Newbury Park, CA: Sage.
For performing an ANOVA on multiply imputed datasets you could use the R package miceadds (miceadds::mi.anova).
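For context, here is what Rubin's rules look like for a normally distributed parameter (e.g. a regression coefficient): pool the point estimates by averaging, and combine the within- and between-imputation variances. A minimal sketch; the estimates and standard errors below are purely illustrative, not from any real analysis:

```r
# Rubin's rules for a normally distributed parameter (illustrative values).
est <- c(0.52, 0.48, 0.55, 0.50, 0.47)   # hypothetical coefficients from m = 5 imputations
se  <- c(0.11, 0.12, 0.10, 0.11, 0.12)   # their standard errors
m   <- length(est)

Q_bar <- mean(est)               # pooled point estimate
U_bar <- mean(se^2)              # within-imputation variance
B     <- var(est)                # between-imputation variance
T_var <- U_bar + (1 + 1/m) * B   # total variance
pooled_se <- sqrt(T_var)         # pooled standard error
```

This is the pooling that SPSS already performs for the coefficients; the point of the Allison reference is that the F-statistic does not follow a normal distribution, so this recipe cannot be applied to it directly.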
Update 1
Here is a complete example:
Export your data from SPSS to R. In SPSS, save your dataset as a .csv file.
Read in your dataset:
library(miceadds)
dat <- read.csv(file='your-dataset.csv')
Let's assume that reading is your dependent variable and that you have two factors:
- gender, with male = 0 and female = 1
- treatment, with control = 0 and 'received treatment' = 1
Now let's convert them to factors:
dat$gender <- factor(dat$gender)
dat$treatment <- factor(dat$treatment)
Convert your dataset to a mids object, where we assume that the first variable holds the imputation number (Imputation_ in SPSS):
dat.mids <- as.mids(dat)
Now you can perform an ANOVA:
fit <- mi.anova(mi.res=dat.mids, formula="reading~gender*treatment", type=3)
summary(fit)
Update 2
This is a reply to your second comment:
What you describe here is a data import/export problem between SPSS and R. You could try to import the .sav file directly into R; there are a bunch of dedicated packages for that: foreign, rio, gdata, Hmisc, etc. I prefer the csv way, but that's a matter of taste and/or depends on the nature of your problem. Maybe you should also check some tutorials on YouTube or other sources on the internet.
library(foreign)
dat <- read.spss(file='path-to-sav', use.value.labels=F, to.data.frame=T)
Update 3
This is a reply to your first comment:
Yes, you can do your analysis in SPSS and pool the F-values in miceadds (please note this example is taken from the miceadds::micombine.F help page):
library(miceadds)
Fvalues <- c(6.76 , 4.54 , 4.23 , 5.45 , 4.78, 6.76 , 4.54 , 4.23 , 5.45 , 4.78,
6.76 , 4.54 , 4.23 , 5.45 , 4.78, 6.76 , 4.54 , 4.23 , 5.45 , 4.78 )
micombine.F(Fvalues, df1=4)
Best Answer
This is an excerpt taken from "Applied Missing Data Analysis in R and SPSS" by Heymans & Eekhout (2019).
The next section provides advice for doing this in R.
I would recommend conducting this analysis with R, as you can obtain the results you wish. Here is a link to the text, which includes thorough examples.
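On the pooled R² the question also asks about: one approach from the missing-data literature (Harel, 2009) is to take each imputation's multiple correlation sqrt(R²), apply Fisher's z transform, average on the z scale, then back-transform and square. A minimal sketch; the R² values below are hypothetical, not taken from the book:

```r
# Pooling R^2 across m imputed datasets via Fisher's z (Harel, 2009).
# The r2 values are hypothetical, for illustration only.
r2 <- c(0.31, 0.28, 0.35, 0.30, 0.33)
z  <- atanh(sqrt(r2))         # Fisher z of each multiple correlation R
r2_pooled <- tanh(mean(z))^2  # average on the z scale, back-transform, square
r2_pooled
```

Note that because the averaging happens on the z scale, the pooled value is not simply the mean of the R² values, although for values this close together the difference is small.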
Finally, the authors above use the miceadds package in R to combine F-statistics. The reference manual for the miceadds package says this regarding their method of combination: