Multiple Regression – How to Pool F-Values in a Multiply Imputed Database

data-imputation, multiple regression, multiple-imputation

When working with a dataset created via multiple imputation, SPSS pools some values but not others. For example, in multiple regression I get pooled coefficients together with the t-values and p-values of their t-tests. However, the ANOVA output that tests model fit does not give pooled values for the F-test and its p-value (nor a pooled R²). What is the proper formula or procedure to calculate these values from the information provided in the SPSS output?

Best Answer

This is an excerpt taken from "Applied Missing Data Analysis in R and SPSS" by Heymans & Eekhout (2019).

The pooling of Analysis of Variance (ANOVA) statistics is not available in SPSS. In Figure 5.8 the table is shown as a result of ANOVA after multiple imputation. It is clear from the Figure that the pooled results are lacking.

The next section provides advice for doing this in R.

The pooled ANOVA procedure uses the same function as the one to derive the pooled Chi-square value, because the Chi-square and the F-value are related. The easiest way to obtain a p-value for the ANOVA is by using the mi.anova function in the miceadds package. In this function a regression-based formula can be defined to get a p-value.

I would recommend conducting this analysis in R, since it gives you the pooled results you are looking for. Here is a link to the text, which includes thorough examples.

Finally, the authors above use the miceadds package in R to combine F-statistics. The reference manual for the miceadds package describes the combination method as follows:

This function (mi.anova) combines F values from analysis of variance using the D2 statistic, which is based on combining Chi-squared statistics (see Allison, 2001; Grund, Luedtke & Robitzsch, 2016)
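
For anyone who only has the per-imputation F-values from the SPSS ANOVA tables, the pooling rule named in the manual (usually written D2) can also be computed by hand. The following is a minimal Python sketch of that rule as it is commonly presented, not the miceadds source code. It assumes each F-value is turned into a Chi-square statistic by multiplying it by its numerator degrees of freedom. The function names pool_f_d2 and d2_p_value are made up for this illustration, and the p-value helper assumes SciPy is installed.

```python
import math
from statistics import mean, variance


def pool_f_d2(f_values, df1):
    """Pool F statistics from m imputed datasets with the D2 rule.

    f_values: the F statistic of the same test in each imputed dataset
    df1:      numerator degrees of freedom of that test
    Returns (d2, df1, df2); d2 is referred to an F(df1, df2) distribution.
    """
    m = len(f_values)
    if m < 2:
        raise ValueError("need F-values from at least two imputed datasets")

    # Assumption: F is approximately chi-square / df1, so chi-square = F * df1.
    chi2 = [f * df1 for f in f_values]

    # Relative increase in variance, from the spread of sqrt(chi-square)
    # across imputations (sample variance, divisor m - 1).
    r = (1 + 1 / m) * variance([math.sqrt(d) for d in chi2])

    # Pooled statistic. It can be negative when the F-values vary a lot,
    # in which case the p-value is 1.
    d2 = (mean(chi2) / df1 - (m + 1) / (m - 1) * r) / (1 + r)

    # Denominator degrees of freedom; infinite if all F-values are identical.
    df2 = math.inf if r == 0 else df1 ** (-3 / m) * (m - 1) * (1 + 1 / r) ** 2
    return d2, df1, df2


def d2_p_value(d2, df1, df2):
    """Upper-tail p-value for the pooled statistic (needs SciPy)."""
    from scipy import stats  # third-party dependency, assumed available

    if math.isinf(df2):
        # Limiting case: df1 * F follows a chi-square with df1 df.
        return stats.chi2.sf(d2 * df1, df1)
    return stats.f.sf(d2, df1, df2)


# Made-up F-values from three imputed datasets, test with 4 numerator df.
d2, df1, df2 = pool_f_d2([2.25, 3.0625, 4.0], df1=4)
print(d2, df1, df2)
```

With real output you would pass one F-value per imputed dataset (SPSS reports them in the per-imputation ANOVA tables) and then call d2_p_value on the result. Because D2 uses only the test statistics and not the underlying estimates, treat it as an approximation and compare against mi.anova when R is available.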
