Solved – Creating a Pooled Data Set From Multiple Imputation Output in SPSS

multiple-imputation, pooling, spss

I have SPSS 26.

I have three datasets of survey responses, one for each survey year. As usual with survey data, there was a lot of missing data that I had to deal with before doing any analysis. I used the built-in Multiple Imputation procedure to create 10 imputed datasets for each year, and I've been able to run all of my regression analyses on them without trouble, since the GLM procedure runs on each individual imputed set as well as on a pooled set that contains the imputed sets. I ran the same GLM model on each of the three years to see whether the selected independent variables consistently affect my dependent variable in the same way across the years.

However, I now want to analyze the changes in my dependent variable BETWEEN the years. To do that, I need to generate a single dataset that contains the pooled imputed survey data for each year.

  1. Is there any way to access and export the pooled dataset SPSS creates and uses when analyzing imputed data?
  2. If not, is there any way to create a dataset that pools the different imputed responses?
  3. Relatedly, does anyone know what method SPSS uses to pool the datasets? (For example, if it just averages the values assigned to each case, I could write a script that does that, but I'm not sure that is what is going on in the background.)

Best Answer

  1. There is no pooled dataset with multiple imputation in SPSS or any other software. Pooling is done on the results of the analyses for the separate completed datasets.

  2. You might do this by averaging each case's imputed values across the imputed datasets, but you'd lose some of the value of multiple imputation: averaging eliminates between-imputation variability, which is integral to the methodology.

  3. As noted above, there's no pooling of datasets, only pooling of analysis results from different completed datasets. Pooling algorithms are given in the Multiple Imputation Pooling Algorithms chapter of the IBM SPSS Statistics Algorithms manual, which is available online (in the program, click Help>Documentation in PDF Format, select English or other desired language, then scroll down to the Manuals section and look for that title). The pooling of results is done using what are known as Rubin's rules. There's lots of information about those on the Internet.
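To make the idea of pooling results rather than data concrete, here is a minimal Python sketch of the textbook form of Rubin's rules for a single coefficient. It is an illustration only: the function name `pool_rubin` and the numbers in the usage lines are hypothetical, and it is not taken from the SPSS algorithms manual, which should be consulted for the details SPSS actually applies (for example, degrees of freedom).

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across m imputed datasets (textbook Rubin's rules).

    estimates  -- the coefficient estimated in each imputed dataset
    std_errors -- the matching standard errors, in the same order
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two imputations and matching lengths")

    pooled_estimate = mean(estimates)                 # average of the m estimates
    within = mean([se ** 2 for se in std_errors])     # average within-imputation variance
    between = variance(estimates)                     # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between            # total variance of the pooled estimate

    return {
        "estimate": pooled_estimate,
        "within": within,
        "between": between,
        "total_variance": total,
        "std_error": sqrt(total),
    }


# Hypothetical coefficients and standard errors from three imputed datasets.
result = pool_rubin([0.42, 0.47, 0.39], [0.10, 0.11, 0.09])
print(result["estimate"], result["std_error"])
```

The between-imputation term is exactly what is lost if the imputed values are averaged into a single dataset first: an analysis of that one dataset has no way to recover it, so its standard errors would be too small.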

To incorporate year into analyses, you'd probably want to go back to the original data with all three years in a single dataset, with cases for a given year properly identified, then re-impute data, with year included as an imputation variable. Then you could add year as a categorical predictor or factor in your GLM analyses.
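The stacking step can be sketched outside SPSS as follows. This is a minimal Python illustration, assuming each year's original (not yet imputed) responses are available as a list of row dictionaries; the function name `stack_years` and the field names `id`, `score`, and `year` are hypothetical. Missing values are left as `None`, since the imputation itself would still be run afterwards on the combined file.

```python
def stack_years(datasets_by_year):
    """Combine per-year datasets into one, tagging each case with its year.

    datasets_by_year -- dict mapping a year label to a list of row dicts
    """
    stacked = []
    for year, rows in sorted(datasets_by_year.items()):
        for row in rows:
            # Copy the row and add the year identifier; missing values stay as None.
            stacked.append({**row, "year": year})
    return stacked


# Hypothetical original data for two survey years, with one missing response.
combined = stack_years({
    2018: [{"id": 1, "score": 3.5}, {"id": 2, "score": None}],
    2019: [{"id": 1, "score": 4.0}],
})
print(len(combined), [row["year"] for row in combined])
```

With the year stored on every case, it can serve both as a variable in the imputation model and as a factor in the later analysis.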
