Solved – Multiple imputation on single subscale item or subscale scores

data-imputation, missing data, scales, spss, structural-equation-modeling

I am currently researching the relationship between motivation/attitude variables (Gardner's model) and English language proficiency in the Philippines, and I have run into a problem: missing values. My study uses a 160-item scale made up of around 10 subscales; each item has a 7-point Likert-type response set, with values from 1 to 7. Some respondents failed to answer some items.

I'd like to try "Multiple Imputation" using SPSS 18, but I have some questions that I hope you can help with:

  1. For example, the variable "Interest in foreign languages" is measured by a 10-item scale (Q1-Q10), but some respondents left a few items unanswered. Likewise, "Attitudes toward English-speaking people" is measured by an 8-item scale (e.g., Q11-Q18). Can I impute missing values at the item level, on a dataset with variable names such as "ID, sex, age, Q1, Q2, Q3, Q4,…Q18, Final grade"? Or do I really have to add up the items first to get a subscale score before "Multiple Imputation"?

  2. Do I have to recode those negatively worded items before "Multiple Imputation"? For example, if Q1, Q3, Q5, Q7, Q9 are negatively worded, do I have to recode them first?

  3. It seems AMOS 18 cannot run "Calculate Estimates" on the imputed data. Should I just average the five imputed values for each missing value to get a single new value, and build a new dataset from those, so that AMOS 18 has to handle only one complete dataset rather than the five imputed datasets plus the original? Is averaging the five imputed values the right way of "POOLING"?
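To make question 2 concrete: on a 1-to-7 response set, recoding a negatively worded item means replacing a response x with 8 - x. Below is a minimal sketch in Python. The item names follow the example above; the helper names and the use of `None` for an unanswered item are assumptions made for illustration, not part of any SPSS workflow.

```python
SCALE_MIN, SCALE_MAX = 1, 7

# Items assumed to be negatively worded, as in the example in question 2.
NEGATIVE_ITEMS = {"Q1", "Q3", "Q5", "Q7", "Q9"}


def reverse_code(value):
    """Map 1<->7, 2<->6, 3<->5 and keep 4; leave a missing answer (None) missing."""
    if value is None:
        return None
    if not SCALE_MIN <= value <= SCALE_MAX:
        raise ValueError(f"response {value!r} is outside the 1-7 scale")
    return SCALE_MIN + SCALE_MAX - value


def recode_row(row):
    """Reverse-code only the negatively worded items of one respondent."""
    return {
        name: reverse_code(value) if name in NEGATIVE_ITEMS else value
        for name, value in row.items()
    }


# Hypothetical respondent who skipped Q3.
respondent = {"Q1": 2, "Q2": 6, "Q3": None, "Q4": 5, "Q5": 7}
recoded = recode_row(respondent)
print(recoded)
```

Because the mapping is a fixed one-to-one transformation of each item, it can be applied to observed and imputed responses alike.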

Best Answer

I basically concur with everything wolf.rauch said here, and would like to discuss some alternatives that might be available to you.

My understanding is that AMOS had had FIML (full information maximum likelihood) for continuous data for at least ten years before it was acquired by IBM; see http://www.smallwaters.com/amos/faq/faqa-missdat.html, an old FAQ by one of the original developers, who left the project around 2000. If you are willing to ignore the ordinal nature of your items, you can just use this method and won't need to bother with multiple imputation at all.

If you don't like this solution and want to retain the categorical nature of the data, you would need a chained-equations imputation method with ordinal links (if SPSS has one at all). If SPSS only imputes draws from a multivariate normal distribution, then you are back to ignoring the ordinal nature of the data, and in no way better off than with AMOS' FIML. (I've no clue what's available in SPSS; you'd have to figure that out. In the end, everything would be fruitless if AMOS does not support multiple imputation -- and that, again, I don't know.)

If you are willing to consider Stata, there is a chance you'd be able to conduct your whole analysis in it, with all the bells and whistles: multiple imputation for ordinal data using either Patrick Royston's ice or the official mi, followed by the new sem suite. Alternatively, you could run gllamm to obtain FIML estimates for ordinal data (although it would probably take an eternity to converge).
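On the "POOLING" in question 3: in multiple imputation the term usually refers to Rubin's rules. The model is estimated separately on each imputed dataset, and the resulting estimates and standard errors are then combined; the imputed values themselves are not averaged into one dataset. Here is a minimal sketch of that combination step for a single parameter, in Python. The function name and the numbers are hypothetical, made up purely for illustration; they are not output from AMOS, SPSS, or Stata.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Combine one parameter's results from m imputed datasets with Rubin's rules.

    Returns the pooled point estimate and the pooled standard error.
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need matching estimates and standard errors from m >= 2 datasets")
    pooled = mean(estimates)                      # average of the m estimates
    within = mean([se ** 2 for se in std_errors]) # average sampling variance
    between = variance(estimates)                 # variance across imputations (m - 1 denominator)
    total = within + (1 + 1 / m) * between        # total variance of the pooled estimate
    return pooled, sqrt(total)


# Hypothetical path coefficient and its standard error from five imputed datasets.
estimates = [0.42, 0.45, 0.40, 0.44, 0.43]
std_errors = [0.10, 0.11, 0.10, 0.12, 0.10]

pooled_estimate, pooled_se = pool_rubin(estimates, std_errors)
print(f"pooled estimate = {pooled_estimate:.3f}, pooled SE = {pooled_se:.3f}")
```

The between-imputation term is what carries the extra uncertainty due to the missing data. Analysing a single dataset built from averaged imputed values drops that term, so the standard errors it reports would tend to be too small.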