Solved – Dealing with ‘Don’t Know’ answers for a categorical outcome variable

categorical data, logit, multiple-imputation, multivariate analysis, survey

I have survey data with a categorical outcome variable (yes, no, don't know) that reflects whether respondents accept a certain situation. My concern is how to deal with the "don't know" (DK) answers. I doubt I should drop these observations, because:

  1. it shrinks my dataset from around 14,400 to around 13,000 observations, which is considerable;
  2. my intuition is that DK answers carry some information and are therefore not random.

So my questions are:

  1. Someone suggested that non-randomness affects the estimated probabilities and that I should check for it, but how do I check for randomness in Stata?
  2. If keeping DK answers is desired, then multiple imputation (for example) is one way to deal with this issue. Are there any sources or links I could use to learn what multiple imputation is and how it is done in Stata?
  3. Almost all papers I have read on my topic use logistic regression, and I wonder what the justification for it is. Are there any links or sources that compare different probabilistic approaches for a non-binary outcome variable (in my case, a categorical outcome with three answers) and explain how to choose between them?

Best Answer

If you think that "DK" answers give information, then you don't want to treat them as missing and therefore don't want to multiply impute them.

Multinomial logistic regression is, as you say, the usual method for nominal outcome variables, and it works well. What other methods did you have in mind? One other possibility is ordinal logistic regression, which would assume that "DK" lies somewhere between "Yes" and "No"; how reasonable that assumption is depends on your exact situation. Another possibility is multinomial probit, though I have never seen it used. In the binary case, the results of probit and logistic regression are often quite similar.
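To make the multinomial logit idea concrete, here is a minimal sketch in Python (not Stata, and not a fitted model). It shows how a three-category model turns one linear score per non-base outcome into probabilities for "Yes", "No" and "DK", keeping DK as a category of its own. The function name `mnl_probs`, the choice of "No" as the base outcome, the single covariate, and all coefficient values are assumptions invented for illustration; real coefficients would come from estimating the model on your data.

```python
import math

def mnl_probs(x, coefs, base="No"):
    """Multinomial logit probabilities for one respondent.

    x     : list of covariate values (without the constant).
    coefs : dict mapping each non-base outcome to [intercept, b1, b2, ...].
    The base outcome has all coefficients fixed at zero.
    """
    scores = {base: 0.0}
    for outcome, b in coefs.items():
        scores[outcome] = b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))
    # Subtract the largest score before exponentiating for numerical stability.
    top = max(scores.values())
    expd = {k: math.exp(v - top) for k, v in scores.items()}
    total = sum(expd.values())
    return {k: v / total for k, v in expd.items()}

# Hypothetical coefficients, purely illustrative (not estimates from any data).
coefs = {"Yes": [0.5, 0.8], "DK": [-1.0, -0.3]}
probs = mnl_probs([1.0], coefs)
print(probs)
```

The three probabilities always sum to one, and each non-base outcome gets its own coefficient vector, which is why no ordering of "Yes", "DK" and "No" has to be assumed. In Stata, the estimation commands for the three approaches mentioned above are `mlogit`, `ologit` and `mprobit`.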
