Dataset – How to Clean Up Inconsistent Survey Data Effectively

data preprocessing, dataset, survey

I have some survey response data that are inconsistent, and I'm not sure of the best way to handle them. The inconsistency is as follows:

  1. Have you ever had a food poisoning episode that you think might have been caused by the food at a particular restaurant? a) Yes b) No
  2. Did any of these episodes prevent you from eating at that restaurant again?
    a) Does not apply, never have had a food poisoning episode b) Yes c) No
  3. Following these episodes, did you come back to your normal condition after 24 hours?
    a) Does not apply, never have had a food poisoning episode b) Yes c) No

I believe that the intent of the designers of the survey was that anyone answering "No" to 1) should answer "Does not apply" to 2) and 3), but that's not how it worked. A large group of respondents picked "No" for 1) and "No" for at least one of 2) or 3).

What is the best way to deal with this inconsistency?

What are some good references (books, papers, websites, etc.) for discussions of and solutions to issues of this kind, and for data cleaning in general?

Best Answer

Data cleaning of surveys often takes longer than the analysis and report write-up, so you're not alone. :)

Normally in a survey, we route respondents through the questions. For example, in computer-assisted telephone interviewing (or in online or face-to-face interviewing with a laptop), the survey programmers code the survey to literally skip questions that should not be presented if the respondent answers in a particular way.

It appears that a question skip pattern was missing from this survey, for whatever reason. If a skip pattern should have been implemented, then yes, you can introduce it post hoc for questions 2 and 3: for respondents who answered "No" to question 1, change the "should not have answered" responses to system-missing (or whatever missing code you're using).
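As a rough illustration of that recoding, here is a minimal Python sketch. The column names (`q1`, `q2`, `q3`), the answer labels, and the use of `None` as the missing code are all assumptions made for the example, not anything taken from your actual data file. It also counts how many answers were recoded, which is worth reporting alongside the analysis.

```python
# Hypothetical column names and answer labels; adapt them to the real data.
MISSING = None  # stand-in for whatever system-missing code you use


def apply_skip_pattern(responses, gate="q1", dependents=("q2", "q3")):
    """Apply the skip pattern after the fact.

    For every respondent who answered "No" to the gate question, set the
    dependent questions to MISSING. Returns cleaned copies of the rows and
    the number of substantive answers that had to be recoded.
    """
    cleaned = []
    recoded = 0
    for row in responses:
        new_row = dict(row)  # copy, so the raw data stay untouched
        if row.get(gate) == "No":
            for q in dependents:
                # Count only answers that contradict the gate question.
                if new_row.get(q) not in ("Does not apply", MISSING):
                    recoded += 1
                new_row[q] = MISSING
        cleaned.append(new_row)
    return cleaned, recoded


# Made-up example rows, purely for illustration.
survey = [
    {"id": 1, "q1": "Yes", "q2": "Yes", "q3": "No"},
    {"id": 2, "q1": "No", "q2": "Does not apply", "q3": "Does not apply"},
    {"id": 3, "q1": "No", "q2": "No", "q3": "Does not apply"},
]

cleaned, recoded = apply_skip_pattern(survey)
```

Keeping the raw rows unchanged and writing the recoded values to a copy means you can always check how many respondents were affected, or revisit the decision later.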

There are a lot of survey books out there, and the right ones for you will depend on your particular needs, as they all have different strengths and weaknesses. Have a look at the range of books by David De Vaus, such as Analysing Social Science Data, which looks particularly good for your situation. He has written a number of other social science survey books, and they all come recommended. The Dillman et al. book also came highly recommended to me, although I have not used it myself.

I also recommend cognitive testing followed by field testing of a questionnaire before going live with the survey. This type of testing is designed to expose question sequencing issues, while also showing how respondents interpret questions (which is sometimes not the way the questionnaire designer intended!). While it is too late for your current survey, you can implement this process for future surveys.

Best wishes with your survey analysis.
