Solved – How to quickly identify participants responding randomly to self-report psychometric tests with many items

outliers, psychology, psychometrics

Many psychological studies involve getting participants to answer a hundred or more closed-ended questions. A standard context would be a personality test with 100 items where each item is answered on a 1 to 5 scale. Items are designed to measure various scales, and items vary in whether they are positively or negatively worded.

I often want to quickly identify participants who have answered the test randomly or in some other problematic way. I don't want to remove outliers in the purely statistical sense. For example, participants who are just very low or very high on the psychological scales might be flagged as extreme by some multivariate distance measures. I want to remove participants who have not completed the test conscientiously (e.g., random responding).

In the online environment, item response times can be very effective in identifying item skippers. However, assuming you only have item responses for a sample of participants:

  • What is a good basic procedure for flagging potential random responders?
  • Once such cases have been identified, what is a good strategy for determining whether they are random responders or just a bit unusual?
  • Are there any simple functions in R that implement the proposed approach?

Best Answer

Jeromy's own answer is more useful than mine, but since I am working on this myself, I have collected a few handy R code bits that can also be helpful or steer one in the right direction of detecting 'suspicious' responses.

The assumption here is that careless participants often (I find) respond in a pattern of some sort, especially in longer questionnaires.

For example, if participants choose the middle response category from top to bottom of a given scale, you could check how many unique responses each participant gave in the scale (or even the entire survey) by using:

# Tabulate how many distinct response categories each participant used (1 = same answer throughout)
table(apply(data, 1, function(X) length(unique(X))))
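To see what this returns, here is a small sketch on made-up data. The `toy` data frame and its item names are hypothetical and only serve as an illustration.

```r
# Hypothetical toy data: 4 respondents (rows), 6 items on a 1-5 scale.
toy <- data.frame(
  i1 = c(3, 1, 2, 4),
  i2 = c(3, 2, 5, 4),
  i3 = c(3, 3, 1, 2),
  i4 = c(3, 4, 4, 5),
  i5 = c(3, 5, 3, 1),
  i6 = c(3, 4, 2, 3)
)

# Number of distinct response categories used by each respondent.
n_unique <- apply(toy, 1, function(x) length(unique(x)))
n_unique

# How many respondents used 1, 2, ... distinct categories.
table(n_unique)
```

Here the first respondent answered 3 to every item and so uses a single category, while the other three each use all five. On a short scale a low count can also occur for a conscientious responder, so I would treat it as a prompt for a closer look rather than proof.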

If you are interested in which cases in your data present this pattern, rather than the total per category, you can use this code:

which(apply(data, 1, function(X) length(unique(X)) == 1))

Yet another way of looking at this is to look at participants whose responses move up and down the response categories in diagonal lines (e.g., 1, 2, 3, 4, 5, 4, 3, 2, 1, or 2, 3, 4, 3, 2, 3, 4, which uses just the center response categories). For these respondents, one would need to calculate the lagged differences between responses with the following code:

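# TRUE when every consecutive pair of responses differs by exactly 1; use == 2 for steps of two (e.g., 1, 3, 5, 3, 1)
apply(data, 1, function(X) all(abs(diff(X)) == 1))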
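The lagged-difference check generalises to any step size, including 0 for participants who give the same answer throughout. The following is only a sketch on made-up data: `flag_constant_step()` and `toy` are hypothetical names, and leaving rows with missing responses unflagged is my own assumption.

```r
# Hypothetical helper: flag rows whose consecutive responses always differ
# by exactly `step` in absolute value (step = 0 means the same answer throughout).
flag_constant_step <- function(data, step) {
  apply(data, 1, function(x) {
    d <- abs(diff(as.numeric(x)))
    # Assumption: rows with any missing response are not flagged.
    !anyNA(d) && all(d == step)
  })
}

# Hypothetical toy data: one respondent per row, 7 items on a 1-5 scale.
toy <- rbind(
  c(3, 3, 3, 3, 3, 3, 3),   # same category throughout
  c(1, 2, 3, 4, 5, 4, 3),   # diagonal, steps of 1
  c(1, 3, 5, 3, 1, 3, 5),   # zigzag, steps of 2
  c(2, 5, 1, 4, 4, 2, 3),   # no regular pattern
  c(2, NA, 4, 3, 2, 3, 4)   # contains a missing response
)

flags <- data.frame(
  straight = flag_constant_step(toy, 0),
  step1    = flag_constant_step(toy, 1),
  step2    = flag_constant_step(toy, 2)
)
flags

# Rows matching at least one of the three patterns.
which(rowSums(flags) > 0)
```

A flagged row is only a candidate for closer inspection. On short scales these patterns can turn up by chance, so it is worth looking at the flagged cases by eye before dropping anyone.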