Solved – Randomized Response, calculating probability

data transformation, mathematical-statistics, probability-inequalities, proportion, survey

I have a problem with the randomized response method when calculating the probability of the critical attribute. If you are not familiar with the method, see the Wikipedia entry for Randomized Response.

Briefly, the method encourages people who are surveyed on a critical question to answer honestly. There are two questions whose answers are opposite to each other, and each respondent answers the first or the second question with probability p and 1-p, respectively. The interviewer does not know which question the respondent answered, so the respondent's privacy is preserved. As far as I know, this method was first studied by Warner.

As you can imagine, the proportion of "yes" answers from the survey is composed as follows (using the conventions from Wikipedia):

YA = p*EP + (1-p)*(1-EP)
EP= (YA + p - 1) / (2*p-1)

YA: proportion of yes answers in the survey
EP: proportion of true yes answers
p: probability of answering the critical question

In my tests, the estimated proportion of true yes answers is greater than 1 in some cases.

Assume that I get 55 yes answers from 160 survey respondents and p is 0.4.

p=0.4
YA=0.34375 (55 out of 160)
EP= (YA + p - 1) / (2*p-1)
EP=(0.34375 + 0.4 - 1) /(2*0.4 - 1)
EP=(-0.25625)/(-0.2)=1.28125
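The arithmetic above can be reproduced with a short Python sketch. The function name `moment_estimate` is only illustrative and is not part of any library. It applies the formula EP = (YA + p - 1) / (2*p - 1) directly.

```python
def moment_estimate(yes_count, n, p):
    """Estimate EP from the observed yes proportion, using
    EP = (YA + p - 1) / (2*p - 1) from the question."""
    if p == 0.5:
        # With p = 0.5 the denominator is zero and EP cannot be recovered.
        raise ValueError("p must differ from 0.5")
    ya = yes_count / n  # observed proportion of yes answers
    return (ya + p - 1) / (2 * p - 1)


# Numbers from the example: 55 yes answers out of 160, p = 0.4.
ep = moment_estimate(55, 160, 0.4)
print(round(ep, 5))  # about 1.28125, which lies outside [0, 1]
```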

Here, EP is greater than 1, which is not reasonable for a survey result, because it would mean that more people have the attribute than were surveyed (am I wrong?). Is this result possible with this technique? Am I missing something? Does anyone have experience with this method, or any ideas? How can I apply it to get a reasonable proportion for EP (0<=EP<=1)? I am totally confused about this method, although it seems fairly reasonable to me.

I will appreciate your comments and suggestions.

Best Answer

You are seeing the problem with the "method-of-moments" estimation: the estimate is unbiased, but it is not restricted to the required parameter space. The Appendix of "Randomized Response: Theory and Techniques" by A. Chaudhuri and R. Mukerjee (can be found on Google Books) gives a corrected formula using the maximum likelihood estimate, which essentially sets the estimate to $0$ or $1$ when the moment estimate falls outside $[0, 1]$ (whichever gives the larger likelihood). This estimate is not unbiased, but most people would consider it more reasonable. In your case, I think the estimate would give $1$.
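A minimal sketch of the boundary correction described above, in Python. It assumes the restricted maximum likelihood estimate amounts to clipping the moment estimate to $[0, 1]$, which holds because the binomial likelihood has a single peak at the observed yes proportion. It is not copied from the book's appendix, and the name `clipped_estimate` is hypothetical.

```python
def clipped_estimate(yes_count, n, p):
    """Moment estimate of EP, clipped to the valid range [0, 1].

    Assumption: clipping is used here as the restricted maximum
    likelihood estimate; this is a sketch, not the book's formula.
    """
    if p == 0.5:
        raise ValueError("p must differ from 0.5")
    ya = yes_count / n
    raw = (ya + p - 1) / (2 * p - 1)  # unrestricted moment estimate
    return min(1.0, max(0.0, raw))    # pull back to the boundary if outside


# With the numbers from the question the raw estimate exceeds 1,
# so the clipped estimate is 1.0.
print(clipped_estimate(55, 160, 0.4))
```

With $p = 0.4$, the raw estimate stays inside $[0, 1]$ only when the observed yes proportion lies between $0.4$ and $0.6$. The observed $0.34375$ is below that range, which is why the raw value overshoots.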

By the way, I think your main problem is that your value of $p$ ($0.4$) is very close to $0.5$. You should get it as far from $0.5$ as possible while still convincing people that you don't know which question they answered.
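To illustrate why distance from $0.5$ matters, here is a hedged Python sketch of the standard error of the moment estimate. It uses the usual binomial variance $\lambda(1-\lambda) / (n(2p-1)^2)$, where $\lambda = p \cdot EP + (1-p)(1-EP)$ is the probability of a yes answer. The true proportion of $0.3$ and the function name `moment_std_error` are assumptions chosen only for illustration.

```python
import math


def moment_std_error(true_ep, n, p):
    """Standard error of the moment estimate of EP for a given design p."""
    if p == 0.5:
        raise ValueError("p must differ from 0.5")
    lam = p * true_ep + (1 - p) * (1 - true_ep)  # probability of a yes answer
    variance = lam * (1 - lam) / (n * (2 * p - 1) ** 2)
    return math.sqrt(variance)


# Hypothetical true proportion 0.3, with n = 160 as in the question.
for p in (0.4, 0.3, 0.2, 0.1):
    print(p, round(moment_std_error(0.3, 160, p), 3))
```

The factor $(2p-1)^2$ in the denominator shrinks toward zero as $p$ approaches $0.5$, so estimates outside $[0, 1]$ become much more likely with a sample of this size.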
