Solved – Detecting plagiarism on a multiple-choice test

correlation, terminology

Suppose an invigilator suspects one student of copying answers off another student's paper during a multiple-choice exam. She later checks their answers and finds some similarities—but on the other hand, there are bound to be similarities given the nature of the exam. How should she go about determining whether her suspicions were founded?

Presumably she will have to compare the exams to those of other students (who, let us assume, were not cheating). But if the class size is very large, is it reasonable to take a random sample for comparison? How many students would she then take? If there were many questions on the exam, would it also be reasonable to take a sample of the questions for comparison? Does it make a significant difference whether each question had 2 possible answers (true/false) or, say, 4?

I don't have any specific numbers because I'm wondering about how this would work in general. I have a background in mathematics but little training in statistics. How would you describe this analysis in statistical terms?

Thank you.

Best Answer

Here's a surprisingly vast array of answer-copying indices, though with little discussion of their merits: http://www.bjournal.co.uk/paper/BJASS_01_01_06.pdf.

There's a field of (educational) psychology called item response theory (IRT) that provides the statistical background for questions like these. If you are American and took the SAT, ACT or GRE, you dealt with a test developed with IRT in mind. The basic postulate of IRT is that each student $i$ is characterized by their ability $a_i$; each question is characterized by its difficulty $b_j$; and the probability of answering a question correctly is $$ \pi(a_i,b_j;c) = {\rm Prob}[\mbox{student $i$ answers question $j$ correctly}] = \Phi( c(a_i-b_j) ) $$ where $\Phi(z)$ is the cdf of the standard normal, and $c$ is an additional sensitivity/discrimination parameter (sometimes it is made question-specific, $c_j$, if there's enough information, i.e., enough test takers, to identify the differences). A hidden assumption here is that, given student $i$'s ability $a_i$, answers to different questions are independent. This assumption is violated if you have a battery of questions about, say, the same paragraph of text, but let's abstract from that for a minute.
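As a quick illustration (the function and parameter names are mine, not from any IRT package), the response probability is just the normal cdf applied to the scaled ability–difficulty gap:

```python
from statistics import NormalDist

def prob_correct(ability, difficulty, c=1.0):
    """P(correct answer) = Phi(c * (ability - difficulty))."""
    return NormalDist().cdf(c * (ability - difficulty))

# A strong student on an easy question answers correctly with high probability:
print(prob_correct(ability=1.0, difficulty=-1.0))  # Phi(2) ≈ 0.977
```

When ability equals difficulty the probability is exactly 1/2, which is what pins down the interpretation of $b_j$.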

For "Yes/No" questions, this may be the end of the story. For questions with more than two categories, we can make the additional assumption that all wrong choices are equally likely; for a question $j$ with $k_j$ choices, the probability of each wrong choice is $\pi'(a_i,b_j;c) = [1-\pi(a_i,b_j;c)]/(k_j-1)$.
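As a sanity check (the function name is mine), the equal-likelihood assumption gives, for a student whose ability matches a 4-choice question's difficulty, a 1/2 chance of the right answer and 1/6 for each of the three wrong ones:

```python
from statistics import NormalDist

def prob_each_wrong(a, b, k, c=1.0):
    """pi'(a, b; c): probability of any one particular wrong choice,
    assuming all k - 1 wrong choices are equally likely."""
    p_correct = NormalDist().cdf(c * (a - b))
    return (1 - p_correct) / (k - 1)

print(prob_each_wrong(0.0, 0.0, k=4))  # (1 - 0.5) / 3 ≈ 0.1667
```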

For students of abilities $a_i$ and $a_k$, the probability that they match on their answers for a question with difficulty $b_j$ is $$ \psi(a_i,a_k;b_j,c) = \pi(a_i,b_j;c)\pi(a_k,b_j;c) + (k_j-1)\pi'(a_i,b_j;c)\pi'(a_k,b_j;c) $$ If you like, you can break this into the probability of matching on the correct answer, $\psi_c(a_i,a_k;b_j,c) = \pi(a_i,b_j;c)\pi(a_k,b_j;c)$, and the probability of matching on an incorrect answer, $\psi_i(a_i,a_k;b_j,c) = (k_j-1)\pi'(a_i,b_j;c)\pi'(a_k,b_j;c)$, although from the conceptual framework of IRT, this distinction is hardly material.
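A sketch of this match probability in code (names are mine, not from any IRT package): two students match either by both picking the correct answer, or by both landing on the same one of the $k_j - 1$ wrong answers.

```python
from statistics import NormalDist

def prob_correct(a, b, c=1.0):
    return NormalDist().cdf(c * (a - b))

def prob_match(a_i, a_k, b_j, k_j, c=1.0):
    """psi(a_i, a_k; b_j, c): probability two students give the same
    answer to question j, assuming equally likely wrong choices."""
    p_i = prob_correct(a_i, b_j, c)
    p_k = prob_correct(a_k, b_j, c)
    wrong_i = (1 - p_i) / (k_j - 1)      # pi'(a_i, b_j; c)
    wrong_k = (1 - p_k) / (k_j - 1)      # pi'(a_k, b_j; c)
    match_correct = p_i * p_k                          # psi_c
    match_incorrect = (k_j - 1) * wrong_i * wrong_k    # psi_i
    return match_correct + match_incorrect
```

For a true/false question ($k_j = 2$) and two students at the question's difficulty, this gives $0.5^2 + 0.5^2 = 0.5$, as it should.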

Now, you can compute the probability of matching, but it will probably be combinatorially minuscule. A better measure may be the information in the pairwise pattern of responses, $$ I(i,k) = \sum_j 1\{ \mbox{match}_j \} \ln \psi(a_i,a_k;b_j,c) + 1\{ \mbox{non-match}_j \} \ln [1- \psi(a_i,a_k;b_j,c) ] $$ which you can relate to its expected value under the model, the entropy-like quantity $$ E(i,k) = {\rm E}[ I(i,k) ] = \sum_j \psi(a_i,a_k;b_j,c) \ln \psi(a_i,a_k;b_j,c) + [1- \psi(a_i,a_k;b_j,c) ] \ln [1- \psi(a_i,a_k;b_j,c) ] $$ You can do this for all pairs of students, plot or rank them, and investigate the greatest ratios of information to entropy.
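A hypothetical sketch of this comparison (all names are mine): compute $I(i,k)$ from the observed match pattern, $E(i,k)$ from the model's match probabilities $\psi_j$, and flag pairs whose ratio is unusually large.

```python
import math

def log_score(matches, psis):
    """I(i,k): log-probability of the observed pairwise match pattern.
    matches[j] is True if the two students agree on question j;
    psis[j] is psi(a_i, a_k; b_j, c) from the match-probability formula."""
    return sum(math.log(p) if m else math.log(1 - p)
               for m, p in zip(matches, psis))

def expected_log_score(psis):
    """E(i,k) = E[I(i,k)]: expected log score under the model."""
    return sum(p * math.log(p) + (1 - p) * math.log(1 - p) for p in psis)

# Toy data: three questions with a low match probability of 0.2 each,
# yet the pair of students matched on all three -- a suspicious pattern.
psis = [0.2, 0.2, 0.2]
ratio = log_score([True, True, True], psis) / expected_log_score(psis)
print(ratio)  # ≈ 3.2: far more matching than the model expects
```

Both quantities are negative, so a ratio well above 1 means the observed pattern is much less likely than a typical pattern for this pair, which is exactly the signal to rank on.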

The parameters of the test $\{c,b_j, j=1, 2, \ldots\}$ and student abilities $\{a_i\}$ won't fall out of the sky, but they are easily estimable in modern software such as R with lme4 or similar packages:

    library(lme4)
    # `responses` is a long-format data frame (one row per student-question pair)
    # with a 0/1 column `answer`; the probit link matches the normal cdf above
    irt <- glmer(answer ~ 1 + (1 | student) + (1 | question),
                 data = responses, family = binomial(link = "probit"))

or something very close to this.
