Probability – Calculating Cheating Probability in Bernoulli Trials


I asked players to play a game in which the outcome (the number of successes) should be determined purely by chance. However, players could cheat to increase their number of successes. Cheating is not actually a bad thing in my game; I am just using the word for ease of exposition. For each number of successes, I want to calculate the probability that the players with that number of successes cheated.

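I will explain the game. In each of 10 rounds, players choose 1 of 10 numbers. In every round only 1 of the 10 numbers makes a player successful, so in each round a player has a 10% probability of success. Treating the 10 rounds as Bernoulli trials, I computed the probability of each possible total number of successful rounds (the binomial distribution). I post them in the table below, along with the expected number of players out of 1,000.

Successful rounds   Probability     Expected players (out of 1,000)
0                   0.3486784401    348.6784401
1                   0.3874204890    387.4204890
2                   0.1937102445    193.7102445
3                   0.0573956280    57.3956280
4                   0.0111602610    11.1602610
5                   0.0014880348    1.4880348
6                   0.0001377810    0.1377810
7                   0.0000087480    0.0087480
8                   0.0000003645    0.0003645
9                   0.0000000090    0.0000090
10                  0.0000000001    0.0000001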

In the data from the game, I can see that many players manage to succeed in all 10 rounds. Let's say that 20 out of 1,000 players (2%) succeeded in all 10 rounds although, in expectation, only 0.0000001 players (1,000 players × the probability that one player succeeds in all 10 rounds) should have done so. Can I calculate the probability that 20 out of 1,000 players succeeding in all 10 rounds is due to cheating and not due to chance?

I want to calculate this number for each level of success. For example, 3 out of 1,000 players (0.3%) succeeded in 5 rounds although, in expectation, only 1.4880348 players (1,000 players × the probability that one player succeeds in exactly 5 rounds) should have done so. So, in general, for the actual number of players who succeeded in x rounds, I want to know the probability that their level of success is due to cheating and not due to chance.
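The expected counts quoted above can be recomputed with a short Python sketch (standard library only). It assumes, as the question does, that each player's number of successful rounds follows $\mathsf{Binom}(10, 0.1)$ independently of the other players. The names `N_PLAYERS`, `binom_pmf` and `observed` are illustrative, and `observed` holds only the two hypothetical counts from the examples.

```python
from math import comb

N_PLAYERS = 1000   # players in the example
ROUNDS = 10        # rounds per player
P_SUCCESS = 0.1    # chance of picking the winning number in one round


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binom(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Expected number of players with exactly k successful rounds, k = 0..10.
expected = [N_PLAYERS * binom_pmf(k, ROUNDS, P_SUCCESS) for k in range(ROUNDS + 1)]

# The two example counts quoted in the question (k -> observed players).
observed = {10: 20, 5: 3}

for k in range(ROUNDS + 1):
    obs = observed.get(k, "-")
    print(f"k={k:2d}  expected={expected[k]:.7g}  observed={obs}")
```

Comparing `expected` with `observed` level by level only shows the size of the discrepancy; turning it into evidence of cheating needs a test such as the one in the answer.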

I think I should not use a binomial test as presented here (https://www.randomservices.org/random/hypothesis/Bernoulli.html). As I understand it, that test checks whether the mean number of wins across all players differs from the expected mean number of wins.

Any help would be appreciated. Thank you.

Best Answer

I'm not sure I understand the description of the game, but the table clearly shows the probability mass function (PMF) of $\mathsf{Binom}(10, 0.1).$

Then you ask "Can I calculate the probability that 20 out of 1,000 players succeeding in 10 rounds is due to cheating and not due to chance?"

It seems as if you want to test $H_0: p = (0.1)^{10}$ against $H_a: p > (0.1)^{10}$ at the 5% level, based on results from $n = 1000$ plays of the game in which $x$ players got a Success (a success in all ten rounds).

In view of the extreme skewness of this binomial distribution, I think you should use an exact binomial test to get the P-value, rather than an approximate normal test (as in your link). Using binom.test in R with only $x = 1$ claim of Success, I get the result below. The P-value is very nearly $0.$

binom.test(1, 1000, p=(0.1)^10, alt = "g")

        Exact binomial test

data:  1 and 1000
number of successes = 1, number of trials = 1000, 
 p-value = 1e-07
alternative hypothesis: 
 true probability of success is greater than 1e-10
95 percent confidence interval:
 5.129198e-05 1.000000e+00
sample estimates:
probability of success 
                 0.001

A direct computation of this P-value agrees; the two results differ only in how a number very near $0$ is rounded. So even one claim of Success would be strong evidence of 'cheating'. [If more than $x = 1$ of the 1000 participants claimed a Success, the P-value would be even smaller.]

1-pbinom(0, 1000, .1^10)
[1] 9.999999e-08    # aprx 1e-07
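The same exact upper-tail probability can be sketched in Python (standard library only) for any success level. The hypothetical helper `binom_upper_tail` sums $P(Y = i)$ for $i \ge x$ directly in log space, which avoids the cancellation in $1 - P(Y \le x-1)$ when the answer is tiny. Applying it level by level, with the null probability set to the chance of that level under pure chance, is an assumption about how the question's per-level comparison should be framed.

```python
from math import exp, lgamma, log, log1p


def binom_upper_tail(x: int, n: int, p: float) -> float:
    """Exact P(Y >= x) for Y ~ Binom(n, p), assuming 0 < p < 1.

    The upper tail is summed term by term in log space, so a tiny
    answer is not lost to cancellation as in 1 - P(Y <= x - 1).
    """
    if x <= 0:
        return 1.0
    log_p, log_q = log(p), log1p(-p)
    log_n_fact = lgamma(n + 1)
    total = 0.0
    for i in range(x, n + 1):
        log_term = (log_n_fact - lgamma(i + 1) - lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += exp(log_term)  # underflows harmlessly to 0.0 far in the tail
    return total


# Level 10: chance that one player succeeds in all ten rounds.
p_level10 = 0.1 ** 10
print(binom_upper_tail(1, 1000, p_level10))   # about 1e-07, matching R
print(binom_upper_tail(20, 1000, p_level10))  # vastly smaller

# Level 5: chance that one player succeeds in exactly five rounds.
p_level5 = 252 * 0.1 ** 5 * 0.9 ** 5          # = P(X = 5), X ~ Binom(10, 0.1)
print(binom_upper_tail(3, 1000, p_level5))    # roughly 0.19
```

Under these assumptions, 3 players at level 5 give a P-value of roughly 0.19, so that level on its own shows no sign of cheating, whereas a single player at level 10 already gives about $10^{-7}$.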

Note: An analogous, but 'milder', game would be to roll a fair die five times and to get a 'Success' if all five rolls show a six (or any other face named in advance).

dbinom(5, 5, 1/6)
[1] 0.0001286008
(1/6)^5
[1] 0.0001286008
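$(1/6)^5$ is the chance that all five rolls show one face fixed in advance (a six, say), the analogue of succeeding in every round of the original game. A small Python check with exact fractions contrasts it with the chance that the five rolls merely all match, which is six times larger; the variable names are illustrative.

```python
from fractions import Fraction

one_face = Fraction(1, 6)  # chance that a single roll shows a given face

# All five rolls show one face named in advance (e.g. a six): the analogue of
# succeeding in every round.
p_all_sixes = one_face ** 5        # = 1/7776
# All five rolls show the same face, whichever face that turns out to be.
p_all_same = 6 * one_face ** 5     # = (1/6)**4 = 1/1296

print(p_all_sixes, float(p_all_sixes))  # 1/7776, about 0.0001286
print(p_all_same, float(p_all_same))    # 1/1296, about 0.000772
```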

binom.test(1, 100, p=(1/6)^5, alt = "g")

        Exact binomial test

data:  1 and 100
number of successes = 1, number of trials = 100, 
 p-value = 0.01278
alternative hypothesis: 
 true probability of success is greater than 0.0001286008
95 percent confidence interval:
  0.0005128014 1.0000000000
sample estimates:
 probability of success 
                   0.01

For this game, a single claim of 'Success' in 100 plays would raise suspicions of 'cheating' at the 2% level, but not quite at the 1% level.

Remarks

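If $X = A + B$ is the sum of independent $A \sim \mathsf{Binom}(n, p)$ and $B \sim \mathsf{Binom}(m, q)$, its exact distribution is the convolution of the two binomial distributions; computing that convolution with the Fast Fourier Transform is an attractive option for precise calculation.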
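A minimal Python sketch of that exact calculation, assuming $A \sim \mathsf{Binom}(n, p)$ and $B \sim \mathsf{Binom}(m, q)$ are independent and $X = A + B$. It uses a direct $O(nm)$ convolution rather than the FFT mentioned above; for small $n$ and $m$ both give the same distribution up to floating-point rounding, and the FFT only pays off for much larger numbers of trials. The function names are illustrative.

```python
from math import comb


def binom_pmf_list(n: int, p: float) -> list[float]:
    """[P(A = 0), ..., P(A = n)] for A ~ Binom(n, p)."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def convolve(f: list[float], g: list[float]) -> list[float]:
    """PMF of the sum of two independent variables with PMFs f and g."""
    h = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            h[i + j] += fi * gj  # P(A = i) * P(B = j) contributes to X = i + j
    return h


n, p = 6, 0.40
m, q = 10, 0.25
pmf_x = convolve(binom_pmf_list(n, p), binom_pmf_list(m, q))

for x, prob in enumerate(pmf_x):
    print(f"P(X = {x:2d}) = {prob:.6f}")
```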

When both $np+mq$ and $n(1-p)+m(1-q)$ are not small (exceeding $5$ is often considered OK), the Normal approximation to the Binomial distributions works well. Specifically, the approximating Normal distribution will have mean

$$\mu= np + mq,$$

variance

$$\sigma^2 = np(1-p) + mq(1-q),$$

and the chance that $X \ge x$ is therefore approximated (using a continuity correction) by

$$\Pr(X \ge x) \approx \Phi\left(\frac{\mu - x + 1/2}{\sigma}\right)$$

where $\Phi$ is the CDF of the standard Normal distribution. If you're brave, you can also approximate the individual probabilities as

$$\eqalign{ \Pr(X = x) &= \Pr(X \ge x) - \Pr(X \ge x+1) \\ &\approx \Phi\left(\frac{\mu-x+1/2}{\sigma}\right) - \Phi\left(\frac{\mu-x-1/2}{\sigma}\right).}$$

As an example, with $n=6,$ $p=0.40,$ $m=10,$ and $q=0.25$ (the chances in the question, with roughly the minimal numbers of trials for the approximation to hold: $np+mq = 4.9$ and $n(1-p)+m(1-q) = 11.1$), a simulation of 100,000 values of $X$ (shown by the line heights) is reproduced quite well by the approximation (shown by the dots):

This R code produced the figure.

n <- 6
m <- 10
p <- 0.4
q <- 0.25
#
# Simulate X.
#
n.sim <- 1e5
A <- rbinom(n.sim, n, p)
B <- rbinom(n.sim, m, q)
X <- A+B
#
# Plot the simulation.
#
plot(0:(n+m), tabulate(X+1, n+m+1)/n.sim, type="h", ylab="Relative frequency", xlab="x")
#
# Plot the Normal approximation.
#
f <- function(x, n, p, m, q) {
  mu <- n * p + m * q
  sigma <- sqrt(n * p * (1-p) + m * q * (1-q))
  pnorm((x + 1/2 - mu) / sigma) - pnorm((x - 1/2 - mu) / sigma)
}
points(0:(n+m), f(0:(n+m), n, p, m, q), pch=21, bg="#e0000080")
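Instead of a simulation, the approximation can also be compared with the exact distribution of $X$. The Python sketch below (standard library only, with `math.erf` used to build $\Phi$) computes the exact PMF by direct convolution, evaluates the continuity-corrected approximation $\Phi\left(\frac{\mu-x+1/2}{\sigma}\right) - \Phi\left(\frac{\mu-x-1/2}{\sigma}\right)$ for the same $n, p, m, q$, and reports the largest absolute difference. It assumes $A$ and $B$ are independent; the helper names are illustrative.

```python
from math import comb, erf, sqrt


def phi(z: float) -> float:
    """Standard Normal CDF, written with math.erf."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def normal_point(x: int, n: int, p: float, m: int, q: float) -> float:
    """Continuity-corrected Normal approximation to Pr(X = x), X = A + B."""
    mu = n * p + m * q
    sigma = sqrt(n * p * (1 - p) + m * q * (1 - q))
    return phi((mu - x + 0.5) / sigma) - phi((mu - x - 0.5) / sigma)


def exact_pmf(n: int, p: float, m: int, q: float) -> list[float]:
    """Exact PMF of X = A + B by direct convolution of the two binomial PMFs."""
    a = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    b = [comb(m, k) * q**k * (1 - q) ** (m - k) for k in range(m + 1)]
    out = [0.0] * (n + m + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


n, p, m, q = 6, 0.40, 10, 0.25
exact = exact_pmf(n, p, m, q)
approx = [normal_point(x, n, p, m, q) for x in range(n + m + 1)]
max_err = max(abs(e - a) for e, a in zip(exact, approx))

for x, (e, a) in enumerate(zip(exact, approx)):
    print(f"x = {x:2d}   exact = {e:.4f}   approx = {a:.4f}")
print(f"largest absolute difference: {max_err:.4f}")
```

For these parameters the largest difference comes out at about 0.01, in line with the dots sitting close to the simulated line heights.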
