Sure there is.
Let's assume that we could see $k$ different types of dice with $Y=\{y_1, \dots, y_k\}$ sides. The "classical" D&D dice would yield $Y=\{4, 6, 8, 10, 12, 20, 100\}$, but anything else could be made to work as below. Let $|Y|=k$ denote the number of different possible die types.
Assume that the minimum observed total is $m$ and the maximum is $M$. Since $x$ dice with $y$ sides each always total between $x$ and $xy$, this gives us bounds on the possible combinations: we must have
$$x\leq m \quad\text{and}\quad xy\geq M.$$
For instance, if the smallest observed total is 3 and the largest is 18, then $x\leq 3$ and $xy\geq 18$.
Now, simulate a large number of rolls from each remaining candidate xdy (by the bounds above, there are at most $m\times |Y|$ candidate pairs). The number of simulations does not need to match the number of original observations. In each case, create a histogram of the simulated totals.
Finally, compare these density histograms to the density histogram of the original observations. (We work with densities rather than frequencies precisely so that the number of simulations can differ from the number of original observations.) The earth mover's distance gives you a notion of distance between histograms: the candidate whose simulated histogram has the lowest earth mover's distance to your observation histogram is the one most consistent with your observations.
Here is some R code. We simulate 1000 rolls of 3d6 and use the standard D&D dice as candidates $Y$. (For simplicity, the code just tries every $x$ from 1 up to $M$ rather than applying the bounds above.)
set.seed(1)
obs <- rowSums(matrix(ceiling(runif(3000,0,6)),nrow=1000))
candidates.y <- c(4,6,8,10,12,20,100) # standard D&D dice
breaks <- c(seq(.5,max(obs)+.5,by=1),max(obs)*max(candidates.y))
hist.obs <- hist(obs,breaks=breaks,freq=FALSE)
candidates.x <- 1:max(obs)
n.sims <- 1000
library(emdist)
emd.dist <- matrix(NA,nrow=length(candidates.x),ncol=length(candidates.y),
    dimnames=list(candidates.x,candidates.y))
for ( ii in seq_along(candidates.x) ) {
    for ( jj in seq_along(candidates.y) ) {
        set.seed(1) # for replicability
        obs.sim <- rowSums(matrix(ceiling(
            runif(candidates.x[ii]*n.sims,0,candidates.y[jj])),
            nrow=n.sims))
        hist.sim <- hist(obs.sim,breaks=breaks,plot=FALSE)
        emd.dist[ii,jj] <- emd2d(matrix(hist.obs$density,nrow=1),
            matrix(hist.sim$density,nrow=1))
    }
}
emd.dist
Output:
4 6 8 10 12 20 100
1 7.9029999 6.8909998 5.8920002 4.8950000 3.88700008 1.9228243 0.39887184
2 5.4299998 3.4460001 1.4499999 0.8147613 1.08018708 0.8099990 0.04417419
3 2.9579999 0.0000000 1.8063331 1.9216927 1.61559486 0.2684011 0.35928789
4 0.6180000 2.4049284 2.2750988 1.3528987 0.71173453 0.2177625 1.00000000
5 1.9768735 2.8317218 1.7545075 0.5033562 0.17459151 0.1226299 1.00000000
6 3.4169219 1.9913402 0.4674550 0.1918677 0.05264558 1.0000000 1.00000000
7 3.3924465 0.9393466 0.2968451 0.1572183 0.21881166 1.0000000 1.00000000
8 2.6134274 0.3687483 0.1226299 1.0000000 1.00000000 1.0000000 1.00000000
9 1.5483801 0.3894975 0.3592879 1.0000000 1.00000000 1.0000000 1.00000000
10 0.3270817 0.2188117 1.0000000 1.0000000 1.00000000 1.0000000 1.00000000
11 0.1052912 0.3592879 1.0000000 1.0000000 1.00000000 1.0000000 1.00000000
12 0.3592879 1.0000000 1.0000000 1.0000000 1.00000000 1.0000000 1.00000000
13 1.0000000 1.0000000 1.0000000 1.0000000 1.00000000 1.0000000 1.00000000
14 1.0000000 1.0000000 1.0000000 1.0000000 1.00000000 1.0000000 1.00000000
15 1.0000000 1.0000000 1.0000000 1.0000000 1.00000000 1.0000000 1.00000000
16 1.0000000 1.0000000 1.0000000 1.0000000 1.00000000 1.0000000 1.00000000
17 1.0000000 1.0000000 1.0000000 1.0000000 1.00000000 1.0000000 1.00000000
18 1.0000000 1.0000000 1.0000000 1.0000000 1.00000000 1.0000000 1.00000000
And look, the 3d6 in fact does have the simulated histogram with the lowest distance from your observations' histogram. Hurray!
A couple of comments:
- This works better if you have "many" observations. With few observations, you get a lot of noise, and many different simulated histograms are consistent with your noisy data.
- Of course, the larger your candidate set $Y$, the lower your chance will be of getting the true values.
- If the true $y$ is not in $Y$, you will at least get something "close to it".
- The formally-statistically-correct treatment would be a Bayesian one, putting equal prior probabilities on all possible $x$ and $y\in Y$, then calculating or simulating draws and deriving posterior probabilities on pairs $(x,y)$. Yes, this can be done rigorously; a rough sketch follows below. Anyone want to have a go?
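To make that last point concrete, here is a rough sketch of what such a Bayesian calculation could look like (my own illustration, using the same simulated 3d6 data as above, and not a rigorous treatment): put a flat prior on all candidate pairs $(x,y)$ satisfying the bounds, compute the exact distribution of the total of $x$ fair $y$-sided dice by convolution, and take the posterior to be proportional to the likelihood of the observed totals.

set.seed(1)
obs <- rowSums(matrix(ceiling(runif(3000,0,6)),nrow=1000)) # same 3d6 data as above
candidates.y <- c(4,6,8,10,12,20,100)
candidates.x <- 1:min(obs)  # x dice cannot total less than x

# exact pmf of the total of x fair y-sided dice, via repeated convolution;
# pmf.xdy(x,y)[k] = P(total = k + x - 1)
pmf.xdy <- function(x, y) {
    p <- rep(1/y, y)
    out <- p
    if (x > 1) for (i in 2:x) out <- convolve(out, rev(p), type="open")
    out
}

log.post <- matrix(-Inf, nrow=length(candidates.x), ncol=length(candidates.y),
    dimnames=list(candidates.x, candidates.y))
for (ii in seq_along(candidates.x)) {
    for (jj in seq_along(candidates.y)) {
        x <- candidates.x[ii]; y <- candidates.y[jj]
        if (x*y < max(obs)) next           # cannot reach the largest observed total
        pr <- pmf.xdy(x, y)[obs - x + 1]
        log.post[ii, jj] <- sum(log(pr))   # flat prior: posterior proportional to likelihood
    }
}
post <- exp(log.post - max(log.post))
post/sum(post)  # normalised posterior over (x, y); should concentrate on x=3, y=6 here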
TL;DR: if $p = 1/6$ and you want to know how large $n$ needs to be to be 98% sure the die is fair (to within 2%), you need $n \geq 766$.
Let $n$ be the number of rolls and $X$ the number of rolls that land on some specified side. Then $X$ follows a Binomial(n,p) distribution where $p$ is the probability of getting that specified side.
By the central limit theorem, we know that
$$\sqrt{n}\,(X/n - p) \to N(0,\,p(1-p))$$
in distribution, since $X/n$ is the sample mean of $n$ Bernoulli$(p)$ random variables. Hence, for large $n$, confidence intervals for $p$ can be constructed as
$$\frac{X}{n} \pm Z \sqrt{\frac{p(1-p)}{n}}$$
Since $p$ is unknown, we can replace it with the sample average $\hat{p} = X/n$, and by various convergence theorems, we know the resulting confidence interval will be asymptotically valid. So we get confidence intervals of the form
$$\hat{p} \pm Z \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
with $\hat{p} = X/n$. I'm going to assume you know what $Z$-scores are. For example, if you want a 95% confidence interval, you take $Z=1.96$. So for a given confidence level $\alpha$ we have
$$\hat{p} \pm Z_\alpha \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
Now let's say you want this confidence interval to have length less than $C_\alpha$, and you want to know how big a sample is needed to make this the case. This is equivalent to asking which $n_\alpha$ satisfies
$$Z_\alpha \sqrt{\frac{\hat{p}(1-\hat{p})}{n_\alpha}} \leq \frac{C_\alpha}{2}$$
which can be solved for $n_\alpha$ to obtain
$$n_\alpha \geq \left(\frac{2 Z_\alpha}{C_\alpha}\right)^2 \hat{p}(1-\hat{p})$$
So plug in your values for $Z_\alpha$, $C_\alpha$, and estimated $\hat{p}$ to obtain an estimate for $n_\alpha$. Note that since $p$ is unknown this is only an estimate, but asymptotically (as $n$ gets larger) it should be accurate.
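For example, plugging in some illustrative numbers (my own choices, not from the question: a 95% confidence interval of total length $C_\alpha = 0.05$, i.e. $\pm 2.5\%$, around $\hat{p} = 1/6$) gives, in R:

p.hat <- 1/6            # estimated probability of one face of a d6
Z     <- qnorm(0.975)   # two-sided 95% confidence, Z about 1.96
C     <- 0.05           # desired total length of the confidence interval
n.min <- (2*Z/C)^2 * p.hat * (1 - p.hat)
ceiling(n.min)          # about 854 rolls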
I would recommend analyzing this in the following way:
Count each roll in which George successfully predicts the outcome as a success, and every other one as a failure. Then you can easily calculate a probability of success for George, along with a 95% or 99% confidence interval. Does he claim he can predict the outcome "twice as well" as random guessing? Then:
$$H_0\colon p \geq 1/3 \qquad\text{vs.}\qquad H_1\colon p < 1/3$$
(assuming a 6-sided die).
From there, it's pretty straightforward to do the hypothesis test. You can also calculate the power a priori pretty easily (even in something like Excel). Pick a number of rolls (say 10), then make a table with the possible numbers of successes as rows (0-10). For each possible number of successes, calculate the probability that he gets exactly that many right if he were just guessing (which is what we're assuming he is really doing), and determine whether that outcome would lead to rejecting or retaining the null. Then, to find the power, simply add up the probabilities of all outcomes for which the null would be rejected.
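Here is a minimal sketch of that power calculation, in R rather than Excel (illustrative values only: $n = 10$ rolls, $\alpha = 0.05$, with power evaluated at $p = 1/6$, i.e. George just guessing on a d6):

n     <- 10
alpha <- 0.05
p0    <- 1/3   # null value: "twice as good as guessing"
p1    <- 1/6   # pure guessing on a 6-sided die

# Reject H0: p >= 1/3 for small success counts: find the largest cutoff c
# with P(X <= c | p = p0) <= alpha (with n = 10 this turns out to be c = 0)
c.crit <- max(which(pbinom(0:n, n, p0) <= alpha)) - 1

# Power against pure guessing = P(reject H0 | p = p1)
pbinom(c.crit, n, p1)  # about 0.16, so 10 rolls give very little power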