Can anyone suggest where to obtain the results of the 10,000 coin flips (i.e., all 10,000 heads and tails) performed by John Kerrich during WWII?
Probability – John Kerrich Coin-flip Data Analysis
Related Solutions
You're right. If $P(H) = 0.2$ and you're using zero-one loss (that is, you must guess an actual outcome rather than state a probability, and getting heads when you guessed tails is as bad as getting tails when you guessed heads), you should guess tails every time.
People often mistakenly think that the answer is to guess tails on a randomly selected 80% of trials and heads on the remainder. This strategy is called "probability matching" and has been studied extensively in behavioral decision-making. See, for example,
West, R. F., & Stanovich, K. E. (2003). Is probability matching smart? Associations between probabilistic choices and cognitive ability. Memory & Cognition, 31, 243–251. doi:10.3758/BF03194383
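To make the difference concrete, here is a minimal simulation sketch (not from the original answer; it assumes only numpy, with $P(H) = 0.2$ and hypothetical variable names). Always guessing tails is correct 80% of the time, while probability matching is correct only $0.8^2 + 0.2^2 = 0.68$ of the time:
import numpy as np

rng = np.random.default_rng(0)
p_heads = 0.2
n_trials = 100_000

# Simulate coin flips: True = heads, False = tails
flips = rng.random(n_trials) < p_heads

# Strategy 1: always guess tails
acc_always_tails = np.mean(~flips)

# Strategy 2: probability matching: guess heads on a random 20% of trials
guesses = rng.random(n_trials) < p_heads
acc_matching = np.mean(flips == guesses)

print(f"always tails:         {acc_always_tails:.3f}")  # approx. 0.80
print(f"probability matching: {acc_matching:.3f}")      # approx. 0.68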
Below is part of an answer showing how to use a different link function to capture the nonlinearity.
As discussed in the comments, the relationship between the number of tosses, nb_toss, and the probability of observing a success is nonlinear.
If $p_0$ is the probability of observing a failure in a single toss, then the probability of observing only failures in $n$ tosses is $p_0^n$, and the probability of observing at least one success is $1 - p_0^n$. (For a fair coin, $p_0 = 0.5$, so in three tosses the success probability is $1 - 0.5^3 = 0.875$.)
For simplicity I switch the definition of the outcome so that failure is 1 and success is 0. Then the probability of observing a failure is $\Pr(\text{failure}) = p_0^n$, which we can rewrite as
$\Pr(\text{failure}) = \exp\!\big(n \log(p_0)\big)$
This is just a log link with $n$, i.e. nb_toss, in the linear predictor. The estimated coefficient is $\log(p_0)$, the log of the single-toss failure probability, so we exponentiate it to recover $p_0$.
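The data and imports in the code below come from the question and are not reproduced here. For a self-contained run, a hypothetical reconstruction of a comparable dataset (my assumption, not the question's actual setup: each observation records whether at least one success occurred in nb_toss fair-coin tosses, with nb_toss between 1 and 19, and X holding a constant in column 0 and nb_toss in column 1) could look like this:
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_obs = 10_000

# Hypothetical: number of tosses per observation, between 1 and 19
nb_toss = rng.integers(1, 20, size=n_obs)

# y = 1 if at least one of the nb_toss fair-coin tosses succeeds,
# which happens with probability 1 - 0.5**nb_toss
y = (rng.random(n_obs) < 1 - 0.5 ** nb_toss).astype(int)

# Design matrix: constant in column 0, nb_toss in column 1,
# matching the X[:, 1] indexing used below
X = np.column_stack((np.ones(n_obs), nb_toss))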
Below I use statsmodels with the data and imports from the question. The estimated $p_0$ is 0.508, close to 0.5, and the predictions also closely match the true probabilities.
# Model Pr(failure) = exp(beta * n): binomial GLM with a log link,
# regressing the failure indicator 1 - y on nb_toss with no constant
res_glm = sm.GLM(1 - y, X[:, 1],
                 family=sm.families.Binomial(link=sm.families.links.log())).fit()
print(res_glm.summary())
print(np.exp(res_glm.params))  # exp(beta) recovers p_0

# Compare true, predicted, and sample failure probabilities for n = 0, ..., 19
nbt = X[:, 1]
ii = np.arange(20)
table = np.column_stack((0.5**ii, res_glm.predict(ii),
                         [1 - y[nbt == i].mean() for i in ii]))
print(pd.DataFrame(table, columns=['true', 'predicted', 'sample']))
This prints:
Generalized Linear Model Regression Results
==============================================================================
Dep. Variable: y No. Observations: 10000
Model: GLM Df Residuals: 9999
Model Family: Binomial Df Model: 0
Link Function: log Scale: 1.0
Method: IRLS Log-Likelihood: -765.47
Date: Mon, 15 May 2017 Deviance: 1530.9
Time: 19:57:36 Pearson chi2: 1.88e+04
No. Iterations: 10
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
var_0 -0.6762 0.020 -34.626 0.000 -0.714 -0.638
==============================================================================
[ 0.50856518]
true predicted sample
0 1.000000 1.000000 NaN
1 0.500000 0.508565 0.521368
2 0.250000 0.258639 0.290141
3 0.125000 0.131535 0.120944
4 0.062500 0.066894 0.055385
5 0.031250 0.034020 0.027778
6 0.015625 0.017301 0.008380
7 0.007812 0.008799 0.014663
8 0.003906 0.004475 0.002770
9 0.001953 0.002276 0.000000
10 0.000977 0.001157 0.000000
11 0.000488 0.000589 0.000000
12 0.000244 0.000299 0.002924
13 0.000122 0.000152 0.000000
14 0.000061 0.000077 0.002915
15 0.000031 0.000039 0.000000
16 0.000015 0.000020 0.000000
17 0.000008 0.000010 0.000000
18 0.000004 0.000005 0.000000
19 0.000002 0.000003 0.000000
Aside: statsmodels prints a DomainWarning:
DomainWarning: The log link function does not respect the domain of the Binomial family.
In general, there can be problems when using a log link with the Binomial family (i.e., log-binomial regression) because the log link does not force the predicted values to lie in $[0, 1]$. However, the way this example is set up, the prediction is bounded by 1: the explanatory variable is nonnegative and the estimated coefficient is negative, so $\exp(\hat{\beta} n) \le \exp(0) = 1$.
Best Answer
I hadn't heard about Kerrich before; what a bizarre story. The Google Books scan (shared by reftt) of "An Experimental Introduction to the Theory of Probability" doesn't seem to include the body of the text. Feeling a little old-fashioned, I checked out a copy of the 1950 edition from the library.
I have scanned a few pages that I found interesting. The pages describe his test conditions, data from the first 2000 coin flips, and data from the first 500 of a series of 5000 equally implausible-sounding urn experiments (with 2 red and 2 green ping-pong balls).
Text recognition (and some cleanup) using Mathematica 9 gives this sequence of 2000 tails (0) and heads (1) from Table 1. The head count of 1014 is one more than $502 + 511 = 1013$ in Table 2, so the recognition was imperfect, but it looks pretty good; at least it got the right number of characters! (Sharp-eyed readers are invited to correct it.)
Here is a graphical summary of this random walk, followed by the data themselves. The accumulated difference between head and tail counts proceeds from left to right, covering all 2000 results.
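As an aside for readers who want to reproduce such a plot, here is a minimal sketch (assuming numpy and matplotlib; the flips array below is a random placeholder for the recognized 0/1 sequence) of the accumulated head-minus-tail difference:
import numpy as np
import matplotlib.pyplot as plt

# Placeholder for the 2000-element array of 0s (tails) and 1s (heads)
# recognized from Table 1; substitute the actual sequence here
rng = np.random.default_rng(0)
flips = rng.integers(0, 2, size=2000)

# Accumulated difference between head and tail counts after each flip
walk = np.cumsum(2 * flips - 1)

plt.plot(walk)
plt.xlabel('flip number')
plt.ylabel('heads minus tails')
plt.title('Random walk of the accumulated head-tail difference')
plt.show()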