Here is the passage I'm confused by:
Imagine I give you a coin, and want you to estimate the probability it
will spin to heads. Given what you know, the most reasonable prior
belief is to expect any probability of the coin spinning to heads.
This can be captured in a uniform prior on $p$, the probability that a
coin when spun will land on heads: `var p = uniform(0,1)`. You conduct
an experiment. You spin the coin 20 times. 15 of them, they spin to
heads.
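For context, `var p = uniform(0,1)` appears to be WebPPL-style code; the same generative story can be sketched in plain Python (my own translation, not part of the quoted text):

```python
import random

random.seed(1)

# Draw p, the chance of heads, from the Uniform(0,1) prior:
# a priori, every value in [0, 1] is equally plausible.
p = random.random()

# Simulate the experiment: spin the coin 20 times with that p.
heads = sum(random.random() < p for _ in range(20))
print(p, heads)
```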
I'm confused by the part `uniform(0,1)`.
Why can the prior for a coin flip be represented by $\mathsf{Uniform}(0,1)$?
I found a related answer here:
============================================================
If random variables (r.v.s) $a_1,a_2,\ldots$ are i.i.d. uniformly
distributed on the set $\{0,1\}$, then $\alpha=(0.a_1a_2\ldots)_2$ is a
r.v. uniformly distributed on the real interval $[0,1]$. To see this,
note that for any $x=(0.x_1x_2\ldots)_2\in[0,1)$ (always taking, WLOG,
the unique binary representation of $x$ that has infinitely many $0$s),
we have the following:
$$\begin{align}\{\alpha > x\} ={} & \{a_1>x_1\}\cup\\
&\{\{a_1=x_1\}\cap \{a_2>x_2\}\}\cup\\
&\{\{a_1=x_1\}\cap \{a_2=x_2\}\cap\{a_3>x_3\}\}\cup\\
&\ldots \end{align}$$
Now, $P(a_i = x_i) = \frac{1}{2}$ and $P(a_i > x_i) = \frac{1}{2}(1-x_i)$, so the probability of the
above disjoint union is just
$$\begin{align}P(\alpha>x) &= \frac{1}{2}(1-x_1) + \frac{1}{2^2}(1-x_2) + \frac{1}{2^3}(1-x_3)+\ldots\\
&= \sum_{i=1}^\infty \frac{1}{2^i} - \sum_{i=1}^\infty \frac{x_i}{2^i}\\
&= 1 - x\\
\therefore P(\alpha\le x) &= x \end{align}$$
therefore $\alpha$ is a r.v. uniformly distributed on $[0,1]$.
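The construction in that answer can also be checked empirically; here is a minimal Python sketch (the function name `alpha_sample` is my own) that builds $\alpha$ from i.i.d. fair bits and compares the empirical CDF with the uniform CDF $F(x)=x$:

```python
import random

def alpha_sample(bits=53):
    """alpha = (0.a1 a2 ...)_2 built from i.i.d. fair bits a_i in {0, 1}."""
    return sum(random.getrandbits(1) / 2**i for i in range(1, bits + 1))

random.seed(0)
n = 100_000
samples = [alpha_sample() for _ in range(n)]

# If alpha is Uniform(0,1), the empirical CDF at x should be close to x.
for x in (0.25, 0.5, 0.9):
    ecdf = sum(s <= x for s in samples) / n
    print(f"P(alpha <= {x}) ~ {ecdf:.3f}")
```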
==========================================================
But I'm quite confused about what
$$P(\alpha > x)$$
means, and why, from
$$P(\alpha \le x) = x,$$
he concluded that $\alpha$ has the uniform distribution on $[0,1]$.
Best Answer
I believe this is intended to be an elementary Bayesian inference problem. It seems you have decided to let $\theta = P(\text{Heads})$ have the "flat" or "noninformative" prior distribution $\mathsf{Unif}(0,1) \equiv \mathsf{Beta}(\alpha=1,\beta=1).$ So your prior distribution is $p(\theta) = 1.$
Then from your experiment, you have $n=20$ Bernoulli trials resulting in $x = 15$ Heads, so your likelihood function is (up to a constant factor not involving $\theta$) $p(x|\theta) = \theta^x(1-\theta)^{n-x} = \theta^{15}(1-\theta)^5.$
Then according to Bayes' Theorem, the posterior distribution (using the uniform prior) is $$p(\theta|x) \propto p(\theta) \times p(x|\theta) \propto \theta^{15}(1-\theta)^5,$$
which we recognize as the kernel (PDF without constant) of $\mathsf{Beta}(16,6).$
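This conjugate update can be sanity-checked with plain arithmetic; a small sketch (the helper name `beta_update` is mine): a $\mathsf{Beta}(a,b)$ prior combined with $x$ heads in $n$ trials gives a $\mathsf{Beta}(a+x,\, b+n-x)$ posterior.

```python
def beta_update(a, b, heads, n):
    """Conjugate Beta-Bernoulli update: prior Beta(a, b), data heads-in-n."""
    return a + heads, b + (n - heads)

# Uniform(0,1) == Beta(1,1) prior, 15 heads in 20 spins:
print(beta_update(1, 1, 15, 20))  # -> (16, 6)
```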
The posterior distribution can be used to get a point estimate, which might be the posterior mean $\frac{16}{16+6} \approx 0.7273$, posterior median $0.7343$ (from R), or posterior mode $\frac{15}{20} = 0.75.$
You could also use the posterior distribution to find a Bayesian 95% probability interval $(0.53, 0.89)$ (from R).
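These numbers can be reproduced in Python, assuming SciPy is available (the original answer used R):

```python
from scipy import stats

post = stats.beta(16, 6)                 # posterior Beta(16, 6)

print(post.mean())                       # posterior mean 16/22 ~ 0.727
print(post.median())                     # posterior median ~ 0.734
print((16 - 1) / (16 + 6 - 2))           # posterior mode (a-1)/(a+b-2) = 0.75

# Equal-tailed 95% probability interval:
print(post.ppf(0.025), post.ppf(0.975))  # ~ (0.53, 0.89)
```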
Note: As an exercise, find the posterior distribution using the prior $\mathsf{Beta}(5, 5)$ and the same likelihood function, and see what difference that alternate choice of prior distribution makes in the point and interval estimates.
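A sketch of that comparison, again assuming SciPy: with a $\mathsf{Beta}(5,5)$ prior the posterior is $\mathsf{Beta}(5+15,\,5+5) = \mathsf{Beta}(20,10)$, and the extra prior weight near $0.5$ pulls the estimates toward $0.5$.

```python
from scipy import stats

# Beta(5,5) prior + 15 heads in 20 trials -> Beta(20, 10) posterior.
post = stats.beta(20, 10)

print(post.mean())                       # 20/30 ~ 0.667, pulled toward 0.5
print(post.ppf(0.025), post.ppf(0.975))  # interval shifted down vs. the flat prior
```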