Normal Distribution – Sum of Bernoulli Random Variables with Gaussian Noise

bernoulli-distribution, binomial-distribution, noise, normal-distribution

This relates to a recently asked question in which (in one of its edits) the asker wanted to know what happens when a sum of Bernoulli random variables has some form of noise on the probability parameter.

The original question (with some edits for clarity):

John plays a game on each of $n$ days, and the days are independent of one another.

Each day, the probability of success is $p+\epsilon$, where $\epsilon$ is noise centered around 0; it may be modelled as $N(0,\sigma^{2})$ or anything convenient.

So I have a series of $\text{Bernoulli}(p+\epsilon)$ random variables; how can I model the sum of these random variables?

It was, quite rightly, pointed out that unrestricted Gaussian noise wouldn't work, as its support pushes the probability parameter outside its valid range.

But what happens if the noise is restricted so that the probability parameter stays within its valid range? How is the Bernoulli then distributed, and how is the sum of these random variables distributed?

Best Answer

Here I show what happens to a sum of Bernoulli random variables when the noise added to the probability parameter follows a truncated Gaussian distribution, restricted so that the parameter stays within its valid range.

Let's assume we have a random variable $X$ which is, conditionally, a Bernoulli random variable.

$$X | \epsilon\sim\text{Bern}(p+\epsilon)$$ where $$\epsilon\sim TN(0,\sigma,a=-p,b=1-p)$$ and $0<p<1$, $\sigma>0$, and $(a,b)$ represent the lower and upper truncation levels of the truncated Normal distribution, respectively. The truncation ensures that the Bernoulli probability parameter $p+\epsilon$ stays within $[0,1]$.

We can simplify the above characterisation slightly by absorbing $p$ into the noise term, writing \begin{align} X | \epsilon\sim&\text{Bern}(\epsilon)\notag \end{align} where \begin{align} \epsilon\sim&TN(p,\sigma,a=0,b=1)\notag \end{align}
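
For concreteness, here is a minimal simulation sketch of this hierarchical model, assuming Python with NumPy/SciPy; the parameter values are illustrative and not part of the original question.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
p, sigma, n_draws = 0.3, 0.5, 200_000  # illustrative values

# scipy's truncnorm takes the truncation points in standard units:
# a = (lower - loc) / scale, b = (upper - loc) / scale
a, b = (0 - p) / sigma, (1 - p) / sigma

# Draw the noisy success probability for each trial (the simplified
# characterisation: epsilon ~ TN(p, sigma, 0, 1)), then the trial itself
eps = stats.truncnorm.rvs(a, b, loc=p, scale=sigma, size=n_draws, random_state=rng)
x = rng.binomial(1, eps)

print("empirical P(X = 1):", x.mean())
```

The empirical success rate is noticeably different from $p$, which is exactly what the derivation below quantifies.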

We can derive the probability mass function of $X$ using \begin{align} p_{X}(x)&=\int_{\epsilon}p_{X | \epsilon}(x | \epsilon)\, p_{\epsilon}(\epsilon)\,d\epsilon \notag \end{align} where, for $x\in\{0,1\}$, \begin{align} p_{X | \epsilon}(x | \epsilon)&=\epsilon^{x}(1-\epsilon)^{1-x}\notag\\ &=\epsilon x+(1-\epsilon)(1-x)\notag \end{align} and \begin{align} p_{\epsilon}(\epsilon)&=\frac{e^{-\frac{1}{2}\big(\frac{\epsilon-p}{\sigma}\big)^{2}}}{\sqrt{2\pi}\sigma\big(\Phi(1;p,\sigma)-\Phi(0;p,\sigma)\big)}\notag\\ &=Ae^{-\frac{1}{2}\big(\frac{\epsilon-p}{\sigma}\big)^{2}}\notag \end{align} where $\Phi(\cdot\,;p,\sigma)$ denotes the CDF of the untruncated $N(p,\sigma^{2})$ distribution and $A$ collects the normalising constant.

So \begin{align} p_{X}(x)&=\int_{0}^{1}[\epsilon x+(1-\epsilon)(1-x)]Ae^{-\frac{1}{2}\big(\frac{\epsilon-p}{\sigma}\big)^{2}}\,d\epsilon \notag \end{align}

Let \begin{align} z=\epsilon-p\notag \end{align} Then \begin{align} dz=d\epsilon\notag \end{align} and the limits of integration become $-p$ and $1-p$.

\begin{align} p_{X}(x)&=\int_{-p}^{1-p}[(z+p)x+(1-z-p)(1-x)]Ae^{-\frac{1}{2}\big(\frac{z}{\sigma}\big)^{2}}\,dz \notag\\ &=\int_{-p}^{1-p}[px+(1-p)(1-x)]Ae^{-\frac{1}{2}\big(\frac{z}{\sigma}\big)^{2}} +(2x-1)zAe^{-\frac{1}{2}\big(\frac{z}{\sigma}\big)^{2}}\,dz \notag\\ &=px+(1-p)(1-x)+(2x-1)A\Big[-\sigma^{2}\Big(e^{-\frac{1}{2}\big(\frac{1-p}{\sigma}\big)^{2}}-e^{-\frac{1}{2}\big(\frac{p}{\sigma}\big)^{2}}\Big)\Big]\notag \end{align}

Which gives $$ p_{X}(x)= \left\{ \begin{aligned} (1-p) + \frac{\sigma\Big(e^{-\frac{1}{2}\big(\frac{1-p}{\sigma}\big)^{2}}-e^{-\frac{1}{2}\big(\frac{p}{\sigma}\big)^{2}}\Big)}{\sqrt{2\pi}\big(\Phi(1;p,\sigma)-\Phi(0;p,\sigma)\big)} &\quad\quad x=0\\ p - \frac{\sigma\Big(e^{-\frac{1}{2}\big(\frac{1-p}{\sigma}\big)^{2}}-e^{-\frac{1}{2}\big(\frac{p}{\sigma}\big)^{2}}\Big)}{\sqrt{2\pi}\big(\Phi(1;p,\sigma)-\Phi(0;p,\sigma)\big)} &\quad\quad x=1 \end{aligned} \right. $$
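
As a sanity check, this closed form can be compared with a direct numerical evaluation of the mixture integral; a sketch assuming Python with SciPy, where `pmf_closed` and `pmf_numeric` are helper names introduced here:

```python
import numpy as np
from scipy import integrate
from scipy.stats import norm, truncnorm

p, sigma = 0.3, 0.5  # illustrative values

def pmf_closed(x, p, sigma):
    # The closed-form p_X(x) derived above
    corr = sigma * (np.exp(-0.5 * ((1 - p) / sigma) ** 2)
                    - np.exp(-0.5 * (p / sigma) ** 2)) \
           / (np.sqrt(2 * np.pi) * (norm.cdf(1, p, sigma) - norm.cdf(0, p, sigma)))
    return (1 - p) + corr if x == 0 else p - corr

def pmf_numeric(x, p, sigma):
    # Integrate the conditional pmf against the truncated-Normal density
    a, b = (0 - p) / sigma, (1 - p) / sigma
    integrand = lambda e: (e if x == 1 else 1 - e) * truncnorm.pdf(e, a, b, loc=p, scale=sigma)
    return integrate.quad(integrand, 0, 1)[0]

for x in (0, 1):
    print(x, pmf_closed(x, p, sigma), pmf_numeric(x, p, sigma))
```

The two evaluations agree, as expected.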

We can confirm this is a valid probability mass function by \begin{align} \sum_{x}p_{X}(x)&=p_{X}(x=0)+p_{X}(x=1)\notag\\ &=(1-p)+\frac{\sigma\Big(e^{-\frac{1}{2}\big(\frac{1-p}{\sigma}\big)^{2}}-e^{-\frac{1}{2}\big(\frac{p}{\sigma}\big)^{2}}\Big)}{\sqrt{2\pi}\big(\Phi(1;p,\sigma)-\Phi(0;p,\sigma)\big)}\notag\\ &\quad+p-\frac{\sigma\Big(e^{-\frac{1}{2}\big(\frac{1-p}{\sigma}\big)^{2}}-e^{-\frac{1}{2}\big(\frac{p}{\sigma}\big)^{2}}\Big)}{\sqrt{2\pi}\big(\Phi(1;p,\sigma)-\Phi(0;p,\sigma)\big)}\notag\\ &=1\notag \end{align}

Thus, we can say that \begin{align} X\sim\text{Bern}(p^{*})\notag \end{align} where \begin{align} p^{*}&=p-\frac{\sigma\Big(e^{-\frac{1}{2}\big(\frac{1-p}{\sigma}\big)^{2}}-e^{-\frac{1}{2}\big(\frac{p}{\sigma}\big)^{2}}\Big)}{\sqrt{2\pi}\big(\Phi(1;p,\sigma)-\Phi(0;p,\sigma)\big)}\notag \end{align} Note that $p^{*}$ is exactly the mean of the $TN(p,\sigma,a=0,b=1)$ distribution, as it must be: by the law of total expectation, $P(X=1)=\mathbb{E}[P(X=1\mid\epsilon)]=\mathbb{E}[\epsilon]$.

We can see that \begin{align} \lim_{\sigma \to 0} \frac{\sigma\Big(e^{-\frac{1}{2}\big(\frac{1-p}{\sigma}\big)^{2}}-e^{-\frac{1}{2}\big(\frac{p}{\sigma}\big)^{2}}\Big)}{\sqrt{2\pi}\big(\Phi(1;p,\sigma)-\Phi(0;p,\sigma)\big)}=0\notag \end{align} which implies that as $\sigma \to 0$, the Bernoulli with truncated Gaussian noise converges to a Bernoulli without noise, as expected. Furthermore, for $p=0.5$ the two exponential terms cancel by symmetry, so the distribution of a Bernoulli with truncated Gaussian noise is the same as that of a Bernoulli without noise for any $\sigma$.

Extension to the Binomial distribution is straightforward. Let's assume we have a sequence of $n$ independent Bernoulli trials, each with its own independently drawn noise term, so that marginally each $X_{i}\sim\text{Bern}(p^{*})$, and define the sum as \begin{align} S_{n}&=\sum_{i=1}^{n}X_{i}\notag \end{align}

The moment generating function (MGF) of $S_{n}$ is \begin{align} M_{S_{n}}(t)&=(M_{X}(t))^n=(1-p^{*}+p^{*}e^{t})^{n}\notag \end{align}

which is the MGF of a $\text{Binomial}(n,p^{*})$ random variable.
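
We can double-check the Binomial result by simulation: draw a fresh noise term for every trial, form $S_{n}$, and compare its empirical distribution with the $\text{Binomial}(n,p^{*})$ pmf. A sketch assuming Python with NumPy/SciPy, with illustrative parameter values:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
p, sigma, n, reps = 0.3, 0.5, 20, 100_000  # illustrative values

# Closed-form p* from the derivation above
num = sigma * (np.exp(-0.5 * ((1 - p) / sigma) ** 2) - np.exp(-0.5 * (p / sigma) ** 2))
den = np.sqrt(2 * np.pi) * (stats.norm.cdf(1, p, sigma) - stats.norm.cdf(0, p, sigma))
p_star = p - num / den

# Simulate S_n: every trial gets its own truncated-Gaussian noise draw
a, b = (0 - p) / sigma, (1 - p) / sigma
eps = stats.truncnorm.rvs(a, b, loc=p, scale=sigma, size=(reps, n), random_state=rng)
s_n = rng.binomial(1, eps).sum(axis=1)

# Compare the simulated distribution of S_n with Binomial(n, p*)
for k in range(5, 12):
    print(k, round((s_n == k).mean(), 4), round(stats.binom.pmf(k, n, p_star), 4))
```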

Example

We can illustrate the above with an example: $$p=\{0.3,0.7\},\, \sigma=0.5,\, n=20$$ where $n$ is the number of independent Bernoulli trials.

This gives $$p^{*}=\{0.442248,0.557752\}$$
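
These values can be reproduced directly from the closed form for $p^{*}$; a short sketch assuming Python with SciPy:

```python
import numpy as np
from scipy.stats import norm

p, sigma = np.array([0.3, 0.7]), 0.5
num = sigma * (np.exp(-0.5 * ((1 - p) / sigma) ** 2) - np.exp(-0.5 * (p / sigma) ** 2))
den = np.sqrt(2 * np.pi) * (norm.cdf(1, p, sigma) - norm.cdf(0, p, sigma))
print(p - num / den)  # approximately [0.442248 0.557752]
```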

The probability mass functions of the resultant Binomial random variables with and without noise are shown below.

[Figures: Binomial probability mass functions with and without noise, one for each value of $p$]