[Math] Statistics: Where did this function for the normal distribution come from

statistics

I am studying the normal distribution for the first time and I'm having trouble understanding where this formula came from:

$$\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} x^2}$$

Could someone derive this equation?

What about the general case?

$$\frac{1}{\sigma\sqrt{2\pi}} e^{ -\frac{(x-\mu)^2}{2\sigma^2} }$$

Best Answer

Here's a commonly seen derivation.

Consider throwing a dart at a dartboard, aiming at the origin. We make the following assumptions:

  1. The errors in the horizontal and vertical directions are independent and identically distributed
  2. Errors are isotropic, i.e. the distribution of the error depends only on the distance from the target, not on the direction
  3. The chance of the dart landing in a small region is proportional to that region's area
  4. Large errors are less likely than small errors

Say that the probability of landing in a thin vertical strip centered at $x$ with width $\Delta x$ is $p(x)\Delta x$, and similarly $p(y)\Delta y$ for a thin horizontal strip centered at $y$ with width $\Delta y$.

Since horizontal and vertical errors are independent (assumption 1), we can multiply these probabilities to get the probability of landing in a small box at $(x,y)$ of size $\Delta x\,\Delta y$. By assumption (2) this should depend only on the distance $r$ from the origin, and by assumption (3) it should be proportional to $\Delta x\,\Delta y$:

$$p(x)\Delta x \cdot p(y)\Delta y = f(r) \Delta x \Delta y$$

which tells us that

$$p(x)p(y) = f(r)$$
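As a sanity check on where this is heading, note that the density from the question satisfies a relation of exactly this form:

$$\phi(x)\phi(y) = \frac{1}{2\pi}\, e^{-\frac{x^2+y^2}{2}} = \frac{1}{2\pi}\, e^{-\frac{r^2}{2}}$$

a function of $r$ alone.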

Differentiate both sides of $p(x)p(y) = f(r)$ with respect to the angular coordinate $\theta$, holding $r$ fixed. The right-hand side depends only on $r$, so its derivative vanishes, and the product rule on the left gives

$$p(x) \frac{\partial p(y)}{\partial \theta} + p(y) \frac{\partial p(x)}{\partial \theta} = 0$$

Using polar coordinates $x=r\cos\theta$ and $y=r\sin\theta$, the chain rule gives $\frac{\partial x}{\partial \theta} = -r\sin\theta$ and $\frac{\partial y}{\partial \theta} = r\cos\theta$, so this becomes

$$p(x)p'(y)\, r\cos\theta - p(y)p'(x)\, r\sin\theta = 0$$

or, substituting $r\cos\theta = x$ and $r\sin\theta = y$ back in,

$$p(x)p'(y)x - p(y)p'(x) y = 0$$

which can be rearranged to

$$\frac{p(x)x}{p'(x)} = \frac{p(y)y}{p'(y)}$$

The left side is a function of $x$ alone and the right side a function of $y$ alone, yet they are equal for all $x$ and $y$; hence both sides must equal some constant $C$:

$$xp(x) = Cp'(x)$$

and we can now separate variables, writing this as $x\,dx = C\,\frac{dp}{p}$, and integrate to get

$$\frac{x^2}{2} = C\log p(x) + b$$

or, exponentiating (and writing $A = e^{-b/C}$),

$$p(x) = A \exp\left(\frac{x^2}{2C} \right)$$

Now the assumption that large errors are less likely than small ones tells us that $C<0$, and the constant $A$ is determined by the requirement that the total probability integrate to 1.
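To recover the formulas in the question, write $C = -\sigma^2$ (legitimate, since $C<0$), so that $p(x) = A\, e^{-x^2/(2\sigma^2)}$. The normalization constant then comes from the classical polar-coordinates trick: squaring the integral,

$$\left( \int_{-\infty}^{\infty} e^{-\frac{x^2}{2\sigma^2}}\, dx \right)^{\!2} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-\frac{x^2+y^2}{2\sigma^2}}\, dx\, dy = \int_0^{2\pi}\!\int_0^{\infty} e^{-\frac{r^2}{2\sigma^2}}\, r\, dr\, d\theta = 2\pi\sigma^2$$

so $\int_{-\infty}^{\infty} e^{-x^2/(2\sigma^2)}\, dx = \sigma\sqrt{2\pi}$ and $A = \frac{1}{\sigma\sqrt{2\pi}}$. Taking $\sigma = 1$ gives the standard normal density $\phi(x)$ from the question, and aiming at $\mu$ instead of the origin (i.e. replacing $x$ by $x-\mu$) gives the general form

$$\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

One can check that $\sigma^2$ is in fact the variance of this density, so the parameters carry their usual meanings.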

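If you like to double-check this sort of thing numerically, here is a small sketch (using NumPy and SciPy; it is a verification, not part of the derivation) confirming the two key facts: the derived density integrates to 1, and the product $p(x)p(y)$ depends only on the distance $r$ from the origin.

```python
import numpy as np
from scipy.integrate import quad

# Derived density with sigma = 1 (i.e. C = -1): p(x) = A * exp(-x^2 / 2),
# where A = 1 / sqrt(2*pi) comes from the normalization worked out above.
A = 1.0 / np.sqrt(2.0 * np.pi)

def p(x):
    return A * np.exp(-x**2 / 2.0)

# The total probability should be 1.
total, _ = quad(p, -np.inf, np.inf)
print(total)  # ~1.0

# Rotational invariance: p(x) * p(y) should agree for any two points
# at the same distance from the origin; both points below have r = 5.
print(p(3.0) * p(4.0))  # A^2 * exp(-25/2)
print(p(5.0) * p(0.0))  # same value
```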