[Math] Just learned about the bell curve in statistics. How is calculus related to this curve?

calculus, normal distribution, ordinary differential equations, statistics

I'm learning about the bell curve in statistics and I'm trying to understand the calculus behind the concept. I've taken Calc 1 already. How is the integral related to this asymptotic lump? What's on the y axis? What's on the x axis? What is the area under the bell curve? What's the equation of the curve? How does standard deviation relate to calculus? Could someone bridge the gap in my understanding between the bell curve and calculus?

Best Answer

If displacement and velocity are familiar from your calculus course, recall that if you put time on the horizontal axis and the velocity of a moving object on the vertical axis, then the (signed) area under the velocity graph between times $t_1$ and $t_2$ is the net displacement of the object during that time interval.

In probability and statistics, we again encounter areas under curves. Consider a random variable, such as the height of a woman randomly chosen from some given population. Here the horizontal axis will be height, and the vertical axis will be a quantity called probability density. The area under the probability density graph between heights $h_1$ and $h_2$ represents the probability that the height of the randomly selected woman is in that interval.

The normal curve is just one example of a probability density function; there are many others. Calculus—integration in particular—is the tool used to compute probabilities.

In practice, the normal distribution poses a problem: there is no "closed form" for $\int_{h_1}^{h_2}\frac{1}{\sqrt{2\pi}}e^{-h^2/2}\,dh,$ which is the integral that gives the probability of landing between $h_1$ and $h_2$ for the standard normal distribution (mean $0,$ standard deviation $1$). That is, there is no formula for the result of this integration in terms of familiar functions such as polynomials, exponentials, logarithms, and trigonometric functions. This isn't a big problem, since the integral can be computed numerically. Many calculators and all statistics packages can evaluate this integral, and its value is tabulated in statistics books.
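As a rough illustration of "computed numerically," here is a minimal Python sketch that approximates the area under the standard normal curve between two points with composite Simpson's rule and compares it with the value obtained from the standard library's `math.erf`. The function names (`standard_normal_pdf`, `simpson`, `normal_probability`) and the choice of Simpson's rule are assumptions made for this example; real statistics packages may use different methods.

```python
import math


def standard_normal_pdf(x):
    """Density of the standard normal distribution (mean 0, standard deviation 1)."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


def simpson(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] with composite Simpson's rule.

    n is the number of subintervals and must be even.
    """
    if n % 2 != 0:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Interior points alternate between weight 4 (odd i) and weight 2 (even i).
        weight = 4 if i % 2 == 1 else 2
        total += weight * f(a + i * h)
    return total * h / 3.0


def normal_probability(h1, h2):
    """P(h1 < Z < h2) for a standard normal Z, expressed with the error function."""
    root2 = math.sqrt(2.0)
    return 0.5 * (math.erf(h2 / root2) - math.erf(h1 / root2))


if __name__ == "__main__":
    approx = simpson(standard_normal_pdf, -1.0, 1.0)
    reference = normal_probability(-1.0, 1.0)
    print(f"Simpson's rule: {approx:.6f}")
    print(f"math.erf:       {reference:.6f}")
```

Both numbers come out near $0.6827,$ the familiar "about 68% within one standard deviation" figure. Note that `math.erf` does not contradict the "no closed form" statement: the error function is a special function defined by this very integral, and it is itself evaluated by numerical approximation.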

The condition that total probability equals $1$ corresponds to the condition that the total area under the normal curve equals $1.$ That is, $$ \int_{-\infty}^\infty\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\,dx=1. $$ This special case of the area calculation actually can be done exactly, but it uses knowledge beyond what one usually learns in Calculus I.
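For the curious, one standard route to this exact value (a sketch only; it relies on double integrals and polar coordinates from multivariable calculus) is to square the integral. Writing $I$ for the unnormalized integral, $$ \begin{aligned} I&=\int_{-\infty}^\infty e^{-x^2/2}\,dx,\\ I^2&=\int_{-\infty}^\infty\int_{-\infty}^\infty e^{-(x^2+y^2)/2}\,dx\,dy =\int_0^{2\pi}\int_0^\infty e^{-r^2/2}\,r\,dr\,d\theta\\ &=2\pi\left[-e^{-r^2/2}\right]_0^\infty=2\pi. \end{aligned} $$ Since $I>0,$ this gives $I=\sqrt{2\pi},$ which is exactly why the factor $\frac{1}{\sqrt{2\pi}}$ appears in the density: it scales the total area to $1.$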

You ask how standard deviation relates to calculus. In fact, both expected value and variance (standard deviation is the square root of variance) are defined as integrals. If $f(x)$ is the probability density function of a random variable $X,$ then $$ \begin{aligned} E[X]&=\int_{-\infty}^\infty xf(x)\,dx\\ \text{Var}[X]&=\int_{-\infty}^\infty (x-E[X])^2f(x)\,dx \end{aligned} $$ In the case of the standard normal distribution, the expected value is $$ E[X]=\int_{-\infty}^\infty x\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\,dx=0, $$ which follows because the integrand is an odd function (areas left and right of $x=0$ cancel). The variance turns out to be $1,$ so the standard deviation is also $1.$ The variance integral can be evaluated using integration by parts.
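Here is a sketch of that integration by parts. Because $E[X]=0,$ the variance reduces to $\int_{-\infty}^\infty x^2\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\,dx.$ Taking $u=x$ and $dv=xe^{-x^2/2}\,dx,$ so that $du=dx$ and $v=-e^{-x^2/2},$ $$ \begin{aligned} \text{Var}[X]&=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^\infty x\cdot xe^{-x^2/2}\,dx\\ &=\frac{1}{\sqrt{2\pi}}\left(\left[-xe^{-x^2/2}\right]_{-\infty}^{\infty}+\int_{-\infty}^\infty e^{-x^2/2}\,dx\right)\\ &=0+\int_{-\infty}^\infty\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\,dx=1. \end{aligned} $$ The boundary term vanishes because $e^{-x^2/2}$ decays much faster than $x$ grows, and the remaining integral is the total area under the normal curve, which equals $1.$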

There is a lot more to be said about how calculus relates to statistics. This summary barely scratches the surface.
