[Math] Iteratively Updating a Normal Distribution

normal distribution, statistics

Is there a way to update a normal distribution when given new data points without knowing the original data points? What is the minimum information that would need to be known? For example, if I know the mean, standard deviation, and the number of original data points, but not the values of those points themselves, is it possible?

Best Answer

It is certainly possible. The best way, which avoids some numerical precision issues, is to track two running values, $m_n$ and $s_n$, and update them with each new observation $a_n$ (the $n$th one):

$$m_n = m_{n-1} + \frac{a_{n}-m_{n-1}}{n}$$

$$s_n = s_{n-1} + (a_n - m_{n-1})(a_n - m_n)$$

starting with $m_0 = s_0 = 0$. Then $m_n$ is the mean of the first $n$ values, and the standard deviation is $\sqrt{\frac{s_n}{n}}$ (population) or $\sqrt{\frac{s_n}{n-1}}$ (sample), depending on which denominator you usually use. If you would rather store the standard deviation $\sigma_{n-1}$ than $s_{n-1}$, you can recover $s_{n-1}$ before each update as $s_{n-1} = (n-1)\sigma_{n-1}^2$ or $(n-2)\sigma_{n-1}^2$ respectively. So the count, the mean, and the standard deviation of the original data are enough; the original values themselves are not needed.
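As a rough illustration, the two recurrences could be sketched in Python as below. The function names `from_summary`, `update`, and `std`, and the `ddof` parameter (0 for the $n$ denominator, 1 for the $n-1$ denominator), are assumptions made for this example and are not part of the answer itself.

```python
import math

def from_summary(n, mean, std_dev, ddof=0):
    """Recover the running state (n, m, s) from a count, mean and standard deviation.

    ddof=0 assumes std_dev was computed with denominator n,
    ddof=1 assumes denominator n - 1.
    """
    s = (n - ddof) * std_dev ** 2
    return n, mean, s

def update(n, m, s, a):
    """Fold one new observation a into the state (n, m, s) and return the new state."""
    n += 1
    m_new = m + (a - m) / n          # m_n = m_{n-1} + (a_n - m_{n-1}) / n
    s += (a - m) * (a - m_new)       # s_n = s_{n-1} + (a_n - m_{n-1})(a_n - m_n)
    return n, m_new, s

def std(n, s, ddof=0):
    """Standard deviation from the running state, using denominator n - ddof."""
    return math.sqrt(s / (n - ddof))

# Hypothetical data: only the summary of the first five values [2, 4, 4, 4, 5]
# is known (count 5, mean 3.8, population variance 0.96); three more values arrive.
state = from_summary(5, 3.8, math.sqrt(0.96))
for a in [5, 7, 9]:
    state = update(*state, a)

n, m, s = state
print(n, m, std(n, s))
```

For the full data set 2, 4, 4, 4, 5, 5, 7, 9 the mean is 5 and the population standard deviation is 2, so the printed values should agree with those up to floating-point rounding.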
