Is there a way to update a normal distribution when given new data points without knowing the original data points? What is the minimum information that would need to be known? For example, if I know the mean, standard deviation, and the number of original data points, but not the values of those points themselves, is it possible?
Iteratively Updating a Normal Distribution
Best Answer
It is certainly possible, and the mean, standard deviation, and number of points are enough. A good way to do it, which avoids the numerical precision problems of accumulating raw sums of squares, is to track the following two values and update them with each new $n$th observation $a_n$:
$$m_n = m_{n-1} + \frac{a_{n}-m_{n-1}}{n}$$
$$s_n = s_{n-1} + (a_n - m_{n-1})(a_n - m_n)$$
starting with $m_0 = s_0 = 0$. Then the mean of the first $n$ values is $m_n$, and the standard deviation is $\sqrt{\frac{s_n}{n}}$ or $\sqrt{\frac{s_n}{n-1}}$, depending on which denominator you use for the standard deviation. If you would rather store only the mean, the standard deviation, and the count, you can recover $s_{n-1}$ from the previous standard deviation $\sigma_{n-1}$ each time as $s_{n-1} = (n-1)\sigma_{n-1}^2$ or $s_{n-1} = (n-2)\sigma_{n-1}^2$, respectively.
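A minimal Python sketch of these recurrences (this is Welford's online algorithm) follows. The names `welford_update` and `update_summary` and the `sample` flag are hypothetical, chosen for illustration. `update_summary` addresses the original question directly: it assumes you have stored only the mean, the standard deviation, and the count, rebuilds $s$ from the standard deviation as described above, and then folds in the new points one at a time.

```python
import math
import statistics
from typing import Iterable, Tuple


def welford_update(m: float, s: float, n: int, a: float) -> Tuple[float, float, int]:
    """Fold one new observation `a` into the running state (m, s, n).

    m is the mean of the first n values and s is the running sum of squared
    deviations from that mean; returns the state for n + 1 values.
    """
    n += 1
    delta = a - m          # a_n - m_{n-1}
    m += delta / n         # m_n = m_{n-1} + (a_n - m_{n-1}) / n
    s += delta * (a - m)   # s_n = s_{n-1} + (a_n - m_{n-1})(a_n - m_n)
    return m, s, n


def update_summary(mean: float, sd: float, n: int, new_values: Iterable[float],
                   sample: bool = True) -> Tuple[float, float, int]:
    """Update (mean, sd, n) with new data, without the original data points.

    Hypothetical helper: if `sample` is True, `sd` is assumed to use the
    n - 1 denominator; otherwise it is assumed to use the n denominator.
    """
    ddof = 1 if sample else 0
    # Recover s from the stored standard deviation: s = (n - ddof) * sd^2.
    s = (n - ddof) * sd ** 2 if n > ddof else 0.0
    m = mean
    for a in new_values:
        m, s, n = welford_update(m, s, n, a)
    new_sd = math.sqrt(s / (n - ddof)) if n > ddof else 0.0
    return m, new_sd, n


# Usage: summarize `old`, then forget it and update with `new` only.
old = [2.0, 4.0, 4.0, 4.0]
new = [5.0, 5.0, 7.0, 9.0]
mean, sd, n = update_summary(statistics.mean(old), statistics.stdev(old), len(old), new)
print(mean, sd, n)
```

Up to floating-point rounding, the result should match `statistics.mean` and `statistics.stdev` computed on the combined list. Tracking $s$ directly, rather than a running sum of squares $\sum a_i^2$, is what avoids the cancellation error that appears when the variance is small relative to the mean.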