Solved – the difference between Mean Squared Deviation and Variance

mean, mse, terminology, variance

I am doing some tutoring for an AS-Level maths student and unfortunately for me they are doing statistics. This is not my strong point, mainly from the point of view of remembering all of the definitions, formulae and statistics. The workbook they had asked them to work out the Mean, the Variance, Standard Deviation, the Mean Squared Deviation and Root Mean Squared Deviation.

The Variance is defined on Wikipedia as

$Var = \frac{\sum_{i=1}^n (x_i - \bar{x})^2}{n-1}$

The Mean Squared Deviation is defined on Wikipedia as

$MSD = \frac{\sum_{i=1}^n (x_i - \bar{x})^2}{n}$

except that I have written the mean $\bar{x}$ where Wikipedia has the predicted value $\hat{y}_i$.

The Standard Deviation and Root Mean Squared Deviation would be the square roots of the above respectively.
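To make sure I am reading the four definitions the same way, here is a minimal Python sketch that computes each of them directly. The data values are made up for illustration and the variable names are my own; the cross-check uses `statistics.variance` and `statistics.pvariance` from the standard library, which divide by $n-1$ and $n$ respectively.

```python
import math
import statistics

# Made-up illustrative data (not from the workbook).
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

n = len(data)
mean = sum(data) / n
# Sum of squared deviations from the mean, shared by both formulas.
sum_sq_dev = sum((x - mean) ** 2 for x in data)

var_sample = sum_sq_dev / (n - 1)  # "Var" as written above (divisor n - 1)
msd = sum_sq_dev / n               # "MSD" as written above (divisor n)
sd = math.sqrt(var_sample)         # Standard Deviation
rmsd = math.sqrt(msd)              # Root Mean Squared Deviation

# Cross-check against the standard library.
assert math.isclose(var_sample, statistics.variance(data))
assert math.isclose(msd, statistics.pvariance(data))

print(f"mean={mean}, Var={var_sample:.4f}, SD={sd:.4f}, MSD={msd}, RMSD={rmsd}")
```

The only thing that changes between the two pairs is the divisor; the numerator is identical.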

Elsewhere on the internet there is some ambiguity. Even within the Variance wiki page the two formulae, MSD and Var, are referenced as types of variance.

The subtle difference of $n$ vs $n-1$ was not clearly defined in the student's notebook or textbook, nor was it explained why there is a difference. The student asked me why there was a difference and I gave a vague answer along the lines of "it's a sample vs population thing – go with it".

So, in summary, my final short questions are,

  1. What is the difference between Var and MSD? Are the above definitions correct?
  2. Is the MSD just another name for the population variance?
  3. Does Var imply using the sample variance?
  4. When would you use either MSD or Var over the other?

Best Answer

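The sum of squared deviations from the mean divided by $n$ and the same sum divided by $n-1$ are both called variance. The only difference is that dividing by $n-1$ gives an unbiased estimator of the population variance when the data are a sample. Taking the square root of either gives the corresponding estimate of the standard deviation.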
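A small simulation can make the "unbiased" part concrete. The sketch below is only an illustration under assumptions of my own choosing: the population is normal with variance 4, each sample has 5 values, and the function name `average_estimates` is hypothetical. It averages both estimators over many samples; in the long run the $n$ version should settle near $\frac{n-1}{n} \cdot 4 = 3.2$ and the $n-1$ version near 4.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Assumed setup for illustration: Normal(0, sd=2), so the true variance is 4.
TRUE_SD = 2.0
SAMPLE_SIZE = 5
TRIALS = 20000


def average_estimates(trials, size):
    """Average the divide-by-n and divide-by-(n-1) estimates over many samples."""
    total_n = 0.0
    total_n_minus_1 = 0.0
    for _ in range(trials):
        sample = [random.gauss(0.0, TRUE_SD) for _ in range(size)]
        mean = sum(sample) / size
        sum_sq_dev = sum((x - mean) ** 2 for x in sample)
        total_n += sum_sq_dev / size
        total_n_minus_1 += sum_sq_dev / (size - 1)
    return total_n / trials, total_n_minus_1 / trials


avg_n, avg_n1 = average_estimates(TRIALS, SAMPLE_SIZE)
print(f"divide by n:     {avg_n:.3f}")
print(f"divide by n - 1: {avg_n1:.3f}")
```

The divide-by-$n$ average comes out systematically too small because the deviations are measured from the sample mean, which is by construction closer to the sample than the true mean is.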

I guess that mean squared deviation and root mean squared deviation are more commonly used in machine learning, where mean squared error and its square root are often used.
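The link between the two names can be sketched in a few lines of Python. The helper `mean_squared_error` below is a hypothetical function written for this illustration, not any particular library's API, and the data are made up. When every prediction is simply the mean of the observed values, the mean squared error is the same number as the divide-by-$n$ variance.

```python
import math
import statistics


def mean_squared_error(y_true, y_pred):
    """Average squared difference between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up observed values.
y = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# A "model" that always predicts the mean of the observations.
y_bar = sum(y) / len(y)
constant_pred = [y_bar] * len(y)

mse_const = mean_squared_error(y, constant_pred)
rmse_const = math.sqrt(mse_const)

# With the mean as the prediction, MSE equals the divide-by-n variance.
assert math.isclose(mse_const, statistics.pvariance(y))

print(mse_const, rmse_const)
```

With any other set of predictions the two quantities generally differ, which is why the error-based name is the more general one.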

I also guess that some people prefer using mean squared deviation as a name for variance because it is more descriptive -- you instantly know from the name what someone is talking about, while for understanding what variance is you need to know at least elementary statistics.

Check the following threads to learn more: