Solved – Degrees of Freedom In Sample Variance

bessels-correction, degrees-of-freedom, unbiased-estimator, variance

Recall the formula for the sample variance $$s_{n-1}^2 = \dfrac{1}{n-1} \sum_{i = 1}^n (x_i - \bar{x})^2,$$ where $\bar{x}$ is the sample mean. There are many proofs that $s_{n-1}^2$ is an unbiased estimator of the population variance $\sigma^2$, although I find most of them clever but not particularly illuminating.

Often, as a way to provide intuition, it is mentioned that the elements of the sample will tend to lie closer to the sample mean than to the population mean (since the sample mean was computed from them). Sometimes this intuition is linked to the idea of degrees of freedom: if we fix $\bar{x}$, we need only determine $(n-1)$ of the elements in the sample in order to know the $n$th element (see the sketch below). This degrees-of-freedom argument is then used to justify, or at least provide intuition for, why the unbiased estimator should involve division by $(n-1)$ and not $n$.
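To make that constraint concrete, here is a minimal numerical sketch (in Python, with arbitrary made-up sample values): once the sample mean is fixed, the deviations sum to zero, so any $n-1$ of them determine the last.

```python
import numpy as np

x = np.array([2.0, 5.0, 1.0, 8.0])   # hypothetical sample, n = 4
d = x - x.mean()                     # deviations from the sample mean

# The deviations satisfy one linear constraint: they sum to zero.
# Hence the last deviation is fully determined by the other n - 1.
print(np.isclose(d.sum(), 0.0))          # True
print(np.isclose(d[-1], -d[:-1].sum()))  # True: d_n = -(d_1 + ... + d_{n-1})
```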

However, I cannot see why degrees of freedom actually matter when it comes to finding an unbiased estimator.

So, finally, my question is: is there a rigorous way to tie the notion of degrees of freedom into a proof that $s_{n-1}^2$ is an unbiased estimator of $\sigma^2$?


Note: a very similar question was posed here. However, the answer again just gave a proof that the formula as stated is an unbiased estimator, and Bessel's correction "just fell out of" the manipulations. My question is: why, a priori, without doing any calculations, could we have KNOWN that we should divide by the degrees of freedom rather than by the number of samples in order to get an unbiased estimate? I ask because "degrees of freedom" is so often given as an "explanation" for Bessel's correction, yet I do not see how it actually explains anything; based on the proofs I have seen, it looks more like a "lucky coincidence" than an "explanation." I am sure this reflects my failure to learn the subject deeply enough, which is why I am posting this question.

Best Answer

The connection is related to the eigenvalues of the centering matrix

The "why" of the connection at issue here actually goes down quite deeply into mathematical territory. In my view, it is related to the eigenvalues of the centering matrix, which have connections to the rank of that matrix. Before I get into the demonstration of this issue, I'll note that you can find a broad discussion of the centering matrix and its connection to the concept of degrees-of-freedom in Section 4 of O'Neill (2020). The material I give here is largely an exposition of what is shown in that section of that paper.


Preliminaries: Showing the connection between Bessel's correction and the degrees-of-freedom requires a bit of setup, and it also requires us to state the formal definition of degrees-of-freedom. To do this, we note that the sample variance is formed from the vector of deviations of the values from their sample mean, which is a linear transformation of the sample vector. We can write this (using upper-case for random variables) as:

$$S^2 = \frac{1}{n-1} ||\mathbf{R}||^2 \quad \quad \quad \quad \quad \mathbf{R} = \mathbf{X} - \bar{\mathbf{X}} = \mathbf{C} \mathbf{X},$$

where $\mathbf{C}$ is the centering matrix. The centering matrix $\mathbf{C}$ is a projection matrix, with $n-1$ eigenvalues equal to one, and one eigenvalue equal to zero. Its rank is the sum of its eigenvalues, which is $\text{rank} \ \mathbf{C} = n-1$.
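As a quick numerical illustration (a sketch using NumPy; the choice $n = 5$ is arbitrary), we can build the centering matrix directly and confirm these properties:

```python
import numpy as np

n = 5
C = np.eye(n) - np.ones((n, n)) / n  # centering matrix C = I - (1/n) 1 1^T

print(np.allclose(C @ C, C))                 # idempotent (projection matrix)
print(np.allclose(C, C.T))                   # symmetric
print(np.round(np.linalg.eigvalsh(C), 12))   # eigenvalues: one 0, the rest 1
print(np.linalg.matrix_rank(C))              # rank C = n - 1 = 4

x = np.array([3.0, 1.0, 4.0, 1.0, 5.0])      # arbitrary sample vector
print(np.allclose(C @ x, x - x.mean()))      # C x is the deviation vector
```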


The degrees-of-freedom: Formally, the degrees-of-freedom for the deviation vector is the dimension of the space of allowable values $\mathscr{R} \equiv \{ \mathbf{r} = \mathbf{C} \mathbf{x} | \mathbf{x} \in \mathbb{R}^n \}$, which is:

$$\begin{equation} \begin{aligned} DF = \dim \mathscr{R} &= \dim \{ \mathbf{r} = \mathbf{C} \mathbf{x} | \mathbf{x} \in \mathbb{R}^n \} \\[6pt] &= \text{rank} \ \mathbf{C} \\[6pt] &= n-1. \\[6pt] \end{aligned} \end{equation}$$

This establishes the degrees-of-freedom formally by connection to the eigenvalues of the centering matrix. We now connect this directly to the expected value of the squared-norm of the deviations that appears in the sample variance statistic.


Establishing the connection: The squared-norm of the deviations is a quadratic form using the centering matrix, and it can be simplified using the spectral form of the centering matrix. The centering matrix can be written in its spectral form as $\mathbf{C} = \mathbf{u}^* \mathbf{\Delta} \mathbf{u}$ where $\mathbf{u}$ is the (orthonormal) normalised DFT matrix and $\mathbf{\Delta} = \text{diag}(\lambda_0,\lambda_1,...,\lambda_{n-1})$ is the diagonal matrix of the eigenvalues of the centering matrix (which we leave unstated for now). Using this form we can write the squared-norm of the deviations as:

$$\begin{equation} \begin{aligned} ||\mathbf{R}||^2 &= \mathbf{R}^\text{T} \mathbf{R} \\[6pt] &= (\mathbf{C} \mathbf{x})^\text{T} (\mathbf{C} \mathbf{x}) \\[6pt] &= \mathbf{x}^\text{T} \mathbf{C}^\text{T} \mathbf{C} \mathbf{x} \\[6pt] &= \mathbf{x}^\text{T} \mathbf{C} \mathbf{x} \quad \quad \quad \text{(since } \mathbf{C} \text{ is symmetric and idempotent)} \\[6pt] &= \mathbf{x}^\text{T} \mathbf{u}^* \mathbf{\Delta} \mathbf{u} \mathbf{x} \\[6pt] &= (\mathbf{u} \mathbf{x})^* \mathbf{\Delta} (\mathbf{u} \mathbf{x}). \\[6pt] \end{aligned} \end{equation}$$
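The spectral form can be checked numerically. Below is a sketch that takes $\mathbf{u}$ to be NumPy's unitary DFT matrix (obtained by transforming the identity matrix) and verifies both $\mathbf{C} = \mathbf{u}^* \mathbf{\Delta} \mathbf{u}$ and the quadratic-form identity; it necessarily uses the eigenvalues (one zero, the rest one) stated earlier.

```python
import numpy as np

n = 6
C = np.eye(n) - np.ones((n, n)) / n          # centering matrix
U = np.fft.fft(np.eye(n), norm="ortho")      # unitary (normalised) DFT matrix
Delta = np.diag(np.r_[0.0, np.ones(n - 1)])  # eigenvalues: lambda_0 = 0, rest 1

# C = u* Delta u (where u* denotes the conjugate transpose)
print(np.allclose(U.conj().T @ Delta @ U, C))

x = np.array([2.0, 7.0, 1.0, 8.0, 2.0, 8.0])  # arbitrary sample vector
lhs = x @ C @ x                               # x^T C x
rhs = (U @ x).conj() @ Delta @ (U @ x)        # (u x)* Delta (u x)
print(np.isclose(lhs, rhs.real))              # True (rhs is real up to rounding)
```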

Now, the vector $\mathbf{u} \mathbf{x} = (\mathscr{F}_\mathbf{x}(0), \mathscr{F}_\mathbf{x}(1/n), ..., \mathscr{F}_\mathbf{x}(1-1/n))$ is the DFT of the sample data, so we can expand the above quadratic form to obtain:

$$||\mathbf{R}||^2 = (\mathbf{u} \mathbf{x})^* \mathbf{\Delta} (\mathbf{u} \mathbf{x}) = \sum_{i=0}^{n-1} \lambda_i \cdot ||\mathscr{F}_\mathbf{x}(i/n)||^2.$$
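Substituting the eigenvalues, this identity can be verified directly on data with NumPy's FFT (a sketch; `norm="ortho"` gives the unitary DFT used above):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
x = rng.normal(size=n)                 # arbitrary sample

F = np.fft.fft(x, norm="ortho")        # unitary DFT of the sample
lam = np.r_[0.0, np.ones(n - 1)]       # eigenvalues of the centering matrix

lhs = np.sum((x - x.mean()) ** 2)      # ||R||^2, squared norm of deviations
rhs = np.sum(lam * np.abs(F) ** 2)     # sum_i lambda_i ||F_x(i/n)||^2
print(np.isclose(lhs, rhs))            # True
```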

(Note: Once we substitute the eigenvalues, we will see that this is just a manifestation of the discrete version of the Plancherel theorem.) Since $X_1,...,X_n$ are IID with variance $\sigma^2$, it follows that $\mathbb{E}(||\mathscr{F}_\mathbf{x}(i/n)||^2) = \sigma^2$ for all the nonzero frequencies $i=1,...,n-1$. (The zero-frequency term $i=0$ has a different expected value, but it is annihilated by the corresponding eigenvalue $\lambda_0 = 0$, so it does not contribute to the sum.) Substitution of this result gives the expected value:

$$\begin{equation} \begin{aligned} \mathbb{E}(||\mathbf{R}||^2) &= \mathbb{E} \Big( \sum_{i=0}^{n-1} \lambda_i \cdot ||\mathscr{F}_\mathbf{x}(i/n)||^2 \Big) \\[6pt] &= \sum_{i=0}^{n-1} \lambda_i \cdot \mathbb{E}(||\mathscr{F}_\mathbf{x}(i/n)||^2) \\[6pt] &= \sum_{i=0}^{n-1} \lambda_i \cdot \sigma^2 \\[6pt] &= \sigma^2 \sum_{i=0}^{n-1} \lambda_i \\[6pt] &= \sigma^2 \cdot \text{tr} \ \mathbf{C} \\[6pt] &= \sigma^2 \cdot \text{rank} \ \mathbf{C} = \sigma^2 \cdot DF. \\[6pt] \end{aligned} \end{equation}$$
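The key expectation step, $\mathbb{E}(||\mathscr{F}_\mathbf{x}(i/n)||^2) = \sigma^2$ for the nonzero frequencies, can be checked by simulation (a sketch with hypothetical parameters $\mu = 3$, $\sigma = 2$):

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 8, 200_000
mu, sigma = 3.0, 2.0                        # hypothetical population parameters

X = rng.normal(mu, sigma, size=(reps, n))   # IID samples in each row
F = np.fft.fft(X, axis=1, norm="ortho")     # unitary DFT of each sample

# Nonzero frequencies: each should average to sigma^2 = 4
print((np.abs(F[:, 1:]) ** 2).mean(axis=0))
# Zero frequency: averages to n*mu^2 + sigma^2 = 76, but lambda_0 = 0 kills it
print((np.abs(F[:, 0]) ** 2).mean())
```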

(Since the centering matrix is a projection matrix, its rank is equal to its trace.) Hence, to obtain an unbiased estimator for $\sigma^2$ we use the estimator:

$$\hat{\sigma}^2 \equiv \frac{||\mathbf{R}||^2}{DF} = \frac{1}{n-1} \sum_{i=1}^n (x_i-\bar{x})^2.$$
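Finally, the unbiasedness can be seen in a quick simulation sketch (standard normal data, so $\sigma^2 = 1$), which also shows the downward bias from dividing by $n$ instead of $DF = n-1$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 10, 200_000

X = rng.normal(size=(reps, n))          # IID samples with sigma^2 = 1
R2 = ((X - X.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)   # ||R||^2

print(R2.mean())              # approx (n - 1) * sigma^2 = 9
print((R2 / (n - 1)).mean())  # approx 1.0: dividing by DF is unbiased
print((R2 / n).mean())        # approx 0.9: dividing by n is biased low
```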

This establishes a direct connection between the denominator of the sample variance and the degrees-of-freedom in the problem. As you can see, this connection arises through the eigenvalues of the centering matrix: these eigenvalues determine the rank of the matrix, and thereby the degrees-of-freedom, and they also determine the expected value of the squared-norm of the deviation vector. Going through the derivation of these results also gives a bit more detail about the behaviour of the deviation vector.
