Distribution of sample variance of independent but not identically distributed normals

chi-squared-distribution, mathematical-statistics, normal-distribution

While reading the Wikipedia page on the chi-squared distribution, I saw that if $X_1, \ldots, X_n$ are independent $\text{N}(\mu, \sigma^2)$ random variables, then $\sum^n_{i=1} (X_i - \bar{X})^2 \sim \sigma^2 \chi^2_{n-1}$, where $\bar{X} = \frac{1}{n} \sum^n_{i=1} X_i$ is the sample mean.
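
To make that standard result concrete, here is a minimal simulation sketch (assuming Python with NumPy and SciPy; the sample size and parameter values are arbitrary illustrations):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, mu, sigma = 5, 2.0, 3.0          # illustrative values
reps = 100_000

# Simulate sum_i (X_i - Xbar)^2 for i.i.d. N(mu, sigma^2) samples
X = rng.normal(mu, sigma, size=(reps, n))
ss = ((X - X.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

# Compare with the sigma^2 * chi^2_{n-1} distribution via a KS test
ks = stats.kstest(ss / sigma**2, stats.chi2(df=n - 1).cdf)
print(ks.pvalue)  # should be large (no evidence against chi^2_{n-1})
```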

I am wondering whether a similar result holds when the variances are not assumed equal, that is, when $X_i \sim \text{N}(\mu, \sigma_i^2)$ independently. In other words, do we have something like

$$\sum^n_{i=1} (X_i - \bar{X})^2 \sim \text{(some combination of } \sigma_i^2 \text{)} \times \chi_{n-1}^2.$$

My sense is that it should, but I cannot come up with a proof.

Best Answer

You can find a general result on quadratic forms of a normal random vector in this related question. In the case where the normal random variables are independent with a common mean but different variances, the quadratic form is a weighted sum of chi-squared-one random variables.

For the random variables specified in your question, you can group them into the normal random vector:

$$\mathbf{X} \sim \text{N}(\mu \mathbf{1}, \boldsymbol{\Sigma}) \quad \quad \quad \quad \quad \boldsymbol{\Sigma} = \text{diag}(\sigma_1^2,...,\sigma_n^2).$$

Using the centering matrix $\mathbf{C} = \mathbf{I} - \tfrac{1}{n} \mathbf{1} \mathbf{1}^\text{T}$ we have $\mathbf{X} - \bar{\mathbf{X}} = \mathbf{C} \mathbf{X}$ (where $\bar{\mathbf{X}} = \bar{X} \mathbf{1}$). Moreover, it is simple to show that the matrix $\mathbf{C}$ is symmetric and idempotent, so that $\mathbf{C}^\text{T} \mathbf{C} = \mathbf{C}$, and that $\mathbf{C} \mathbf{1} = \mathbf{0}$, so subtracting the constant vector $\mu \mathbf{1}$ does not change the quadratic form. (For detailed information on the centering matrix, see e.g., O'Neill 2020, esp. Sections 3-4.) For this analysis I'm also going to use the standard normal random vector $\mathbf{Z} \sim \text{N}(\mathbf{0}, \mathbf{I})$, with $\mathbf{X} = \mu \mathbf{1} + \boldsymbol{\Sigma}^{1/2} \mathbf{Z}$. Using these objects, we can write the quadratic form of interest as:

$$\begin{align} \sum_{i=1}^n (X_i - \bar{X})^2 &= (\mathbf{X} - \bar{\mathbf{X}})^\text{T} (\mathbf{X} - \bar{\mathbf{X}}) \\[6pt] &= (\mathbf{C} \mathbf{X})^\text{T} (\mathbf{C} \mathbf{X}) \\[6pt] &= \mathbf{X}^\text{T} \mathbf{C}^\text{T} \mathbf{C} \mathbf{X} \\[6pt] &= \mathbf{X}^\text{T} \mathbf{C} \mathbf{X} \\[6pt] &= (\mathbf{X} - \mu \mathbf{1})^\text{T} \mathbf{C} (\mathbf{X} - \mu \mathbf{1}) \\[6pt] &= (\boldsymbol{\Sigma}^{1/2} \mathbf{Z})^\text{T} \mathbf{C} (\boldsymbol{\Sigma}^{1/2} \mathbf{Z}) \\[6pt] &= \mathbf{Z}^\text{T} \boldsymbol{\Sigma}^{1/2} \mathbf{C} \boldsymbol{\Sigma}^{1/2} \mathbf{Z} \\[6pt] &\sim \sum_{i=1}^n \lambda_i \cdot \chi_1^2, \end{align}$$
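
To see this chain of identities concretely, here is a small numerical check (a sketch, assuming NumPy; the mean and variance values below are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
mu = 1.5
sigmas = np.array([0.5, 1.0, 2.0, 3.0])      # illustrative sigma_i values

# Centering matrix C = I - (1/n) 1 1^T: symmetric and idempotent
C = np.eye(n) - np.ones((n, n)) / n
assert np.allclose(C, C.T) and np.allclose(C @ C, C)

# One draw of X = mu*1 + Sigma^{1/2} Z with Sigma = diag(sigma_i^2)
Z = rng.standard_normal(n)
S_half = np.diag(sigmas)
X = mu + S_half @ Z

# All expressions of the quadratic form agree for this draw
q1 = np.sum((X - X.mean()) ** 2)
q2 = X @ C @ X
q3 = Z @ (S_half @ C @ S_half) @ Z
print(q1, q2, q3)   # identical up to floating-point error
```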

where $\lambda_1,...,\lambda_n$ are the eigenvalues of the matrix $\boldsymbol{\Sigma}^{1/2} \mathbf{C} \boldsymbol{\Sigma}^{1/2}$ and the chi-squared-one random variables in the sum are independent. This latter matrix is given by:

$$\begin{align} \boldsymbol{\Sigma}^{1/2} \mathbf{C} \boldsymbol{\Sigma}^{1/2} &= \text{diag}(\sigma_1,...,\sigma_n) \ \mathbf{C} \ \text{diag}(\sigma_1,...,\sigma_n) \\[6pt] &= \begin{bmatrix} \tfrac{n-1}{n} \sigma_1^2 & -\tfrac{1}{n} \sigma_1 \sigma_2 & \cdots & -\tfrac{1}{n} \sigma_1 \sigma_n \\ -\tfrac{1}{n} \sigma_1 \sigma_2 & \tfrac{n-1}{n} \sigma_2^2 & \cdots & -\tfrac{1}{n} \sigma_2 \sigma_n \\ \vdots & \vdots & \ddots & \vdots \\ -\tfrac{1}{n} \sigma_1 \sigma_n & -\tfrac{1}{n} \sigma_2 \sigma_n & \cdots & \tfrac{n-1}{n} \sigma_n^2 \\ \end{bmatrix}. \\[6pt] \end{align}$$
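
As a quick check of this matrix form, you can build it entrywise and compare with the matrix product (a sketch, assuming NumPy; the sigma values are illustrative):

```python
import numpy as np

n = 4
sigmas = np.array([0.5, 1.0, 2.0, 3.0])      # illustrative sigma_i values

C = np.eye(n) - np.ones((n, n)) / n
A_product = np.diag(sigmas) @ C @ np.diag(sigmas)

# Entrywise form: (n-1)/n * sigma_i^2 on the diagonal, -sigma_i*sigma_j/n off it
A_entrywise = -np.outer(sigmas, sigmas) / n
np.fill_diagonal(A_entrywise, (n - 1) / n * sigmas**2)

print(np.allclose(A_product, A_entrywise))   # True
```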

You can take the above matrix and compute its eigenvalues $\lambda_1,...,\lambda_n$ for your particular standard deviations $\sigma_1,...,\sigma_n$; these give the weights in the weighted sum of chi-squared-one random variables. Note that $\mathbf{C}$ has rank $n-1$, so one eigenvalue is zero and the sum effectively has $n-1$ terms; in the special case $\sigma_1 = \cdots = \sigma_n = \sigma$ the remaining eigenvalues all equal $\sigma^2$, recovering the $\sigma^2 \chi^2_{n-1}$ result in your question.
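
Putting it all together, here is a sketch (assuming NumPy and SciPy; the parameter values are illustrative) that computes the eigenvalue weights and checks the resulting weighted-sum distribution against a direct simulation of $\sum_{i=1}^n (X_i - \bar{X})^2$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 4
mu = 1.5
sigmas = np.array([0.5, 1.0, 2.0, 3.0])      # illustrative sigma_i values
reps = 200_000

# Eigenvalue weights of Sigma^{1/2} C Sigma^{1/2}
C = np.eye(n) - np.ones((n, n)) / n
A = np.diag(sigmas) @ C @ np.diag(sigmas)
lam = np.linalg.eigvalsh(A)                  # one eigenvalue is ~0 (C has rank n-1)

# Direct simulation of sum_i (X_i - Xbar)^2 with unequal variances
X = mu + rng.standard_normal((reps, n)) * sigmas
ss_direct = ((X - X.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

# Simulation of the weighted sum of independent chi-squared-one variables
ss_weighted = rng.chisquare(df=1, size=(reps, n)) @ lam

# The two samples should have matching distributions
print(ss_direct.mean(), ss_weighted.mean())
print(stats.ks_2samp(ss_direct, ss_weighted).pvalue)   # should be large
```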
