Solved – Estimating the covariance posterior distribution of a multivariate Gaussian

Tags: bayesian, covariance, distributions, estimation, posterior

I need to "learn" the distribution of a bivariate Gaussian from only a few samples, but I have a good hypothesis about the prior distribution, so I would like to use the Bayesian approach.

I defined my prior:
$$ \mathbf{P}(\mathbf{\mu}) \sim \mathcal{N}(\mathbf{\mu_0},\mathbf{\Sigma_0}) $$
$$ \mathbf{\mu_0} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \ \ \ \mathbf{\Sigma_0} = \begin{bmatrix} 16 & 0 \\ 0 & 27 \end{bmatrix} $$

And the distribution of the data given the parameters:
$$ \mathbf{P}(x|\mathbf{\mu},\mathbf{\Sigma}) \sim \mathcal{N}(\mathbf{\mu},\mathbf{\Sigma}) $$
$$ \mathbf{\mu} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \ \ \ \mathbf{\Sigma} = \begin{bmatrix} 18 & 0 \\ 0 & 18 \end{bmatrix} $$

Now I know that, to estimate the mean given the data,

$$ \mathbf{P} (\mathbf{\mu} | \mathbf{x_1}, \dots , \mathbf{x_n}) \sim \mathcal{N}(\mathbf{\hat{\mu}_n}, \mathbf{\hat{\Sigma}_n})$$

I can compute:

$$ \mathbf{\hat{\mu}_n} = \mathbf{\Sigma_0} \left( \mathbf{\Sigma_0} + {1 \over n} \mathbf{\Sigma} \right) ^ {-1} \left( {1 \over n} \sum_{i=1}^{n} \mathbf{x_i} \right) + {1 \over n} \mathbf{\Sigma} \left( \mathbf{\Sigma_0} + {1 \over n} \mathbf{\Sigma} \right) ^{-1} \mathbf{\mu_0} $$

$$ \mathbf {\hat{\Sigma}_n} = {1 \over n} \mathbf{\Sigma_0} \left( \mathbf{\Sigma_0} + {1 \over n} \mathbf{\Sigma} \right) ^{-1} \mathbf{\Sigma} $$
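For concreteness, here is a minimal numerical sketch of this update using the numbers above (variable names are illustrative; only NumPy is assumed):

```python
import numpy as np

rng = np.random.default_rng(0)

mu0 = np.zeros(2)               # prior mean
Sigma0 = np.diag([16.0, 27.0])  # prior covariance
Sigma = np.diag([18.0, 18.0])   # data covariance, assumed known here

# A few samples from the (hypothesised) data distribution
n = 5
X = rng.multivariate_normal(mu0, Sigma, size=n)
xbar = X.mean(axis=0)

# Posterior of the mean, transcribing the two formulas above
A = np.linalg.inv(Sigma0 + Sigma / n)
mu_n = Sigma0 @ A @ xbar + (Sigma / n) @ A @ mu0   # posterior mean
Sigma_n = (Sigma0 @ A @ Sigma) / n                 # posterior covariance of mu
```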

Now comes the question. Maybe I'm wrong, but it seems to me that $ \mathbf{\hat{\Sigma}_n} $ is just the covariance matrix of the estimated parameter $\mathbf{\hat{\mu}_n}$, not the estimated covariance of my data. What I would like is to also compute

$$ \mathbf{P} (\mathbf{\Sigma} | \mathbf{x_1}, \dots , \mathbf{x_n}) $$

in order to have a fully specified distribution learned from my data.

Is this possible? Is it already solved by computing $\mathbf{\hat{\Sigma}_n}$, and the formula above is just expressed in a misleading way (or am I simply misinterpreting it)? References would be appreciated. Thanks a lot.

EDIT

From the comments, it appears that my approach was "wrong", in the sense that I was assuming a constant, known covariance $ \mathbf{\Sigma} $. What I need is to put a prior on it as well, $ \mathbf{P}(\mathbf{\Sigma}) $, but I don't know which distribution I should use, nor the procedure to update it.

Best Answer

You can do Bayesian updating for the covariance structure in much the same spirit as you updated the mean. The conjugate prior for the covariance matrix of a multivariate normal is the inverse-Wishart distribution, so it makes sense to start there:

$P(\Sigma) \sim W^{-1}(\mathbf{\Psi}, \nu)$

Then, given a sample $X$ of $n$ observations (with the mean $\mu$ treated as known), you can compute the sample covariance estimate $\Sigma_X = \frac{1}{n}(X-\mu)^\top(X-\mu)$.

This can then be used to update your estimate of the covariance matrix

$P(\Sigma|X) \sim W^{-1}(n\Sigma_X + \mathbf{\Psi}, n + \nu)$

You may choose to use the mean of this as your point estimate for the covariance (Posterior Mean Estimator)

$E[\Sigma|X] = \frac{n\Sigma_X + \mathbf{\Psi}}{\nu+n-p-1}$, where $p$ is the dimension of the data (here $p = 2$),

or you might choose to use the mode (Maximum A Posteriori Estimator)

$\text{Mode}[\Sigma|X] = \frac{n\Sigma_X + \mathbf{\Psi}}{\nu+n+p+1}$
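
Putting the whole procedure together, here is a hedged sketch using `scipy.stats.invwishart`; the prior hyperparameters $\Psi$ and $\nu$ below are placeholder choices, not values implied by the question:

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(0)

p = 2
mu = np.zeros(p)   # mean treated as known, as in the question
nu = p + 2         # prior degrees of freedom (placeholder choice)
Psi = np.eye(p)    # prior scale matrix (placeholder choice)

# Sample covariance about the known mean
n = 5
X = rng.multivariate_normal(mu, np.diag([18.0, 18.0]), size=n)
Sigma_X = (X - mu).T @ (X - mu) / n

# Conjugate update: P(Sigma | X) ~ IW(n * Sigma_X + Psi, n + nu)
Psi_post = n * Sigma_X + Psi
nu_post = n + nu

# Point estimates from the posterior
post_mean = Psi_post / (nu_post - p - 1)  # posterior mean (needs nu_post > p + 1)
post_mode = Psi_post / (nu_post + p + 1)  # posterior mode (MAP)

# Or draw from the posterior to propagate uncertainty
draws = invwishart.rvs(df=nu_post, scale=Psi_post, size=1000, random_state=rng)
```

Both point estimates shrink the sample covariance toward the prior scale $\Psi$; as $n$ grows, both converge to $\Sigma_X$.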
