Solved – Posterior covariance of Normal-Inverse-Wishart not converging properly

bayesian, posterior, python, wishart-distribution

I am trying to implement a simple normal-inverse-Wishart (NIW) conjugate prior for a multivariate normal with unknown mean and covariance in numpy/scipy, such that it can take a data vector and construct a posterior. I'm using the update equations for the NIW given on the Wikipedia page on conjugate priors: http://en.wikipedia.org/wiki/Conjugate_prior
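
For reference, the update equations from that page, in the notation of the code below (prior parameters $\mu_0, \lambda, \nu, \Psi$, sample mean $\bar{x}$, and $n$ observations), are:

$\mu_n = \frac{\lambda\mu_0 + n\bar{x}}{\lambda + n}, \qquad \lambda_n = \lambda + n, \qquad \nu_n = \nu + n,$

$\Psi_n = \Psi + \sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})^T + \frac{\lambda n}{\lambda + n}(\bar{x} - \mu_0)(\bar{x} - \mu_0)^T.$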

My distribution class is as follows:

import numpy as np
from scipy.stats import chi2

class NormalInverseWishartDistribution(object):
    def __init__(self, mu, lmbda, nu, psi):
        self.mu = mu
        self.lmbda = float(lmbda)
        self.nu = nu
        self.psi = psi
        self.inv_psi = np.linalg.inv(psi)

    def sample(self):
        # Draw sigma ~ Inverse-Wishart(nu, psi) by inverting a Wishart(nu, psi^-1) draw,
        # then draw mu | sigma ~ N(mu0, sigma / lambda).
        sigma = np.linalg.inv(self.wishartrand())
        return (np.random.multivariate_normal(self.mu, sigma / self.lmbda), sigma)

    def wishartrand(self):
        # Sample from Wishart(nu, psi^-1) via the Bartlett decomposition:
        # a lower-triangular factor with chi-squared draws on the diagonal
        # and standard normals below it.
        dim = self.inv_psi.shape[0]
        chol = np.linalg.cholesky(self.inv_psi)
        foo = np.zeros((dim, dim))

        for i in range(dim):
            for j in range(i + 1):
                if i == j:
                    foo[i, j] = np.sqrt(chi2.rvs(self.nu - (i + 1) + 1))
                else:
                    foo[i, j] = np.random.normal(0, 1)
        return np.dot(chol, np.dot(foo, np.dot(foo.T, chol.T)))

    def posterior(self, data):
        n = len(data)
        mean_data = np.mean(data, axis=0)
        # Scatter matrix of the data about its sample mean.
        sum_squares = np.sum([np.outer(x - mean_data, x - mean_data) for x in data], axis=0)
        mu_n = (self.lmbda * self.mu + n * mean_data) / (self.lmbda + n)
        lmbda_n = self.lmbda + n
        nu_n = self.nu + n
        psi_n = (self.psi + sum_squares
                 + self.lmbda * n / float(self.lmbda + n)
                 * np.outer(mean_data - self.mu, mean_data - self.mu))
        return NormalInverseWishartDistribution(mu_n, lmbda_n, nu_n, psi_n)

I am running a simple sanity check to see if the posterior converges to the true distribution:

x = NormalInverseWishartDistribution(np.array([-3., -3.]), 1, 3, np.eye(2))
samples = [x.sample() for _ in range(100)]
data = [np.random.multivariate_normal(mu, cov) for mu, cov in samples]
y = NormalInverseWishartDistribution(np.array([0., 0.]), 1, 3, np.eye(2))
z = y.posterior(data)

print('mu_n: {0}'.format(z.mu))

print('psi_n: {0}'.format(z.psi))

The mean is appropriately converging, but the scale matrix appears to be converging to incorrectly large values along the diagonal, rather than the true value of 1.

As far as I can tell, I'm copying the update rule exactly. Am I implementing something inappropriately here?

Edit: It looks like the posterior is in fact converging, but the sample routine is returning biased samples. Am I doing something wrong in my sampling method?
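
(One way I can check the sampler in isolation, assuming a SciPy version that ships scipy.stats.invwishart, is to compare the average of my draws against SciPy's built-in inverse-Wishart sampler with the same parameters; if both averages come out similarly inflated, my sampling code itself is probably not at fault. A rough sketch:)

# Rough sketch: compare my sampler against SciPy's inverse-Wishart
# (assumes a SciPy version that provides scipy.stats.invwishart).
import numpy as np
from scipy.stats import invwishart

nu, psi = 3, np.eye(2)
d = NormalInverseWishartDistribution(np.zeros(2), 1, nu, psi)

mine = np.mean([d.sample()[1] for _ in range(20000)], axis=0)
ref = invwishart.rvs(df=nu, scale=psi, size=20000).mean(axis=0)

print(mine)  # both averages come out well above the identity when nu = p + 1
print(ref)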

Edit 2: I've confirmed that the same phenomenon happens with the riwish function from R's MCMCpack:

> library(MCMCpack)
> samples <- replicate(100000, riwish(3, matrix(c(1,0,0,1),2,2)))
> mean(samples[1,1,])
[1] 4.889211

This leads me to believe I must be misunderstanding something. From the Wikipedia page (http://en.wikipedia.org/wiki/Inverse-Wishart_distribution), we have:

$\newcommand{\E}{\mathrm{E}}$
$\E[\Sigma] = \frac{\Psi}{\nu - p - 1}$

However, in my test case, $\nu = p + 1$, so $\E[{\Sigma}] = \Psi$. Thus, if I sample $\Sigma$ a bunch of times, shouldn't I recover $\Psi$ on average?

Edit 3: I realized my problem was a simple algebraic oversight: I needed to set $\nu = p + 2$. Problem solved.

Best Answer

The problem was that I was setting the degrees of freedom too low. For $\E[\Sigma] = \frac{\Psi}{\nu - p - 1}$ to be finite at all, $\nu$ must be at least $p + 2$, where $\Psi$ is a $p \times p$ matrix; with $\nu = p + 1$ the denominator is zero, so the mean of the inverse-Wishart does not exist and the sample averages blow up. Setting $\nu = p + 2$ makes the denominator 1, so $\E[\Sigma] = \Psi$ and averaging samples recovers $\Psi$.
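
(A quick numerical check of this, assuming scipy.stats.invwishart is available: with $\nu = p + 2$, averaging inverse-Wishart draws should recover the scale matrix.)

# Sketch: with nu = p + 2, E[Sigma] = Psi / (nu - p - 1) = Psi,
# so the sample mean of the draws should be close to the identity.
# (Assumes scipy.stats.invwishart is available.)
import numpy as np
from scipy.stats import invwishart

p = 2
draws = invwishart.rvs(df=p + 2, scale=np.eye(p), size=100000)
print(draws.mean(axis=0))  # approximately the 2x2 identity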