Gelman and Rubin Convergence Diagnostic – How to Generalise to Work with Vectors

convergence, covariance-matrix, markov-chain-montecarlo

The Gelman and Rubin diagnostic is used to check the convergence of multiple MCMC chains run in parallel. It compares the within-chain variance to the between-chain variance; the procedure is outlined below:

Steps (for each parameter):

  1. Run m ≥ 2 chains of length 2n from overdispersed starting values.
  2. Discard the first n draws in each chain.
  3. Calculate the within-chain and between-chain variance.
  4. Calculate the estimated variance of the parameter as a weighted sum of the within-chain and between-chain variance.
  5. Calculate the potential scale reduction factor (PSRF); see the sketch after this list.
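
For concreteness, here is a minimal NumPy sketch of steps 2–5 (the function name `gelman_rubin_psrf` and the `(m, 2n)` input layout are my assumptions, not from any particular package):

```python
import numpy as np

def gelman_rubin_psrf(chains):
    """Scalar PSRF from m parallel chains given as an (m, 2n) array."""
    m, two_n = chains.shape
    n = two_n // 2
    chains = chains[:, n:]                    # step 2: discard the first n draws

    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()     # step 3: within-chain variance
    B = n * chain_means.var(ddof=1)           # step 3: between-chain variance

    # step 4: estimated variance as a weighted sum of W and B
    V_hat = (n - 1) / n * W + (1 + 1 / m) * B / n

    return np.sqrt(V_hat / W)                 # step 5: PSRF
```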

I want to use this statistic, but the variables I want to apply it to are random vectors.

Does it make sense to take the mean of the covariance matrices in this case?

Best Answer

A recommendation: just compute the PSRF separately for each scalar component

The original article by Gelman & Rubin [1], as well as the Bayesian Data Analysis textbook by Gelman et al. [2], recommends calculating the potential scale reduction factor (PSRF) separately for each scalar parameter of interest. To conclude convergence, one then requires that all PSRFs are close to 1. It does not matter that your parameters are interpreted as random vectors; their components are scalars for which you can compute PSRFs.
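
In code, this amounts to looping the scalar PSRF over the components. A toy sketch, reusing the hypothetical `gelman_rubin_psrf` from the question above (the 1.1 threshold is a common rule of thumb, not a prescription from the references):

```python
import numpy as np

# draws: (m, 2n, d) array -- m chains, 2n iterations, d-dimensional parameter
rng = np.random.default_rng(0)
draws = rng.normal(size=(4, 1000, 3))     # toy data: 4 chains, d = 3

# one scalar PSRF per component
psrfs = [gelman_rubin_psrf(draws[:, :, k]) for k in range(draws.shape[2])]

converged = all(r < 1.1 for r in psrfs)   # require every PSRF close to 1
```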

Brooks & Gelman [3] have proposed a multivariate extension of the PSRF, which I review in the next section of this answer. However, to quote Gelman & Shirley [4]:

[...] these methods may sometimes represent overkill: individual parameters can be well estimated even while approximate convergence of simulations of a multivariate distribution can take a very long time.

Alternative: multivariate extension by Brooks & Gelman

Brooks & Gelman [3] propose a multivariate extension of the PSRF, where indeed one computes the estimated covariance matrix (your step 4) as a weighted sum of the within-chain ($W$) and between-chain ($B$) covariance matrices (your step 3): \begin{equation} \hat{V} = \frac{n-1}{n}W + \left ( 1 + \frac{1}{m} \right )\frac{B}{n}, \end{equation} where $n$ is the chain length. Then, one needs to define some scalar metric for the distance between the covariance matrices $\hat{V},W$. The authors propose \begin{equation} \hat{R} = \max_a \frac{a^T\hat{V}a}{a^TWa} = \frac{n-1}{n} + \left(\frac{m+1}{m}\right)\lambda_1, \end{equation} where $m$ is the number of chains, the equality is shown in the article with $\lambda_1$ being the largest positive eigenvalue of $W^{-1}\hat{V}/n$. Then, the authors argue that under convergence of the chains, $\lambda_1\rightarrow 0$ and thus with big $n$ this multivariate $\hat{R}$ should converge near 1.

References

[1] Gelman, Andrew, and Donald B. Rubin. "Inference from iterative simulation using multiple sequences." Statistical Science (1992): 457-472.

[2] Gelman, Andrew, et al. Bayesian data analysis. CRC press, 2013.

[3] Brooks, Stephen P., and Andrew Gelman. "General methods for monitoring convergence of iterative simulations." Journal of Computational and Graphical Statistics 7.4 (1998): 434-455.

[4] Gelman, Andrew, and Kenneth Shirley. "Inference from simulations and monitoring convergence." Chapter 6 in Brooks, Steve, et al., eds. Handbook of Markov Chain Monte Carlo. CRC Press, 2011.

All articles except the textbook [2] are available at Andrew Gelman's website.