Solved – Given a 10D MCMC chain, how can I determine its posterior mode(s) in R

bayesiank nearest neighbourmarkov-chain-montecarlomoder

Question: With a 10 dimensional MCMC chain, let's say I'm prepared to hand you a matrix of the draws: 100,000 iterations (rows) by 10 parameters (columns), how best can I identify the posterior modes? I'm especially concerned with multiple modes.

Background: I consider myself a computationally savvy statistician, but when a colleague asked me this question, I was ashamed that I couldn't come up with a reasonable answer. The primary concern is that multiple modes may appear, but only if at least eight or so of the ten dimensions are considered. My first thought would be to use a kernel density estimate, but a search through R revealed nothing promising for problems of greater than three dimensions. The colleague has proposed an ad-hoc binning strategy in ten dimensions and searching for a maximum, but my concern is that bandwidth might either lead to significant sparsity problems or to a lack of resolution to discern multiple modes. That said, I'd happily accept suggestions for automated bandwidth suggestions, links to a 10 kernel density estimator, or anything else which you know about.

Concerns:

  1. We believe that the distribution may be quite skewed; hence, we wish to identify the posterior mode(s) and not the posterior means.

  2. We are concerned that there may be several posterior modes.

  3. If possible, we'd prefer an R based suggestion. But any algorithm will do as long as it isn't incredibly difficult to implement. I guess I'd prefer not to implement a N-d kernel density estimator with automated bandwidth selection from scratch.

Best Answer

Have you considered using a nearest neighbour approach ?

e.g. building a list of the k nearest neighbours for each of the 100'000 points and then consider the data point with the smallest distance of the kth neighbour a mode. In other words: find the point with the 'smallest bubble' containing k other points around this point.

I'm not sure how robust this is and the choice for k is obviously influencing the results.

Related Question