Bayesian Models – Exploring Latent Variables, Overparameterization, and MCMC Convergence

Tags: bayesian, convergence, latent-variable, markov-chain-montecarlo, multilevel-analysis

Sometimes I have a large number of latent variables in a Bayesian hierarchical model, but I am only interested in estimating projected transformations of those latent variables (for example, I will parameterize a binomial parameter as the inverse logit of a set of possibly non-identifiable covariates, even though the result I'm interested in is the binomial parameter estimate).

The projected transformations will often converge very quickly (judged by convergence diagnostics such as the Gelman-Rubin statistic, or by eyeballing the posterior samples) even if the latent variables have not yet converged.

Intuitively this makes sense: the model may be overparameterized, with latent parameters that are not identifiable. The derived quantities are constrained to a narrow high-likelihood region of the transformed variables' parameter space, which maps to a much larger, largely flat (but bounded) likelihood region of the latent variable parameter space.
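A toy version of this situation can be simulated directly. The sketch below is only an illustration, and everything in it is an assumption that does not come from the question: a single binomial observation (`y = 30` successes out of `n = 100`), a success probability `p = inverse_logit(a + b)` so that only the sum `a + b` is identified, wide Normal(0, 100^2) priors on `a` and `b`, and a plain random-walk Metropolis sampler. The function names, step size, and starting points are hypothetical choices.

```python
import numpy as np

def log_post(a, b, y=30, n=100, prior_sd=100.0):
    """Unnormalized log posterior; the likelihood depends only on a + b."""
    s = a + b
    log_lik = y * s - n * np.logaddexp(0.0, s)   # binomial with logit link
    log_prior = -(a**2 + b**2) / (2.0 * prior_sd**2)
    return log_lik + log_prior

def run_chain(a0, n_iter, step, rng):
    """Random-walk Metropolis on (a, b), started on the ridge a + b = 0."""
    state = np.array([a0, -a0])
    lp = log_post(*state)
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        prop = state + step * rng.standard_normal(2)
        lp_prop = log_post(*prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            state, lp = prop, lp_prop
        draws[t] = state
    return draws

def rhat(x):
    """Basic (non-split) Gelman-Rubin statistic; x has shape (chains, draws)."""
    n = x.shape[1]
    w = x.var(axis=1, ddof=1).mean()          # mean within-chain variance
    b_over_n = x.mean(axis=1).var(ddof=1)     # variance of the chain means
    return np.sqrt(((n - 1) / n * w + b_over_n) / w)

rng = np.random.default_rng(0)
starts = [-15.0, -5.0, 5.0, 15.0]             # dispersed starting values for a
chains = np.stack([run_chain(a0, 2000, 0.3, rng)[1000:] for a0 in starts])

a, b = chains[:, :, 0], chains[:, :, 1]
p = 1.0 / (1.0 + np.exp(-(a + b)))            # the derived quantity of interest

rhat_a, rhat_b, rhat_p = rhat(a), rhat(b), rhat(p)
print(f"R-hat  a: {rhat_a:.2f}  b: {rhat_b:.2f}  p: {rhat_p:.3f}")
```

With these settings the Gelman-Rubin statistic for `p` should sit close to 1 while the statistics for `a` and `b` stay far above it, because each chain moves only slowly along the ridge where `a + b` is constant. Since the priors here are proper, the latent chains would mix eventually, just far more slowly than the derived quantity.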

So is the intuition correct that I shouldn't be concerned that the overparameterized latent variables are not identifiable and aren't fully converged when I take my posterior samples? Are there some good references which discuss the use of non-identified latent variables in this way? I've heard some discussion of overparameterizing to speed up MCMC convergence, but I'm not entirely clear on how to think about this, as the approaches and attitudes towards overparameterization and non-identifiability in Bayesian methods seem to be a bit different from those in other areas of modeling.

Best Answer

So is the intuition correct that I shouldn't be concerned that the overparameterized latent variables are not identifiable and aren't fully converged when I take my posterior samples?

I think your intuition is correct: you shouldn't be concerned that the overparameterized latent variables are not identifiable and aren't fully converged. In fact, the latent variables likely can't converge. My understanding is that in this situation the chain on the full state space is null recurrent, even though by your account there is a transformed state space of smaller dimension in which the chain is positive recurrent (and hence has a stationary distribution). For what it's worth, I have deliberately created and used such MCMC chains myself in my applied research.

Sometimes stochastic processes with these features are used to model time series data (key word: cointegration). A quick look at this plot might generate some intuition:

The upper figure shows two price time series, which one might think of as nonstationary due to inflation, even though no inflation can be seen on the time scale of the plot. Although each time series taken alone is nonstationary, there can exist a lower-dimensional manifold within the full state space (in this case the "spread", i.e., the difference of the two time series) such that the stochastic process generated by projecting the original process onto the manifold is stationary.
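The same idea can be reproduced with simulated data. The sketch below is a toy construction, not the data behind the plot: it assumes two series that share one random-walk trend plus independent unit-variance noise, so each series wanders without bound while their difference does not. All variable names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 5000

# Shared stochastic trend: a random walk, hence nonstationary.
trend = np.cumsum(rng.standard_normal(T))

# Two "price" series that follow the same trend with independent noise.
x = trend + rng.standard_normal(T)
y = trend + rng.standard_normal(T)

# Projection onto the difference removes the shared trend.
spread = x - y

half = T // 2
print("variance of x:     ", x.var())
print("variance of spread:", spread.var())
print("mean of x by half:     ", x[:half].mean(), x[half:].mean())
print("mean of spread by half:", spread[:half].mean(), spread[half:].mean())
```

The sample variance of each individual series grows with the length of the window, whereas the spread keeps a roughly constant mean and variance over both halves, which is the behavior the projected MCMC quantities are claimed to share.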

Are there some good references which discuss the use of non-identified latent variables in this way?

I don't know of any references that discuss the use of non-identified latent variables in this exact way, but here are a technical report and a published paper on the subject by Andrew Gelman, and here is a more recent manuscript by a different author that I think might be closer to what you're doing than the previous two references.
