How to determine the number of iterations for Latent Dirichlet Allocation

hyperparameter, topic-models

I am running Latent Dirichlet Allocation on 240 test documents, using a model trained on 3361 documents. I am using 150 iterations and 120 burn-in iterations. Is there a principled way to choose the number of iterations for LDA? I am fitting models with 1000, 1500, …, 4500 topics (in steps of 500).

Best Answer

A common way to determine the number of iterations is to track perplexity, as defined in the original LDA paper by Blei et al. Perplexity measures how well the model fits held-out data: it is the exponential of the negative average per-word log-likelihood over the test documents, so lower values indicate a better fit. When the change in perplexity between successive iterations falls below a chosen threshold, you can declare convergence and stop iterating.
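As a rough illustration of that stopping rule, here is a minimal sketch in plain Python. It assumes you can obtain the total held-out log-likelihood after each iteration from whatever LDA implementation you use. The function names, the relative tolerance of `1e-3`, and the example numbers are all hypothetical choices, not part of any particular library.

```python
import math


def perplexity(total_log_likelihood, n_tokens):
    """Perplexity = exp(-(total held-out log-likelihood) / (number of tokens))."""
    return math.exp(-total_log_likelihood / n_tokens)


def converged(history, rel_tol=1e-3):
    """True once the last two perplexities differ by less than rel_tol (relative)."""
    if len(history) < 2:
        return False
    prev, curr = history[-2], history[-1]
    return abs(prev - curr) / prev < rel_tol


def iterations_until_converged(log_likelihoods, n_tokens, rel_tol=1e-3):
    """Return the 1-based iteration at which the stopping rule fires, or None."""
    history = []
    for i, ll in enumerate(log_likelihoods, start=1):
        history.append(perplexity(ll, n_tokens))
        if converged(history, rel_tol):
            return i
    return None


# Made-up log-likelihood trace for a held-out set of 1000 tokens.
trace = [-7000.0, -6500.0, -6400.0, -6399.9]
stop_at = iterations_until_converged(trace, n_tokens=1000)
```

A relative threshold is used here so that the rule does not depend on the scale of the perplexity, which can differ a lot between 1000 and 4500 topics. An absolute threshold would work the same way.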

For more advanced methods of evaluating LDA performance, along with some code, see the paper by Wallach et al., "Evaluation Methods for Topic Models" (2009).

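There are several methods for estimating topics in LDA: variational inference, Gibbs sampling, and EM. Since you mention burn-in, you are probably using the collapsed Gibbs sampler to infer the topic distributions. In that case, you can use empirical MCMC convergence diagnostics such as the estimated potential scale reduction factor (the Gelman-Rubin statistic, often written R-hat), which compares the variance between several independently initialized chains with the variance within each chain. Values close to 1 suggest that the chains have converged.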
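To make the diagnostic concrete, here is a minimal sketch of the potential scale reduction factor for a single scalar quantity, using only the Python standard library. It assumes you run several Gibbs chains from different random initializations and record one scalar per post-burn-in iteration. Monitoring the log-likelihood is one common choice, but that choice, the function name `psrf`, and the toy chains below are assumptions for illustration.

```python
import math
from statistics import mean, variance


def psrf(chains):
    """Gelman-Rubin potential scale reduction factor for one scalar quantity.

    chains: list of m equal-length lists, each holding n post-burn-in draws.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least 2 chains of equal length >= 2")

    chain_means = [mean(c) for c in chains]
    # W: average within-chain sample variance.
    w = mean(variance(c) for c in chains)
    if w == 0:
        raise ValueError("within-chain variance is zero")
    # B: between-chain variance, scaled by the chain length.
    b = n * variance(chain_means)
    # Pooled estimate of the posterior variance.
    var_plus = (n - 1) / n * w + b / n
    return math.sqrt(var_plus / w)


# Toy traces: two chains stuck in different regions give a large value.
stuck = psrf([[0.0, 1.0, 0.0, 1.0], [10.0, 11.0, 10.0, 11.0]])
# Two chains that agree give a value that is not above 1.
agree = psrf([[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]])
```

A common rule of thumb is to keep sampling until the value drops below about 1.1 for every quantity you monitor. With chains as short as in this toy example the estimate is very noisy, so in practice you would use many more draws per chain.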
