Solved – How to optimize hyper-parameters in LDA

hyperparameter, machine-learning, text-mining, topic-models

After reading Hanna Wallach's paper Rethinking LDA: Why Priors Matter, I want to add hyper-parameter optimization to my own implementation of LDA. However, the paper doesn't give any details about how this optimization is to be done. I suppose I could dive into the MALLET source code, but I would rather understand the method well enough to implement it myself than just copy code. Does anyone have any pointers to papers or tutorials that could help?

Edit: Apparently there is a comparison of five different methods for optimizing the hyper-parameters in a Dirichlet-multinomial context in Wallach's PhD dissertation. I found the pointer on the topic models mailing list at Princeton; I had also missed the reference to the dissertation in the priors paper. I'm going to read through the dissertation and then see if I can distill the ideas specifically for LDA in an answer to my own question, but I would be glad to accept someone else's answer if they beat me to it 🙂
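To make the goal concrete: if I understand correctly, the methods compared there include Minka-style fixed-point iterations for the Dirichlet-multinomial (as in Minka's "Estimating a Dirichlet distribution"), applied to the document-topic prior using the topic counts from a Gibbs sampling state. Here is a rough Python sketch of my current reading of that update — the function name and array layout are my own, SciPy is assumed for the digamma function, and this is not MALLET's code:

```python
import numpy as np
from scipy.special import digamma

def optimize_alpha(doc_topic_counts, alpha, n_iter=100, tol=1e-6):
    """Fixed-point update (Minka-style) for an asymmetric Dirichlet prior alpha,
    given per-document topic counts n[d, k] taken from a Gibbs sampling state.

    doc_topic_counts: (D, K) array of counts n_{dk}
    alpha:            (K,) initial hyper-parameter values
    """
    n = np.asarray(doc_topic_counts, dtype=float)
    doc_lengths = n.sum(axis=1)                      # n_d = tokens in document d
    alpha = np.asarray(alpha, dtype=float).copy()
    for _ in range(n_iter):
        alpha_sum = alpha.sum()
        # Numerator: sum_d [ digamma(n_dk + alpha_k) - digamma(alpha_k) ]
        num = (digamma(n + alpha) - digamma(alpha)).sum(axis=0)
        # Denominator: sum_d [ digamma(n_d + sum_k alpha_k) - digamma(sum_k alpha_k) ]
        den = (digamma(doc_lengths + alpha_sum) - digamma(alpha_sum)).sum()
        new_alpha = alpha * num / den
        if np.max(np.abs(new_alpha - alpha)) < tol:
            return new_alpha
        alpha = new_alpha
    return alpha
```

In practice one would interleave a few of these updates with sweeps of the Gibbs sampler rather than running either to convergence on its own; corrections welcome if I've misread the update.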

Best Answer

"Distributed algorithms for topic models" by Newman, D. and Asuncion, A. and Smyth, P. and Welling, M. gives an auxiliary variable sampling method for hyperparameters. These methods are related to sampling schemes for Hierarchical Dirichlet Process parameters. It doesn't appear that Hannah Wallach includes this method in her dissertation.

Also, "On Smoothing and Inference for Topic Models" by Asuncion, Welling, Smyth, and Teh has an interesting discussion of the role of hyperparameters in LDA.