"Distributed algorithms for topic models" by Newman, D. and Asuncion, A. and Smyth, P. and Welling, M. gives an auxiliary variable sampling method for hyperparameters. These methods are related to sampling schemes for Hierarchical Dirichlet Process parameters. It doesn't appear that Hannah Wallach includes this method in her dissertation.
Also, "On Smoothing and Inference for Topic Models" by Teh et al. has an interesting discussion of the role of hyperparameters in LDA.
Can LDA be used to detect the topic of A SINGLE document?
Yes, in its particular representation of 'topic,' and given a training corpus of (usually related) documents.
LDA represents topics as distributions over words, and documents as distributions over topics. That is, one of the very purposes of LDA is to arrive at a probabilistic representation of each document as a mixture of topics. For example, the LDA implementation in gensim can return this representation for any given document.
But this depends on the other documents in the corpus: Any given document will have a different representation if analyzed as part of a different corpus.
That's not typically considered a shortcoming: Most applications of LDA focus on related documents. The paper introducing LDA applies it to two corpora, one of Associated Press articles and one of scientific article abstracts. Edwin Chen's nicely approachable blog post applies LDA to a tranche of emails from Sarah Palin's time as Alaska governor.
If your application demands separating documents into known, mutually exclusive classes, then LDA-derived topics can be used as features for classification. Indeed, the initial paper does just that with the AP corpus, with good results.
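A sketch of that feature-based classification idea, with hypothetical topic-proportion vectors standing in for real LDA output (scikit-learn is an assumption here, not something the original paper used):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical stand-in for LDA output: per-document topic proportions
# for two document classes, each concentrating on a different topic.
class0 = rng.dirichlet([5, 1, 1, 1], size=50)
class1 = rng.dirichlet([1, 1, 1, 5], size=50)

X = np.vstack([class0, class1])          # topic proportions as features
y = np.array([0] * 50 + [1] * 50)        # known class labels

# Any standard classifier can then be trained on the topic features.
clf = LogisticRegression().fit(X, y)
print(clf.score(X, y))
```

In a real pipeline, `X` would come from running a trained LDA model over each document, as in the AP-corpus experiment in the original paper.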
Relatedly, Chen's demonstration doesn't sort documents into exclusive classes, but his documents mostly concentrate their probability on a single LDA topic each. As David Blei explains in this video lecture, the Dirichlet priors can be chosen to favor sparsity. More simply, "a document is penalized for using many topics," as his slides put it. This seems the closest LDA can get to a single, unsupervised topic per document, but it certainly doesn't guarantee that every document will be represented that way.
Best Answer
I think it is not really a question of better or worse, but of what data you have available and how interpretable you need the results to be. If your data is at least partially labeled, whether with something like traditional topic classes or with something like hashtags, then labeled LDA may be worth pursuing; otherwise not. Using labeled LDA means the induced classes correspond well with human categorization, as represented by the provided label space. This is often very useful for applications with a human in the loop. It's not the same thing as maximizing data likelihood, though.
But, in summary: if you have some labels, and human interpretability of the clusters is important to you, then Labeled LDA will likely produce better results.