What are good ranges for the hyperparameters $\alpha$ and $\beta$ (explained well here) in LDA?
I appreciate hyperparameter tuning always depends on the use case, data, content of documents etc., but is there any general rule or heuristic to choose these hyperparameters for LDA?
Additional Info
For extra info on my particular use case and data (although I'd like a generalizable answer if possible):
- 29 documents with an average length of 5,177 words (after parsing). This number of documents is expected to grow to between 50 and 200.
- 3,500 unique words (after parsing and keeping the top 3,500 words by frequency).
- 155,309 total words (again, after parsing).
- All documents are finance related, more specifically investment outlook whitepapers, so there isn't a lot of "variety" between documents.
This is quite a small dataset, but I think there are enough words and enough structure in each document to train an LDA model (if not, please let me know).
Best Answer
Choosing $\alpha$ and $\beta$ is indeed tricky, since both affect the topic modeling results. The Gibbs sampling paper by Griffiths and Steyvers gives some insight into this:
For their corpus of scientific documents, the authors settled on the hyperparameters $\beta = 0.1$ and $\alpha = 50/T$, where $T$ is the number of topics. Note, however, that they had a corpus of around 28,000 documents and a vocabulary of around 20,000 words, and that they tried several values of $T$: $T \in \{50, 100, 200, 300, 400, 500, 600, 1000\}$.
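To make the $\alpha = 50/T$ heuristic concrete, here is a small sketch that tabulates the resulting $\alpha$ for each value of $T$ listed above. The variable names are only illustrative.

```python
# Topic counts tried in the paper, as quoted above.
topic_counts = [50, 100, 200, 300, 400, 500, 600, 1000]
beta = 0.1  # fixed topic-word prior quoted above

# alpha = 50 / T: the document-topic prior shrinks as the number of topics grows.
alphas = {T: 50 / T for T in topic_counts}

for T, alpha in alphas.items():
    print(f"T={T:>4}  alpha={alpha:.3f}  beta={beta}")
```

Under this rule $\alpha$ runs from 1 at $T = 50$ down to 0.05 at $T = 1000$. For a symmetric Dirichlet prior, values below 1 favour sparse topic mixtures per document, so the heuristic becomes more sparsity-inducing as $T$ grows.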
Regarding your data: I have no experience analyzing financial text, but for the choice of $\alpha$ and $\beta$ I would ask myself the following questions:
Answering the above questions may not be straightforward with limited knowledge of the data. Since you have limited data, I would try several values of $\alpha$ and $\beta$, ranging from sparse to non-sparse priors, and pick the combination that best suits the dataset by computing the perplexity on held-out data. To put it more concretely:
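One possible way to set up such a search is sketched below. It is only an illustration under several assumptions: it uses scikit-learn's `LatentDirichletAllocation`, which fits LDA by variational Bayes rather than Gibbs sampling, where `doc_topic_prior` plays the role of $\alpha$ and `topic_word_prior` the role of $\beta$. The helper `grid_search_lda`, the candidate grids, the number of topics, and the random count matrix are all hypothetical stand-ins for your own document-term matrix and choices.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.model_selection import train_test_split


def grid_search_lda(counts, n_topics, alphas, betas, seed=0):
    """Fit one LDA model per (alpha, beta) pair.

    Returns (alpha, beta, held-out perplexity) triples, lowest perplexity first.
    """
    # Hold out 20% of the documents for evaluation.
    train, held_out = train_test_split(counts, test_size=0.2, random_state=seed)
    results = []
    for alpha in alphas:
        for beta in betas:
            lda = LatentDirichletAllocation(
                n_components=n_topics,
                doc_topic_prior=alpha,   # alpha in the text
                topic_word_prior=beta,   # beta in the text
                learning_method="batch",
                max_iter=20,
                random_state=seed,
            )
            lda.fit(train)
            results.append((alpha, beta, lda.perplexity(held_out)))
    return sorted(results, key=lambda r: r[2])


# Synthetic stand-in for a (documents x vocabulary) count matrix (assumption).
rng = np.random.default_rng(0)
counts = rng.poisson(1.0, size=(40, 60))

# Candidate priors from sparse (0.01) to non-sparse (1.0); the grid is an assumption.
ranked = grid_search_lda(
    counts, n_topics=5, alphas=[0.01, 0.1, 1.0], betas=[0.01, 0.1, 1.0]
)
for alpha, beta, perplexity in ranked:
    print(f"alpha={alpha:<5} beta={beta:<5} held-out perplexity={perplexity:.1f}")
```

With only a few dozen documents, a single 80/20 split leaves very few held-out documents, so the ranking can be noisy. Repeating the search over several random splits and averaging the perplexities is probably safer.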