In the latent Dirichlet allocation model described in Wikipedia, is $\beta$ the word-topic matrix?
I understand that $\beta$ is the topic-word matrix, with $\beta_{ij}$ giving the probability of word $j$ given topic $i$, but I would like to confirm it.
Best Answer
Confusingly, the variables in Wikipedia's description of smoothed LDA don't follow the paper introducing LDA (Blei, Ng & Jordan, 2003). In the paper, $\beta$ is first defined exactly as you've described it: a $k \times V$ matrix parameterizing the word probabilities, with $\beta_{ij} = p(w^j = 1 \mid z^i = 1)$.
The authors later introduce smoothed LDA, in which each row of $\beta$ is drawn from an exchangeable Dirichlet with prior parameter $\eta$. Wikipedia presently uses $\beta$ where the paper uses $\eta$, and $\varphi$ where the paper uses $\beta$.
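To make the smoothed model's extra step concrete, here is a minimal sketch of drawing the rows of $\beta$ from an exchangeable Dirichlet with parameter $\eta$ (the values of `k`, `V`, and `eta` are illustrative, not from either source):

```python
import numpy as np

rng = np.random.default_rng(0)
k, V = 4, 10   # number of topics, vocabulary size (illustrative)
eta = 0.1      # exchangeable Dirichlet parameter: the same scalar for every word

# Each of the k rows of beta is an independent draw from Dirichlet(eta, ..., eta),
# so each row is a full probability distribution over the vocabulary.
beta = rng.dirichlet(np.full(V, eta), size=k)   # shape (k, V)

# beta[i, j] = p(word j | topic i); every row sums to 1.
```

A small $\eta$ (well below 1) concentrates each row's mass on a few words, which is the usual sparsity-inducing choice for topic-word distributions.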
Here's the plate notation from the paper:
And from Wikipedia:
I'm not sure which convention is more common in implementations. scikit-learn, for example, follows the paper and uses $\eta$ for the topic-word prior.
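For reference, a short sketch of how the paper's symbols map onto scikit-learn's `LatentDirichletAllocation` parameters (the toy document-term matrix here is made up for illustration):

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Toy document-term count matrix: 20 documents, 30-word vocabulary.
X = np.random.RandomState(0).randint(0, 5, size=(20, 30))

lda = LatentDirichletAllocation(
    n_components=3,
    doc_topic_prior=0.1,    # the paper's alpha: prior over per-document topic weights
    topic_word_prior=0.01,  # the paper's eta (Wikipedia's beta): prior over topic-word rows
    random_state=0,
).fit(X)

# lda.components_ holds the unnormalized topic-word weights; normalizing each
# row gives p(word | topic), i.e. the paper's beta (Wikipedia's varphi).
beta = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
```

So the one prior scalar goes in as `topic_word_prior`, and the fitted topic-word matrix comes back out of `components_`.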