I found this truly excellent review that describes precisely how Hierarchical Dirichlet Processes work.
First, start by choosing a base distribution $H$. For topic modeling, $H$ is taken to be a Dirichlet distribution: a draw from $H$ should be a distribution over words for a topic, so its dimension must equal the vocabulary size $V$. In the example described in the review, the author assumes a vocabulary of 10 words and uses $H = \text{Dirichlet}(1/10,...,1/10)$. As usual, a realization of this distribution is a 10-dimensional vector $\theta_{k}$ of word proportions.
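As a concrete sketch (assuming NumPy, and the review's 10-word vocabulary), one draw from this base distribution looks like:

```python
import numpy as np

rng = np.random.default_rng(0)

V = 10  # vocabulary size, as in the review's example

# Base distribution H = Dirichlet(1/V, ..., 1/V); one draw is a
# candidate topic: a vector of word proportions summing to 1.
theta_k = rng.dirichlet(np.full(V, 1.0 / V))

print(theta_k.shape)  # (10,)
```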
After this, $H$ is used as the base distribution of a Dirichlet process $DP(\gamma, H)$. A realization $G_{0}$ of this process is another discrete distribution with locations $\{\theta_{k}\}$, where each $\theta_{k}$ describes the distribution over words for topic $k$. If we then use $G_{0}$ as the base distribution of another Dirichlet process $DP(\alpha_{0}, G_{0})$, we can obtain a realization $G_{j}$ for every document $j$, and because $G_{0}$ is discrete, every $G_{j}$ has the same support as $G_{0}$. Therefore, all the $G_{j}$'s share the same set of $\theta_{k}$'s, although with different proportions (which are called mixing weights in the definition of a Dirichlet process).
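A truncated stick-breaking approximation makes the shared support visible. This is my own sketch, not code from the review; the truncation level $K$, the concentration values, and the fact that a DP draw over a discrete base reduces to a finite Dirichlet over the atom weights are assumptions of the sketch:

```python
import numpy as np

rng = np.random.default_rng(1)

V, K = 10, 25             # vocabulary size; K truncates the infinite sum
gamma, alpha0 = 1.0, 1.0  # assumed concentration parameters

# G0 ~ DP(gamma, H) via truncated stick-breaking: atoms theta_k ~ H,
# weights beta from breaking a unit stick.
sticks = rng.beta(1.0, gamma, size=K)
beta = sticks * np.concatenate(([1.0], np.cumprod(1.0 - sticks)[:-1]))
beta /= beta.sum()        # renormalize the truncated weights
theta = rng.dirichlet(np.full(V, 1.0 / V), size=K)  # atoms drawn from H

# Because G0 is discrete, a draw G_j ~ DP(alpha0, G0) keeps the SAME atoms
# theta and only resamples their weights (finite-Dirichlet reduction).
pi_j = rng.dirichlet(alpha0 * beta + 1e-12)  # jitter keeps parameters > 0

print(theta.shape)  # (25, 10): 25 shared topics over a 10-word vocabulary
```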
Finally, for every document $j$ and every word position $i$, we draw a realization from $G_{j}$, which selects a particular vector $\theta_{k}$. Since this $\theta_{k}$ is a distribution over words for a given topic, we only need to sample from a multinomial distribution with parameter $\theta_{k}$ in order to generate words: $w_{ji} \sim \text{Multinomial}(\theta_{k})$.
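For a single word position, this last step might look as follows (the `vocab` list is a hypothetical 10-word vocabulary, not from the review):

```python
import numpy as np

rng = np.random.default_rng(2)

V = 10
vocab = [f"word{v}" for v in range(V)]  # hypothetical vocabulary

# theta_k plays the role of the topic drawn from G_j for position (j, i).
theta_k = rng.dirichlet(np.full(V, 1.0 / V))

# One word is one draw from Multinomial(1, theta_k), i.e. a categorical draw.
w_ji = vocab[rng.choice(V, p=theta_k)]
print(w_ji)
```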
I have seen that sometimes $\phi_{ji}$ is defined as $\phi_{ji}=\theta_{k}$ for every document $j$ and word $i$. Other times, it is easier to introduce an indicator variable $z_{ji}$ that is sampled from the mixing weights $\pi_{jk}$ of $G_{j}$ (in $G_{j} = \sum_{k=1}^{\infty} \pi_{jk} \delta_{\theta_{k}}$) and then used as an index, as in $\theta_{z_{ji}}$. However, I think this is done in the context of the stick-breaking construction.
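Putting the pieces together, a minimal truncated stick-breaking sketch of the whole generative process with explicit indicators $z_{ji}$ could look like this (the truncation level $K$, all hyperparameter values, and the toy corpus size are my assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

V, K = 10, 30              # vocabulary size; truncation level (assumed)
gamma, alpha0 = 1.0, 1.0   # assumed concentration parameters
n_docs, n_words = 3, 20    # toy corpus size

# Global weights beta ~ GEM(gamma), truncated and renormalized.
sticks = rng.beta(1.0, gamma, size=K)
beta = sticks * np.concatenate(([1.0], np.cumprod(1.0 - sticks)[:-1]))
beta /= beta.sum()

# Shared topics theta_k ~ H = Dirichlet(1/V, ..., 1/V).
theta = rng.dirichlet(np.full(V, 1.0 / V), size=K)

docs = []
for j in range(n_docs):
    # Document-level mixing weights pi_j from DP(alpha0, G0), which for a
    # discrete G0 reduces to a finite Dirichlet over the atom weights.
    pi_j = rng.dirichlet(alpha0 * beta + 1e-12)  # jitter keeps params > 0
    # z_ji picks a topic index; w_ji is a word index drawn from theta[z_ji].
    z_j = rng.choice(K, size=n_words, p=pi_j)
    w_j = [int(rng.choice(V, p=theta[z])) for z in z_j]
    docs.append(w_j)

print(len(docs), len(docs[0]))  # 3 20
```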
Best Answer
You can try computing the logPerplexity per iteration and plotting it on a graph to check when it converges.