Solved – Metropolis-Hastings steps for estimating mixing weights of Gaussian mixtures

bayesian, dirichlet-distribution, markov-chain-montecarlo

So I'm trying out a toy problem of inferring the mixing weights of a K-component Gaussian mixture model (just the weights, so I'm assuming the parameters of each Gaussian are known). My posterior is $\pi({\bf p}|\{x_i\})\propto\left[\prod_{i=1}^{n}\sum_{k=1}^{K}\pi_k(x_i)p_k\right]\left[\prod_{k=1}^K p_k^{\alpha_k-1}\right]$, where $\pi_k$ is the $k$-th Gaussian density and the second factor is a $Dir(\alpha_1,\dots,\alpha_K)$ prior on the mixing weights.

At first I tried M-H using a Dirichlet proposal distribution with parameters equal to the previous sample of $\bf p$ scaled by some factor (i.e. ${\bf p_{prop}}\sim Dir({\bf p_{curr}}\times c)$ for some scaling factor $c$), but basically none of the samples would be accepted unless my prior for $\bf p$ was close to the true distribution (e.g. if the true weights were 0.5, 0.25, 0.25 and my prior was $Dir(50,25,25)$). Does anyone have any insight into why a Dirichlet proposal doesn't work?

I then found these slides (https://www.ceremade.dauphine.fr/~xian/BCS/Bmix.pdf) which say to reparameterize $\bf p$ by a vector $\bf w$ where $p_i=\frac{w_i}{\sum_{j=1}^K w_j}$, so that you can sample the $\bf w$ without the constraint of them adding to 1. I then used the following M-H step:

-propose $w_1'\sim Norm(w_1,\sigma^2)$ (automatically reject the proposal if $w_1'<0$)

-run the M-H acceptance test on the current sample of $\bf w$ with $w_1$ replaced by $w_1'$; if accepted, set $w_1=w_1'$ in the current sample

$\vdots$

-propose $w_k'\sim Norm(w_k,\sigma^2)$ (automatically reject the proposal if $w_k'<0$)

-run the M-H acceptance test on the current sample of $\bf w$ with $w_k$ replaced by $w_k'$; if accepted, set $w_k=w_k'$ in the current sample

-store $\bf w$ as the sample for the current time step
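For concreteness, here is a sketch of the component-wise scheme above in Python. The toy mixture, the step size, and the independent $Gamma(\alpha_k,1)$ priors on the $w_k$ (which make the target on $\bf w$ proper and induce ${\bf p}\sim Dir(\alpha)$ on the normalised weights) are my own illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative toy problem (my own choices, not from the post):
# K = 3 Gaussians with known means/sds, only the weights unknown.
K = 3
mus = np.array([-3.0, 0.0, 3.0])
sigmas = np.ones(K)
true_p = np.array([0.5, 0.25, 0.25])
n = 500
comp = rng.choice(K, size=n, p=true_p)
x = rng.normal(mus[comp], sigmas[comp])
alpha = np.ones(K)  # Dirichlet prior parameters

# Precompute the n x K matrix of component densities pi_k(x_i)
dens = np.exp(-0.5 * ((x[:, None] - mus) / sigmas) ** 2) / (sigmas * np.sqrt(2 * np.pi))

def log_target_w(w):
    """Unnormalised log posterior in the w-parameterisation.
    Independent Gamma(alpha_k, 1) priors on the w_k induce
    p = w / sum(w) ~ Dir(alpha), matching the original posterior."""
    if np.any(w <= 0):
        return -np.inf  # implements the 'automatically reject if negative' step
    p = w / w.sum()
    loglik = np.log(dens @ p).sum()
    logprior = ((alpha - 1.0) * np.log(w) - w).sum()  # Gamma(alpha_k, 1) kernels
    return loglik + logprior

def componentwise_mh(n_iter=5000, step=0.3):
    w = np.ones(K)
    lp = log_target_w(w)
    out = np.empty((n_iter, K))
    for t in range(n_iter):
        for k in range(K):  # one accept/reject test per component
            w_prop = w.copy()
            w_prop[k] += step * rng.standard_normal()
            lp_prop = log_target_w(w_prop)
            if np.log(rng.uniform()) < lp_prop - lp:
                w, lp = w_prop, lp_prop
        out[t] = w / w.sum()  # store the implied mixing weights
    return out

samples = componentwise_mh()
print(samples[2500:].mean(axis=0))  # posterior mean of the weights
```

Since the normal proposal is symmetric, giving negative proposals zero target density (the $-\infty$ branch) is a valid way to implement the automatic rejection without any Hastings correction.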

Is this a valid way to sample $\bf w$, or should I only test the proposed sample after proposing every component (i.e. propose ${\bf w'}\sim Norm({\bf w},\sigma^2 I)$ and then test the whole vector)? Or should I be doing something else altogether?

Best Answer

> At first I tried M-H using a Dirichlet proposal distribution with parameters equal to the previous sample of $p$ scaled by some factor (i.e. $p^\text{prop}\sim\text{Dir}(p^\text{curr}\times\alpha)$ where $\alpha$ is a scaling factor), but none of the samples would be accepted unless my prior for $p$ was close to the true distribution. Does anyone have any insight into why a Dirichlet proposal doesn't work?

This should work when adapting the scaling factor $\alpha$ to be rather small and possibly adding a stabilising term $\delta$, as in $p^\text{prop}\sim\text{Dir}(p^\text{curr}\times\alpha+\delta)$. And of course one must use the correct Metropolis-Hastings acceptance ratio: this proposal is not a symmetric random walk, so the proposal densities $q(p^\text{prop}\mid p^\text{curr})$ and $q(p^\text{curr}\mid p^\text{prop})$ do not cancel and must appear in the ratio.
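A rough sketch of this proposal with the Hastings correction included (the toy data, the `scale`/`delta` values, and the `dir_logpdf` helper are my own untuned illustrative choices):

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(1)

# Illustrative toy problem (my own choices): 3 known Gaussians,
# weights unknown, flat Dir(1,1,1) prior.
K = 3
mus = np.array([-3.0, 0.0, 3.0])
sigmas = np.ones(K)
true_p = np.array([0.5, 0.25, 0.25])
n = 500
comp = rng.choice(K, size=n, p=true_p)
x = rng.normal(mus[comp], sigmas[comp])
alpha = np.ones(K)

dens = np.exp(-0.5 * ((x[:, None] - mus) / sigmas) ** 2) / (sigmas * np.sqrt(2 * np.pi))

def log_target(p):
    # log likelihood + log Dirichlet prior kernel, up to a constant
    return np.log(dens @ p).sum() + ((alpha - 1.0) * np.log(p)).sum()

def dir_logpdf(x, a):
    # Dirichlet log-density, needed for the Hastings correction
    return gammaln(a.sum()) - gammaln(a).sum() + ((a - 1.0) * np.log(x)).sum()

def dirichlet_mh(n_iter=5000, scale=50.0, delta=1.0):
    p = np.full(K, 1.0 / K)
    lp = log_target(p)
    out = np.empty((n_iter, K))
    for t in range(n_iter):
        fwd = scale * p + delta        # Dir(scale * p_curr + delta) proposal
        p_prop = rng.dirichlet(fwd)
        bwd = scale * p_prop + delta
        lp_prop = log_target(p_prop)
        # The proposal depends on the current state and is not symmetric,
        # so q(curr | prop) / q(prop | curr) enters the acceptance ratio.
        log_r = lp_prop - lp + dir_logpdf(p, bwd) - dir_logpdf(p_prop, fwd)
        if np.log(rng.uniform()) < log_r:
            p, lp = p_prop, lp_prop
        out[t] = p
    return out

samples = dirichlet_mh()
print(samples[2500:].mean(axis=0))
```

Dropping the two `dir_logpdf` terms reproduces the failure mode in the question: the chain then targets the wrong distribution and rejects almost everything away from the prior mode.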

> Is this a valid way to sample $w$, or should I be only testing out the proposed sample after sampling every component (i.e. propose $w'\sim Norm(w,\sigma^2 I)$ and then test that sample)? Or should I be doing something else altogether?

Simulating the whole vector $\mathbf{w}$ at once and updating one component of $\mathbf{w}$ at a time are both valid approaches (provided one uses the proper Metropolis-Hastings acceptance ratio, obviously). Acceptance rates should be higher in the one-component-at-a-time case, but convergence should be slower.
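For comparison, a minimal sketch of the whole-vector variant, under the same illustrative assumptions as before (my own toy data and $Gamma(\alpha_k,1)$ priors on the $w_k$):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative toy problem (my own choices): 3 known Gaussians.
K = 3
mus = np.array([-3.0, 0.0, 3.0])
sigmas = np.ones(K)
true_p = np.array([0.5, 0.25, 0.25])
n = 500
comp = rng.choice(K, size=n, p=true_p)
x = rng.normal(mus[comp], sigmas[comp])
alpha = np.ones(K)

dens = np.exp(-0.5 * ((x[:, None] - mus) / sigmas) ** 2) / (sigmas * np.sqrt(2 * np.pi))

def log_target_w(w):
    # Gamma(alpha_k, 1) priors on the w_k, as in the reparameterised scheme
    if np.any(w <= 0):
        return -np.inf
    p = w / w.sum()
    return np.log(dens @ p).sum() + ((alpha - 1.0) * np.log(w) - w).sum()

def joint_mh(n_iter=5000, step=0.15):
    w = np.ones(K)
    lp = log_target_w(w)
    out = np.empty((n_iter, K))
    accepts = 0
    for t in range(n_iter):
        w_prop = w + step * rng.standard_normal(K)  # propose all K components at once
        lp_prop = log_target_w(w_prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            w, lp = w_prop, lp_prop
            accepts += 1
        out[t] = w / w.sum()
    return out, accepts / n_iter

samples, acc_rate = joint_mh()
print(acc_rate, samples[2500:].mean(axis=0))
```

The only change from the component-wise version is that there is one accept/reject test per iteration instead of K, so a whole proposal is discarded whenever any single component lands in a bad region.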
