Gaussian Mixture Distribution – Fitting Truncated Normal Mixtures in R: A Complete Guide

finite-mixture-model, gaussian mixture distribution, mixture-distribution, truncated normal distribution, truncated-distributions

I have a vector x, lower_bound < x < upper_bound. I would like to fit a truncated normal mixture distribution to x. I can use the package mixtools to fit a normal mixture:

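```r
library(mixtools)
# x: numeric vector with lower_bound < x < upper_bound
mix_fit <- normalmixEM(x)  # untruncated two-component normal mixture fitted by EM
```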

but that does not account for the upper and lower bounds. Is there any package in R that fits truncated normal mixtures? If not, I suppose I will have to implement my own EM algorithm, in which case I'd welcome good references on the implementation details.
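To experiment with this setting, a sample like `x` can be simulated from a two-component normal mixture truncated to `(lower_bound, upper_bound)` by plain rejection sampling. This is an illustrative sketch; `r_trunc_mix` and its arguments are hypothetical names, not part of any package.

```r
# Hypothetical helper: draw n points from w1*N(mu[1], sigma[1]^2) + (1 - w1)*N(mu[2], sigma[2]^2)
# truncated to (lower, upper), by rejection sampling from the untruncated mixture.
r_trunc_mix <- function(n, w1, mu, sigma, lower, upper) {
  out <- numeric(0)
  while (length(out) < n) {
    m <- 2 * n
    from_1 <- runif(m) < w1                                  # component labels
    z <- rnorm(m, ifelse(from_1, mu[1], mu[2]), ifelse(from_1, sigma[1], sigma[2]))
    out <- c(out, z[z > lower & z < upper])                  # keep draws inside the bounds
  }
  out[seq_len(n)]
}

set.seed(42)
x <- r_trunc_mix(1000, w1 = 0.4, mu = c(0, 4), sigma = c(1, 1), lower = -1, upper = 5)
range(x)  # lies strictly inside (-1, 5)
```

Rejection is only efficient when the bounds retain a reasonable share of the mixture mass; otherwise most draws are discarded.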

Best Answer

A direct approach to estimating a mixture of two $(a,b)$-truncated Normal distributions
$$f(x;\boldsymbol{\theta})=\varpi_1 \varphi(x;\mu_1,\sigma_1,a,b)+(1-\varpi_1)\varphi(x;\mu_0,\sigma_0,a,b),$$
where
$$\varphi(x;\mu_1,\sigma_1,a,b)=\dfrac{\exp\{-(x-\mu_1)^2/2\sigma_1^2\}}{\sqrt{2\pi}\,\sigma_1[\Phi(\{b-\mu_1\}/\sigma_1)-\Phi(\{a-\mu_1\}/\sigma_1)]},$$
is to run the EM algorithm on the complete likelihood
$$\prod_{i=1}^n [\varpi_1 \varphi(x_i;\mu_1,\sigma_1,a,b)]^{z_i}[(1-\varpi_1)\varphi(x_i;\mu_0,\sigma_0,a,b)]^{1-z_i}.$$
The E step target is
$$\begin{aligned}&\sum_{i=1}^n \mathbb E[Z_i|x_i,\boldsymbol{\theta}^-]\log [\varpi_1 \varphi(x_i;\mu_1,\sigma_1,a,b)]\\ &+\sum_{i=1}^n \mathbb E[1-Z_i|x_i,\boldsymbol{\theta}^-] \log [(1-\varpi_1) \varphi(x_i;\mu_0,\sigma_0,a,b)],\end{aligned}$$
where $\boldsymbol{\theta}^-$ is the current parameter value and
$$\mathbb E[Z_i|x_i,\boldsymbol{\theta}]=\dfrac{ \varpi_1 \varphi(x_i;\mu_1,\sigma_1,a,b)}{f(x_i;\boldsymbol{\theta})}.$$
The M step then gives
$$\varpi_1^+ = \frac{1}{n}\sum_{i=1}^n \mathbb E[Z_i|x_i,\boldsymbol{\theta}^-]$$
and
$$\begin{aligned}(\mu_1^+,\mu_0^+,\sigma_1^+,\sigma_0^+) = \arg\max\ &\sum_{i=1}^n \mathbb E[Z_i|x_i,\boldsymbol{\theta}^-]\log \varphi(x_i;\mu_1,\sigma_1,a,b)\\ &+\sum_{i=1}^n \mathbb E[1-Z_i|x_i,\boldsymbol{\theta}^-] \log \varphi(x_i;\mu_0,\sigma_0,a,b).\end{aligned}$$
Unfortunately, this maximisation has no closed-form solution, because the normalising constants $\Phi(\{b-\mu_k\}/\sigma_k)-\Phi(\{a-\mu_k\}/\sigma_k)$ depend on the parameters; it has to be carried out numerically at each iteration.
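The direct EM just described can be sketched in base R, assuming the bounds $a$, $b$ are known and every $x_i$ lies strictly inside $(a,b)$. The function names `log_dtnorm` and `fit_tnorm_mix_em`, the quantile-based starting values and the use of `optim()` (Nelder-Mead) for the numerical M step are my own assumptions, not part of the answer. The $(\mu,\sigma)$ part of the M step separates into one weighted truncated-normal fit per component, since each sum involves only $(\mu_k,\sigma_k)$; because it is solved numerically, this is a generalised EM.

```r
# Hypothetical helper: log of varphi(x; mu, sigma, a, b), the N(mu, sigma^2)
# density truncated to (a, b)
log_dtnorm <- function(x, mu, sigma, a, b) {
  dnorm(x, mu, sigma, log = TRUE) - log(pnorm(b, mu, sigma) - pnorm(a, mu, sigma))
}

# EM for w1*varphi(x; mu1, s1, a, b) + (1 - w1)*varphi(x; mu0, s0, a, b);
# the (mu, sigma) M step uses optim() because it has no closed form
fit_tnorm_mix_em <- function(x, a, b, max_iter = 200, tol = 1e-6) {
  w1 <- 0.5
  mu <- unname(quantile(x, c(0.75, 0.25)))  # (mu1, mu0) starting values
  sigma <- rep(sd(x) / 2, 2)                 # (sigma1, sigma0) starting values
  loglik_old <- -Inf
  for (iter in seq_len(max_iter)) {
    # E step: tau_i = E[Z_i | x_i, theta^-]
    d1 <- w1 * exp(log_dtnorm(x, mu[1], sigma[1], a, b))
    d0 <- (1 - w1) * exp(log_dtnorm(x, mu[2], sigma[2], a, b))
    tau <- d1 / (d1 + d0)
    loglik <- sum(log(d1 + d0))              # observed-data log-likelihood at theta^-
    if (loglik - loglik_old < tol) break
    loglik_old <- loglik
    # M step for the weight: closed form
    w1 <- mean(tau)
    # M step for (mu_k, sigma_k): one weighted fit per component,
    # with sigma optimised on the log scale to keep it positive
    resp <- cbind(tau, 1 - tau)
    for (k in 1:2) {
      neg_target <- function(p) -sum(resp[, k] * log_dtnorm(x, p[1], exp(p[2]), a, b))
      opt <- optim(c(mu[k], log(sigma[k])), neg_target)
      mu[k] <- opt$par[1]
      sigma[k] <- exp(opt$par[2])
    }
  }
  list(w1 = w1, mu1 = mu[1], mu0 = mu[2], sigma1 = sigma[1], sigma0 = sigma[2],
       loglik = loglik, iterations = iter)
}
```

With the question's notation, the call would be `fit_tnorm_mix_em(x, lower_bound, upper_bound)`. Nelder-Mead starting from the current values never returns a worse target value, so the log-likelihood should not decrease from one iteration to the next.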

A potentially interesting alternative is to add to the observed sample $(x_1,\ldots,x_n)$ a latent sample $$(Y_1,\ldots,Y_{N_1},W_1,\ldots,W_{N_2})$$ such that, writing $F(\cdot;\boldsymbol{\theta})$ for the cdf of the untruncated normal mixture,

  1. the $Y_i$'s are from the mixture truncated to $(-\infty,a)$
  2. the $W_j$'s are from the mixture truncated to $(b,\infty)$
  3. $N_1\sim\mathcal B(n/\{F(b;\boldsymbol{\theta})-F(a;\boldsymbol{\theta})\},F(a;\boldsymbol{\theta}))$
  4. $N_2\sim\mathcal B(n/\{F(b;\boldsymbol{\theta})-F(a;\boldsymbol{\theta})\},1-F(b;\boldsymbol{\theta}))$
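One way to exploit this completion is a stochastic-EM-style scheme: at each iteration, draw $N_1$, $N_2$ and the latent $Y$'s and $W$'s under the current $\boldsymbol{\theta}$, then apply an ordinary (untruncated) normal-mixture EM update, which is in closed form, to the completed sample. The base R sketch below is a hypothetical illustration rather than the answer's own algorithm: the function names, starting values, iteration counts and the averaging of post-burn-in iterates are all assumptions. Since $n/\{F(b;\boldsymbol{\theta})-F(a;\boldsymbol{\theta})\}$ is generally not an integer, it is rounded before calling `rbinom()`.

```r
# F(q; theta): cdf of the untruncated two-component normal mixture
pmix <- function(q, w, mu, sigma) sum(w * pnorm(q, mu, sigma))

# Hypothetical helper: m draws from the mixture truncated to (-Inf, cut),
# or to (cut, Inf) when upper = TRUE, by inverse cdf within each component
rtail_mix <- function(m, w, mu, sigma, cut, upper = FALSE) {
  if (m == 0) return(numeric(0))
  tail_k <- pnorm(cut, mu, sigma, lower.tail = !upper)  # tail mass per component
  k <- sample(seq_along(w), m, replace = TRUE, prob = w * tail_k)
  u <- runif(m) * tail_k[k]
  qnorm(u, mu[k], sigma[k], lower.tail = !upper)
}

fit_tnorm_mix_sem <- function(x, a, b, n_iter = 300, burn = 100) {
  n <- length(x)
  w <- c(0.5, 0.5)
  mu <- unname(quantile(x, c(0.25, 0.75)))
  sigma <- rep(sd(x) / 2, 2)
  path <- matrix(NA_real_, n_iter, 6)
  for (iter in seq_len(n_iter)) {
    Fa <- pmix(a, w, mu, sigma)
    Fb <- pmix(b, w, mu, sigma)
    size <- round(n / (Fb - Fa))            # rounded: rbinom() needs an integer size
    N1 <- rbinom(1, size, Fa)               # item 3
    N2 <- rbinom(1, size, max(0, 1 - Fb))   # item 4 (guards against rounding above 1)
    z <- c(x,
           rtail_mix(N1, w, mu, sigma, a),                # Y's, item 1
           rtail_mix(N2, w, mu, sigma, b, upper = TRUE))  # W's, item 2
    # one closed-form EM update for an untruncated mixture on the completed sample
    d <- cbind(w[1] * dnorm(z, mu[1], sigma[1]), w[2] * dnorm(z, mu[2], sigma[2]))
    tau <- d / rowSums(d)
    w <- colMeans(tau)
    mu <- colSums(tau * z) / colSums(tau)
    sigma <- sqrt(colSums(tau * outer(z, mu, "-")^2) / colSums(tau))
    path[iter, ] <- c(w, mu, sigma)
  }
  est <- colMeans(path[-seq_len(burn), , drop = FALSE])
  list(w = est[1:2], mu = est[3:4], sigma = est[5:6])
}
```

Because the latent draws are random, the iterates keep fluctuating instead of converging to a fixed point, which is why this sketch averages them after a burn-in period.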