Solved – Maximum likelihood estimators for a truncated distribution

distributions, estimation, mathematical-statistics, maximum likelihood, truncation

Consider $N$ independent samples $S$ obtained from a random variable $X$ that is assumed to follow a truncated distribution (e.g. a truncated normal distribution) of known (finite) minimum and maximum values $a$ and $b$ but of unknown parameters $\mu$ and $\sigma^2$. If $X$ followed a non-truncated normal distribution, the maximum likelihood estimators $\widehat\mu$ and $\widehat\sigma^2$ for $\mu$ and $\sigma^2$ from $S$ would be the sample mean $\widehat\mu = \frac{1}{N} \sum_i S_i$ and the sample variance $\widehat\sigma^2 = \frac{1}{N} \sum_i (S_i - \widehat\mu)^2$. However, for a truncated distribution, the sample variance defined in this way is bounded by $(b-a)^2$, so it is not always a consistent estimator: for $\sigma^2 > (b-a)^2$, it cannot converge in probability to $\sigma^2$ as $N$ goes to infinity. So it seems that $\widehat\mu$ and $\widehat\sigma^2$ are not the maximum likelihood estimators of $\mu$ and $\sigma^2$ for a truncated distribution. Of course, this is to be expected, since the $\mu$ and $\sigma^2$ parameters of a truncated normal distribution aren't its mean and variance.
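For instance, a quick simulation illustrates the point (a rough sketch using `scipy.stats.truncnorm`; the parameter values are arbitrary):

```python
import numpy as np
from scipy.stats import truncnorm

# Arbitrary illustration: truncation to [0, 1] with sigma^2 = 4 > (b - a)^2 = 1.
a, b = 0.0, 1.0
mu, sigma = 0.5, 2.0

# scipy parametrizes the truncation bounds in standardized units.
alpha, beta = (a - mu) / sigma, (b - mu) / sigma
S = truncnorm.rvs(alpha, beta, loc=mu, scale=sigma, size=100_000, random_state=0)

print(S.mean())  # close to 0.5
print(S.var())   # roughly 0.08 (near the variance of a uniform on [0, 1]), nowhere near sigma^2 = 4
```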

So, what are the maximum likelihood estimators of the $\mu$ and $\sigma$ parameters of a truncated distribution of known minimum and maximum values?

Best Answer

Consider any location-scale family determined by a "standard" distribution $F$,

$$\Omega_F = \left\{F_{(\mu, \sigma)}: x \to F\left(\frac{x-\mu}{\sigma}\right) \mid \sigma \gt 0\right\}.$$

Assuming $F$ is differentiable with derivative $f = F'$, we readily find that the PDFs are $\frac{1}{\sigma}f\left(\frac{x-\mu}{\sigma}\right)dx$.

Truncating these distributions to restrict their support between $a$ and $b$, $a \lt b$, means that the PDFs are replaced by

$$f_{(\mu, \sigma; a,b)}(x) = \frac{f\left(\frac{x-\mu}{\sigma}\right)dx}{\sigma C(\mu, \sigma, a, b)}, a \le x \le b$$

(and are zero for all other values of $x$) where $C(\mu, \sigma, a, b) = F_{(\mu,\sigma)}(b) - F_{(\mu,\sigma)}(a)$ is the normalizing factor needed to ensure that $f_{(\mu, \sigma; a, b)}$ integrates to unity. (Note that $C$ is identically $1$ in the absence of truncation.) The log likelihood for iid data $x_i$ therefore is

$$\Lambda(\mu, \sigma) = \sum_i \left[\log{f\left(\frac{x_i-\mu}{\sigma}\right)} - \log{\sigma}-\log{C(\mu, \sigma, a, b)}\right].$$
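For the truncated normal family, for example, this log likelihood is easy to write down directly (a minimal sketch, assuming `scipy` is available; `norm.logpdf` supplies $\log{f\left(\frac{x-\mu}{\sigma}\right)} - \log{\sigma}$ and `norm.cdf` supplies $F_{(\mu,\sigma)}$):

```python
import numpy as np
from scipy.stats import norm

def truncated_normal_loglik(mu, sigma, x, a, b):
    """Lambda(mu, sigma) for data x from a normal distribution truncated to [a, b]."""
    if sigma <= 0:
        return -np.inf
    # Sum over the data of log f((x_i - mu)/sigma) - log sigma.
    data_terms = np.sum(norm.logpdf(x, loc=mu, scale=sigma))
    # log C(mu, sigma, a, b) = log[F_(mu,sigma)(b) - F_(mu,sigma)(a)].
    log_C = np.log(norm.cdf(b, loc=mu, scale=sigma) - norm.cdf(a, loc=mu, scale=sigma))
    return data_terms - len(x) * log_C
```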

Critical points (including any global maxima) are found where either $\sigma=0$ (a special case I will ignore here) or the gradient vanishes. Using subscripts to denote derivatives, we may formally compute the gradient and write the likelihood equations as

$$\eqalign{ 0 &= \frac{\partial\Lambda}{\partial\mu} &= \sum_i \left[\frac{-f_\mu\left(\frac{x_i-\mu}{\sigma}\right)}{f\left(\frac{x_i-\mu}{\sigma}\right)} -\frac{C_\mu(\mu,\sigma,a,b)}{C(\mu,\sigma,a,b)}\right] \\ 0 &= \frac{\partial\Lambda}{\partial\sigma} &= \sum_i \left[\frac{-f_\sigma\left(\frac{x_i-\mu}{\sigma}\right)}{\sigma^2f\left(\frac{x_i-\mu}{\sigma}\right)} -\frac{1}{\sigma}-\frac{C_\sigma(\mu,\sigma,a,b)}{C(\mu,\sigma,a,b)}\right] }$$

Because $a$ and $b$ are fixed, drop them from the notation and write $nC_\mu(\mu, \sigma, a, b)/C(\mu, \sigma,a,b)$ as $A(\mu,\sigma)$ and $nC_\sigma(\mu, \sigma, a, b)/C(\mu, \sigma,a,b)$ as $B(\mu, \sigma)$. (With no truncation, both functions would be identically zero.) Separating the terms involving the data from the rest gives

$$\eqalign{ -A(\mu,\sigma) &= \sum_i \frac{f_\mu\left(\frac{x_i-\mu}{\sigma}\right)}{f\left(\frac{x_i-\mu}{\sigma}\right)} \\ -\sigma^2 B(\mu,\sigma) - n\sigma &= \sum_i \frac{f_\sigma\left(\frac{x_i-\mu}{\sigma}\right)}{f\left(\frac{x_i-\mu}{\sigma}\right)} }$$

By comparing these to the no-truncation situation it is evident that

  • Any sufficient statistics for the original problem are sufficient for the truncated problem (because the right hand sides have not changed).

  • Our ability to find closed-form solutions relies on the tractability of $A$ and $B$. If these do not involve $\mu$ and $\sigma$ in simple ways, we cannot hope to obtain closed-form solutions in general.
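For example, in the normal case these functions are explicit but transcendental: writing $\alpha = (a-\mu)/\sigma$ and $\beta = (b-\mu)/\sigma$, and letting $\varphi$ and $\Phi$ denote the standard normal PDF and CDF, a routine differentiation of $C = \Phi(\beta) - \Phi(\alpha)$ gives

$$A(\mu,\sigma) = \frac{n\left[\varphi(\alpha) - \varphi(\beta)\right]}{\sigma\left[\Phi(\beta) - \Phi(\alpha)\right]}, \qquad B(\mu,\sigma) = \frac{n\left[\alpha\,\varphi(\alpha) - \beta\,\varphi(\beta)\right]}{\sigma\left[\Phi(\beta) - \Phi(\alpha)\right]},$$

neither of which can be solved for $\mu$ and $\sigma$ in closed form.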

For the case of a normal family, $C(\mu,\sigma,a,b)$ of course is given by the normal CDF, which is a difference of error functions: there is no chance that a closed-form solution can be obtained in general. However, there are only two sufficient statistics (the sample mean and variance will do) and the CDF is as smooth as can be, so numerical solutions will be relatively easy to obtain.
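As a rough sketch of that numerical route (using `scipy.optimize.minimize` with Nelder–Mead; the starting values and the choice of method are arbitrary), the truncated-normal MLE can be obtained by maximizing the log likelihood written out above:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def truncated_normal_mle(x, a, b):
    """Numerically maximize the truncated-normal log likelihood over (mu, sigma)."""
    def neg_loglik(params):
        mu, sigma = params
        if sigma <= 0:
            return np.inf
        log_C = np.log(norm.cdf(b, loc=mu, scale=sigma) - norm.cdf(a, loc=mu, scale=sigma))
        return -(np.sum(norm.logpdf(x, loc=mu, scale=sigma)) - len(x) * log_C)

    # The (biased) naive sample mean and SD make a convenient starting point.
    start = np.array([np.mean(x), np.std(x)])
    result = minimize(neg_loglik, start, method="Nelder-Mead")
    return result.x  # estimated (mu, sigma)
```

Because the log likelihood is smooth in $(\mu, \sigma)$, such a direct search typically converges quickly, although when the truncation interval is narrow relative to $\sigma$ the surface can be quite flat and the estimates correspondingly imprecise; in practice one might also reparametrize (e.g. optimize over $\log\sigma$) to keep the scale positive.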
