[Math] Estimating the mean of a truncated Gaussian curve

pr.probability, st.statistics

Say I have a black box generating data samples, and I want to estimate the parameters of the black box from the samples.

The black box works like this: it has a parameter m (a real number), and to generate a value v, it first generates v0 according to a normal distribution (with mean m and variance 1); if v0 is positive it returns v0, otherwise it returns 0.

So my data samples will be a bunch of zeroes and positive real numbers.
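(In code, the generator might look like the following minimal sketch; the function name `black_box` and the use of NumPy are illustrative choices of mine, not part of the problem statement.)

```python
import numpy as np

def black_box(m, n, seed=None):
    """Generate n samples: draw v0 ~ Normal(m, 1), return v0 if positive, else 0."""
    rng = np.random.default_rng(seed)
    v0 = rng.normal(loc=m, scale=1.0, size=n)
    return np.where(v0 > 0, v0, 0.0)
```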

So my question is, from a sample, how do I estimate m?

And what kind of mathematical tools do I use to reason about this?

To me, this looks like a straightforward case of Bayesian probability, where I would use $p(\text{samples} \mid m)$ to get $p(m \mid \text{samples})$, given some prior on the distribution of m.

So uncle Bayes would say: $p(m \mid \text{samples}) = p(\text{samples} \mid m) \, p(m) \,/\, p(\text{samples})$

Since the samples are independent, $p(\text{samples} \mid m) = \prod_i p(\text{sample}_i \mid m)$

…but some of those $p(\text{sample}_i \mid m)$ are "discrete probabilities" (when the value is 0), and some are continuous probability densities! Can I multiply them just like that?

(Same goes for calculating $p(\text{samples})$)

Can someone help me clear the confusion?

Best Answer

OK, let me fully address the question since there is no easy way out. The normal approach is to maximize the "likelihood" of the data under the parameter. The key question here is how to define likelihood for a mixed distribution. Let's use the standard approach as our guide.

Parameter estimation is usually based on the idea that we want to choose parameters that make our data "the most likely." For a discrete probability distribution, we interpret this to mean that our data is the most probable. But this breaks down in the case of continuous probability distributions, where, no matter our choice of parameters, our data has probability zero.

Statisticians thus replace the probability with the probability density for continuous distributions. Here is the justification. Instead of actually having a set of numbers drawn from the probability distribution, you have a highly accurate measurement: say, your sequence $\{x_i\}$ for $i = 1,\dots,n$ tells you that the true value of the (still unknown) sequence $\{g_i\}$ satisfies $|x_i - g_i| < \varepsilon$ for all $i$. When $\varepsilon$ is sufficiently small, the approximation $$ \mathbb{P}(|x_i - g_i| < \varepsilon) \approx 2\varepsilon \, p_{g}(x_i) $$ is very accurate, where $p_g$ is the pdf of $g_i$. Assuming that your sequence is iid, we are led to the approximation $$ \mathbb{P}(|x_i - g_i| < \varepsilon \text{ for all } i) \approx (2\varepsilon)^n \prod_{i=1}^n p_g(x_i). $$ We thus choose the pdf from our family that maximizes the right-hand side of the above equation (the constant $(2\varepsilon)^n$ does not affect the maximizer), reproducing the standard maximum likelihood method.

Now the question is, what do we do with mixed distributions? When there is a mass at a point $x_i$, that is, $\mathbb{P}(x_i = g_i) > 0$, our first approximation is incorrect; for very small $\varepsilon$, we instead have the approximation $$ \mathbb{P}(|x_i - g_i| < \varepsilon) \approx \mathbb{P}(x_i = g_i). $$ If we let $\mathcal{N}$ be the index set of the "massless" samples, we can approximate the probability of our data as $$ \mathbb{P}(|x_i - g_i| < \varepsilon \text{ for all } i) \approx (2\varepsilon)^{|\mathcal{N}|} \prod_{i \in \mathcal{N}} p_g(x_i) \prod_{i \notin \mathcal{N}} \mathbb{P}(x_i = g_i). $$ That is, we can reasonably define our maximum likelihood estimate for a parameter $m$ as the value of the parameter that maximizes $$ \prod_{i \in \mathcal{N}} p_g(x_i) \prod_{i \notin \mathcal{N}} \mathbb{P}(x_i = g_i). $$

In your case, it is fairly simple to write down the value of the likelihood function above. First, note that $$\mathbb{P}(x=0) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{-m} e^{-t^2/2}\,dt = \Phi(-m),$$ where $\Phi$ is the standard normal CDF. For $x>0$, you have the Gaussian pdf (mean $m$, variance 1) $p_g(x) = \tfrac{1}{\sqrt{2\pi}} e^{-(x-m)^2/2}$.
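Spelling this out (with $n_0$ denoting the number of zero samples, my notation), the log-likelihood objective is $$ \ell(m) = n_0 \log \Phi(-m) + \sum_{i:\, x_i > 0} \left( -\frac{(x_i - m)^2}{2} - \frac{1}{2}\log 2\pi \right). $$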

I won't do any more here; suffice it to say that the standard approach to maximizing the likelihood involves taking the logarithm of the likelihood function and setting its derivative to zero. You will probably get a transcendental equation that you will need to solve numerically.
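Here is a numerical sketch of that maximization, assuming SciPy is available; the search bracket $(-10, 10)$ for $m$ is an arbitrary illustrative choice.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def neg_log_likelihood(m, samples):
    """Negative log-likelihood of the zero-censored normal model."""
    x = np.asarray(samples, dtype=float)
    n_zero = np.sum(x == 0)                    # censored observations
    positives = x[x > 0]                       # uncensored observations
    ll = n_zero * norm.logcdf(-m)              # n0 * log Phi(-m)
    ll += norm.logpdf(positives, loc=m).sum()  # Gaussian log-density terms
    return -ll

def estimate_m(samples):
    """Numerically maximize the likelihood over an (assumed) bracket for m."""
    result = minimize_scalar(neg_log_likelihood, bounds=(-10.0, 10.0),
                             args=(samples,), method="bounded")
    return result.x
```

On data simulated as in the sketch in the question, `estimate_m` should recover something close to the true m for reasonably large samples.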
