Probability Estimation – How to Estimate Probability Mass Function from Observed Samples?

binomial distribution, discrete data, estimation, probability

This question is related to but is distinct from Estimation of probability mass function using finite samples.

As in the related question, suppose we have a discrete random variable $X$ with known and finite outcomes $\{1, \ldots, k\}$ and unknown probability mass function $p(i) = \pi_i$, $i = 1, \ldots, k$. We then sample this distribution $N$ times and observe counts $(x_1, \ldots, x_k)$ of the respective outcomes, so that $\sum_{i=1}^k x_i = N$.

We can certainly take $x_i/N$ as an estimator of $\pi_i$, and if we have no other information about $\pi_i$ it seems a reasonable estimator.

My question is this: is there any way of quantifying the confidence of this estimation of the $\pi_i$, perhaps something akin to a $p$-value? What if we add the assumption that $\pi_i > \epsilon$ for some $\epsilon > 0$? Any nudge in the right direction would be much appreciated.

Please both forgive and inform me of any mistakes in notation or terminology. My background is in pure math and I'm a software engineer by trade, but I'm trying to teach myself some statistics because I find it fascinating.

Best Answer

I'm not sure how a p-value would come into it -- a p-value for a test of what hypothesis?

In any case, it doesn't really tell you how confident you might be in the estimated value. You might instead compute a confidence interval for each of the $\pi_i$.

If you're prepared to assume a multinomial model, the marginal distribution of counts in cell $i$ will be $\text{binomial}(n,\pi_i)$ ($n$ is the total observed, your $N$), and you can construct approximate binomial proportion confidence intervals in a plethora of ways of varying accuracy and simplicity.

In large samples it usually doesn't matter much which you pick; a straight normal approximation to the binomial is often used, and a two-tailed $1-\alpha$ CI for $\pi_i$ would then be

$$p_i\pm Z_{1-\alpha/2}\cdot \sqrt{p_i(1-p_i)/n}$$

If $p_i$ is not very small or large and $n$ is large, a useful bound on a 95% CI is $p_i \pm 1/\sqrt{n}$ (it will always contain the interval above, but when the proportion is near 0 or 1 it can be much wider than the 'real' asymptotic interval).
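
As a concrete illustration (not part of the original answer; the counts below are made up), here is a short Python sketch computing the Wald interval above for each category, together with the $p_i \pm 1/\sqrt{n}$ bound:

```python
import numpy as np
from scipy import stats

counts = np.array([12, 30, 58])        # hypothetical observed counts x_i
n = counts.sum()                       # total sample size N
p_hat = counts / n                     # point estimates x_i / N

alpha = 0.05
z = stats.norm.ppf(1 - alpha / 2)      # ~1.96 for a two-tailed 95% interval

se = np.sqrt(p_hat * (1 - p_hat) / n)  # normal-approximation standard error
lower, upper = p_hat - z * se, p_hat + z * se

bound = 1 / np.sqrt(n)                 # p_hat +/- 1/sqrt(n) always covers the Wald CI

for i, (p, lo, hi) in enumerate(zip(p_hat, lower, upper), start=1):
    print(f"pi_{i}: {p:.3f}, 95% CI ({lo:.3f}, {hi:.3f}), bound +/- {bound:.3f}")
```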

However, the meaning of a confidence interval is a rather subtle concept, one that's frequently misunderstood.

Another possibility is to take a Bayesian approach and calculate a posterior distribution for the $\pi_i$ (from which a different kind of - and possibly more intuitive - interval might be produced, though they may often have very similar limits to a corresponding confidence interval).


Some discussion of a Bayesian interval

The usual Bayesian interval for this sort of thing would be a credible interval (or a credible region, for vectors of parameters).

I don't actually have a basic reference for you that I like.

I'll do a single parameter, no nuisance parameter case. The mathematics is simple but it may seem a little abstract, so I'll also do an example.

We have a probabilistic model for data ($\mathbf{x}$) with an unknown parameter $\theta$; in a simple case we might say $p(\mathbf{x}|\theta)$ is our model for the data (which might be a density or probability function for the data given the parameter).

The underlying idea is simply an application of Bayes' rule:

$p(\theta|\mathbf{x})\propto p(\mathbf{x}|\theta) \cdot p(\theta)$

The second term on the right is called the prior on $\theta$; you take some prior for the parameter, multiply by the likelihood and normalize so it integrates to 1 (which in simple cases is just a matter of recognizing the density).

In situations where there are nuisance parameters, those are integrated out: $p(\theta|\mathbf{x})=\int p(\theta,\boldsymbol{\phi}|\mathbf{x})\,d\boldsymbol{\phi}$

In "nice" cases this can all be done algebraically. In more complicated cases, we have to use other tools (MCMC is a common way to deal with fairly complicated models, for example, but numerical integration is sometimes used, or asymptotic approximations or any number of other tools).

Once you have your posterior, you can take an interval in it that contains $(1-\alpha)$ of the probability (has a total of $\alpha$ outside the interval). This might be done in any number of ways - e.g. equal tail probability (often used with more or less symmetric cases); the HPD interval (gives shortest intervals when the posterior is unimodal); or it might be done with reference to some loss function.
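
To make the equal-tail/HPD distinction concrete, here is a small Python sketch (using an arbitrary, assumed Beta posterior purely for illustration) that computes both kinds of interval; the HPD interval is found by a simple grid search over intervals that each hold $1-\alpha$ probability:

```python
import numpy as np
from scipy.stats import beta

post = beta(31, 71)   # an assumed unimodal posterior, purely illustrative
cred = 0.95

# Equal-tail interval: alpha/2 probability in each tail
equal_tail = post.ppf([0.025, 0.975])

# HPD interval: of all intervals containing 95% probability, take the shortest
lower_tail = np.linspace(0.0, 1.0 - cred, 10_000)
lo = post.ppf(lower_tail)
hi = post.ppf(lower_tail + cred)
shortest = np.argmin(hi - lo)
hpd = (lo[shortest], hi[shortest])

print("equal-tail:", np.round(equal_tail, 4), "HPD:", np.round(hpd, 4))
```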

A common Bayesian approach to a multinomial problem is to assume a Dirichlet prior on the vector of proportions, which with the multinomial likelihood results in a Dirichlet posterior. (It's a common choice because among other things it has a neat property - it's what's called a conjugate prior, in that the posterior is of the same form.)
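
A rough sketch of that conjugate update in Python (the prior weights and counts below are assumptions chosen only for illustration), drawing from the resulting Dirichlet posterior to get a credible interval for each component:

```python
import numpy as np
from scipy.stats import dirichlet

counts = np.array([12, 30, 58])            # assumed multinomial counts x_i
prior_alpha = np.ones(len(counts))         # flat Dirichlet(1, ..., 1) prior (an assumption)

post_alpha = prior_alpha + counts          # conjugacy: posterior is Dirichlet(alpha + x)

post_mean = post_alpha / post_alpha.sum()  # posterior mean of each pi_i
draws = dirichlet.rvs(post_alpha, size=100_000, random_state=0)
ci = np.percentile(draws, [2.5, 97.5], axis=0)  # 95% equal-tail interval per component

print("posterior means:", np.round(post_mean, 3))
print("95% credible intervals:\n", np.round(ci.T, 3))
```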

If the concern is with the proportion in a single category, that corresponds to a $\text{Beta}(\alpha,\beta)$ prior for $\pi_i$ together with the binomial count ($x_i$) in the category of interest.

This is easy enough to develop by hand:

$p(\pi_i|x_i)\propto p(x_i|\pi_i)\cdot p(\pi_i)$

$\qquad\quad\:\:\,\propto {\pi_i}^{x_i} (1-\pi_i)^{n-x_i} \cdot\, \pi_i^{\alpha-1}(1-\pi_i)^{\beta-1}$ (dropping constants of proportionality)

$\qquad\quad\:\:\,\propto \pi_i^{x_i+\alpha-1} (1-\pi_i)^{n-x_i+\beta-1}$ (which we recognize as a $\text{Beta}(x_i+\alpha,n-x_i+\beta)$ posterior)

A probability interval for $\pi_i$ is then obtained by choosing tail cutoffs that put the desired probability inside the interval.
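
For instance (a hedged sketch with made-up counts, assuming a uniform $\text{Beta}(1,1)$ prior), the equal-tail interval just uses the Beta posterior's quantile function:

```python
from scipy.stats import beta

x_i, n = 30, 100        # assumed count in the category of interest and total N
a0, b0 = 1.0, 1.0       # assumed uniform Beta(1, 1) prior

a_post, b_post = x_i + a0, n - x_i + b0   # Beta(x_i + alpha, n - x_i + beta) posterior

lo, hi = beta.ppf([0.025, 0.975], a_post, b_post)   # 95% equal-tail cutoffs
print(f"posterior mean {a_post / (a_post + b_post):.3f}, "
      f"95% credible interval ({lo:.3f}, {hi:.3f})")
```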

Some more or less relevant links

http://en.wikipedia.org/wiki/Confidence_interval#Credible_interval

What's the difference between a confidence interval and a credible interval?

http://freakonometrics.hypotheses.org/18117

http://www.bayesian-inference.com/credible
