Solved – Expected value of a function of a multinomially distributed random variable

expected-value, function, multinomial-distribution

I have a scalar function $g(x)$, where $x$ is an $n$-vector following a multinomial distribution with probability mass function $f(x; p, N)$, for some probability vector $p$ with $\sum p_i = 1$, and where $\sum x_i = N$.

Now, I'm interested in computing $E[g(x)]$, hence I write
$$
E[g(x)] = \sum_{x_1=0}^{N} \sum_{x_2=0}^{N-x_1} \cdots \sum_{x_{n-1}=0}^{N-\sum_{i=1}^{n-2} x_i} g(x)\,f(x; p, N), \qquad x_n = N - \sum_{i=1}^{n-1} x_i.
$$

This expression is rather unwieldy, and it appears non-trivial to analyse the expected value, or higher moments, of the function through it.
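For concreteness, here is a brute-force sketch of this sum in Python (the function name and the choice of $g$ are illustrative, assuming `scipy` is available); since the support has $\binom{N+n-1}{n-1}$ points, this is only feasible for small $n$ and $N$:

```python
import itertools
from scipy.stats import multinomial

def expectation_direct(g, N, p):
    """Brute-force E[g(X)] by enumerating the full multinomial support."""
    n = len(p)
    return sum(
        g(x) * multinomial.pmf(x, N, p)
        for x in itertools.product(range(N + 1), repeat=n)
        if sum(x) == N  # keep only count vectors summing to N
    )

# illustrative example: expected maximum count
print(expectation_direct(max, 10, [0.2, 0.3, 0.5]))
```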

So, my question is: does anyone have experience with this kind of computation? Is it better to use Monte Carlo than to work analytically with the expression?

Perhaps you know of some papers or books that discuss this kind of problem?

Best Answer

For the special case where $n=2$ you have the binomial distribution, and the expectation of an arbitrary function of a binomial random variable, viewed as a function of the probability parameter, is a Bernstein polynomial (the polynomial family underlying Bézier curves). It is usually computed with a recursive method called De Casteljau's algorithm (see this closely related answer). The recursion avoids the computational underflow problems that would occur if you attempted direct computation.
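As a minimal sketch of this binomial case (assuming $g$ is supplied through its values $g(0), \dots, g(N)$; `binomial_expectation` is an illustrative name, not from the linked answer):

```python
import numpy as np

def binomial_expectation(gvals, p):
    """E[g(X)] for X ~ Bin(N, p) via a De Casteljau-style recursion.
    gvals = [g(0), g(1), ..., g(N)]. Every intermediate value is a convex
    combination of the inputs, so no binomial coefficients or tiny
    probabilities are ever formed -- this is what avoids underflow."""
    b = np.asarray(gvals, dtype=float)
    while b.size > 1:
        b = (1.0 - p) * b[:-1] + p * b[1:]  # one reduction step
    return b[0]
```

After $N$ reduction steps the single remaining value equals $\sum_k \binom{N}{k} p^k (1-p)^{N-k} g(k)$, e.g. `binomial_expectation([g(k) for k in range(N + 1)], p)`.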

For the more general multinomial case it is possible to extend this algorithm to arbitrary dimensions, though it becomes quite cumbersome. To obtain a recursive characterisation of the expectation, we take advantage of the well-known recursive equation for the multinomial mass function:

$$\text{Mu}(\mathbf{x}|N, \mathbf{p}) = \sum_{i=1}^n p_i \cdot \text{Mu}(\mathbf{x} - \mathbf{e}_i|N-1, \mathbf{p}).$$
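A quick numerical spot-check of this recursion (a sketch assuming `scipy`; terms where $\mathbf{x} - \mathbf{e}_i$ has a negative entry are taken to be zero):

```python
import numpy as np
from scipy.stats import multinomial

N, p = 6, [0.2, 0.3, 0.5]
x = np.array([1, 2, 3])  # any count vector summing to N

lhs = multinomial.pmf(x, N, p)
rhs = sum(
    p[i] * multinomial.pmf(x - e, N - 1, p)
    for i, e in enumerate(np.eye(3, dtype=int))
    if x[i] >= 1  # terms with a negative count vanish
)
assert np.isclose(lhs, rhs)
```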

Let $\mathscr{X}_N \equiv \{ \mathbf{x} \in \mathbb{N}_0^n \mid \sum x_i = N \}$ denote the set of all possible count vectors with sum $N$, and let $\mathbf{e}_i$ denote the $i$th basis vector; terms where $\mathbf{x} - \mathbf{e}_i$ has a negative element are taken to be zero. Then, using the above recursive equation, for any function $g: \mathscr{X}_N \rightarrow \mathbb{R}$ you have:

$$\begin{equation} \begin{aligned} \mathbb{E}(g(\mathbf{X}) | \mathbf{X} \sim \text{Mu}(N, \mathbf{p}) ) &= \sum_{\mathbf{x} \in \mathscr{X}_N} g(\mathbf{x}) \cdot \text{Mu}(\mathbf{x}|N, \mathbf{p}) \\[6pt] &= \sum_{\mathbf{x} \in \mathscr{X}_N} g(\mathbf{x}) \cdot \sum_{i=1}^n p_i \cdot \text{Mu}(\mathbf{x} - \mathbf{e}_i|N-1, \mathbf{p}) \\[6pt] &= \sum_{i=1}^n p_i \cdot \sum_{\mathbf{x} \in \mathscr{X}_N} g(\mathbf{x}) \cdot \text{Mu}(\mathbf{x} - \mathbf{e}_i|N-1, \mathbf{p}) \\[6pt] &= \sum_{i=1}^n p_i \cdot \sum_{\mathbf{x} \in \mathscr{X}_{N-1}} g(\mathbf{x}+\mathbf{e}_i) \cdot \text{Mu}(\mathbf{x}|N-1, \mathbf{p}) \\[6pt] &= \sum_{i=1}^n p_i \cdot \sum_{\mathbf{x} \in \mathscr{X}_{N-1}} \triangleleft_i g(\mathbf{x}) \cdot \text{Mu}(\mathbf{x}|N-1, \mathbf{p}) \\[6pt] &= \sum_{i=1}^n p_i \cdot \mathbb{E}(\triangleleft_i g(\mathbf{X}) | \mathbf{X} \sim \text{Mu}(N-1, \mathbf{p}) ) \\[6pt] \end{aligned} \end{equation}$$

where $\triangleleft_i g : \mathscr{X}_{N-1} \rightarrow \mathbb{R}$ is a new function defined by $\triangleleft_i g(\mathbf{x}) = g(\mathbf{x}+\mathbf{e}_i)$. This gives you a recursive equation for the expectation, where each step reduces the size $N$ by one but splits the expectation into $n$ terms. In De Casteljau's algorithm (for the binomial case) you arrange your values in a matrix and compute them recursively from the initial values of the function $g$ (see the linked answer above for specifics). In the more general multinomial case you can store your values in an $n$-dimensional array and combine them recursively with the above equation, as in the sketch below.
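A sketch of this multinomial extension in Python (the helper names are illustrative; each level's values are kept in a dictionary keyed by count vectors rather than a literal $n$-dimensional array):

```python
def compositions(m, n):
    """Yield all length-n tuples of nonnegative integers summing to m."""
    if n == 1:
        yield (m,)
        return
    for first in range(m + 1):
        for rest in compositions(m - first, n - 1):
            yield (first,) + rest

def multinomial_expectation(g, N, p):
    """E[g(X)] for X ~ Mu(N, p) via E_N[g] = sum_i p_i * E_{N-1}[g(. + e_i)]."""
    n = len(p)
    # level N: tabulate g on the full support X_N
    values = {x: g(x) for x in compositions(N, n)}
    # each pass reduces the total count by one; after N passes only the
    # zero vector remains, and its value is the expectation
    for m in range(N, 0, -1):
        values = {
            x: sum(p[i] * values[x[:i] + (x[i] + 1,) + x[i + 1:]]
                   for i in range(n))
            for x in compositions(m - 1, n)
        }
    return values[(0,) * n]
```

For small cases this agrees with the brute-force sum in the question, while only ever forming convex combinations of the values of $g$.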

This gives you an algorithm that computes the expectation exactly, without underflow problems. If the expectations involved are very small, you can work on a log scale for additional protection. The algorithm is of order $\mathcal{O}(N^n)$, so if either $N$ or $n$ is large it may become computationally infeasible; in that case you would fall back on Monte Carlo simulation, sketched below.
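A minimal Monte Carlo fallback, assuming `numpy` (the standard error shrinks as $1/\sqrt{\text{draws}}$; the function name is illustrative):

```python
import numpy as np

def expectation_mc(g, N, p, draws=100_000, seed=0):
    """Monte Carlo estimate of E[g(X)] together with its standard error."""
    rng = np.random.default_rng(seed)
    samples = rng.multinomial(N, p, size=draws)      # shape (draws, n)
    vals = np.array([g(tuple(x)) for x in samples])
    return vals.mean(), vals.std(ddof=1) / np.sqrt(draws)
```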
