PDF of Gaussian in a hyperplane

linear-algebra, probability, probability-distributions

I'm working in a space, say $\mathbb{R}^n$, where each of the $n$ dimensions represents the probability of an outcome from a multinomial distribution. In other words, feasible outcomes in $\mathbb{R}^n$ lie on the hyperplane defined by $\mathbf{1}^T x = 1$, i.e. $\sum_{i=1}^n x_i = 1$.

At each iteration I have a point $x$ in this space that describes one feasible outcome (so $x$ lies on this hyperplane). I want to sample from a Gaussian defined in the hyperplane, with mean $\mu = x$ and some covariance matrix $\Sigma$, so that all of the samples lie on the hyperplane and are therefore feasible solutions themselves.

Do I need to sample from an $n$-dimensional Gaussian with $\mu = x$ and then project the sample onto the hyperplane? If so, how would I define that projection matrix?

Or is there some way I can define a Gaussian in the hyperplane itself to begin with and sample from it directly, without having to calculate a projection?

Best Answer

I would not say "each of the samples", but rather, "each of the observations in the sample".

Certainly you can define a Gaussian distribution by specifying a suitable expected value (for example $(1,\ldots,1)/n$) satisfying the constraints and a suitable covariance matrix. The rank of the $n\times n$ covariance matrix would be $n-1,$ so the matrix would be singular.
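For what it's worth, NumPy's generator tolerates a singular (positive semidefinite) covariance, so the degenerate Gaussian can be sampled directly. Here is a minimal sketch; the scale $\sigma$ and the projector-style covariance $\sigma^2(I - \mathbf 1\mathbf 1^\top/n)$ are my own illustrative choices, not part of the answer:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 4
mu = np.full(n, 1.0 / n)               # mean on the hyperplane: 1^T mu = 1

# Singular covariance: sigma^2 times the projector onto the subspace
# orthogonal to 1. It has rank n-1, so all mass stays in the hyperplane.
sigma = 0.05                            # illustrative scale (an assumption)
P = np.eye(n) - np.ones((n, n)) / n
Sigma = sigma**2 * P

# NumPy's default 'svd' method accepts singular PSD covariances.
X = rng.multivariate_normal(mu, Sigma, size=1000)
print(np.allclose(X.sum(axis=1), 1.0))  # every sample satisfies 1^T x = 1
```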

However, it is probably computationally more efficient to draw $(n-1)$ i.i.d. observations from a standard Gaussian distribution and then apply a suitable affine transformation to put the resulting vector in the hyperplane that you want. The affine transformation determines the expected value and variance of the transformed random vector as follows: \begin{align} \operatorname E(\mathbf a + M\mathbf X) & = \mathbf a + M\operatorname E(\mathbf X), \\[8pt] \operatorname{var}(\mathbf a + M\mathbf X) & = M\big(\operatorname{var}(\mathbf X)\big) M^\top, \end{align} where by $\operatorname{var}$ I mean what you called the "covariance matrix": \begin{align} \operatorname{var}(\mathbf X) & = \operatorname E\big( (\mathbf X-\operatorname E\mathbf X) (\mathbf X - \operatorname E\mathbf X)^\top\big) \\[6pt] & = \text{an $n\times n$ symmetric nonnegative-definite matrix.} \end{align} So you would need to figure out what the matrix $M$ and the vector $\mathbf a$ should be: the columns of $M$ should span the directions within the hyperplane (so $\mathbf 1^\top M = 0$), and $\mathbf a$ should lie in the desired hyperplane.
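Here is a minimal sketch of that recipe in NumPy; the basis construction via an SVD and the scale $\sigma$ are illustrative choices, not prescribed by the answer above:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 4
a = np.full(n, 1.0 / n)            # any vector with 1^T a = 1 works as the mean

# Build M with columns spanning the hyperplane's directions (1^T M = 0).
# The full SVD of the all-ones column gives an orthogonal U whose first
# column is 1/sqrt(n); the remaining n-1 columns are orthogonal to 1.
U, _, _ = np.linalg.svd(np.ones((n, 1)), full_matrices=True)
M = U[:, 1:]                        # shape (n, n-1)

sigma = 0.05                        # illustrative spread within the hyperplane
Z = rng.standard_normal((1000, n - 1))  # (n-1)-dimensional standard Gaussian
X = a + sigma * Z @ M.T             # affine map a + M z, applied row by row

print(np.allclose(X.sum(axis=1), 1.0))  # every sample satisfies 1^T x = 1
```

The resulting covariance is $\sigma^2 M M^\top$; replacing $\sigma I$ with any other $(n-1)\times(n-1)$ factor yields an arbitrary covariance supported on the hyperplane.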

However, I wonder if you want this to remain in the intersection of that hyperplane with the positive orthant, i.e. the region in which all of the coordinates are non-negative. In that case you might work with something like $$ \frac{\big(e^{a_1 z_1}, \ldots, e^{a_n z_n}\big)}{ e^{a_1 z_1} + \cdots +e^{a_n z_n} } $$ where $(z_1,\ldots,z_n)$ is some suitably distributed Gaussian random vector.
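A quick sketch of that map, with the scalings $a_i$ set to $1$ purely for illustration (the max-subtraction is a standard numerical-stability trick that cancels under the normalization):

```python
import numpy as np

rng = np.random.default_rng(1)

n = 4
a = np.ones(n)                      # illustrative scalings a_1, ..., a_n
z = rng.standard_normal(n)          # a suitably distributed Gaussian vector

t = a * z
w = np.exp(t - t.max())             # subtracting the max avoids overflow
x = w / w.sum()                     # strictly positive, sums to 1
```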

Or you might consider sampling from a Dirichlet distribution with density \begin{align} & (x_1,\ldots,x_n) \mapsto \text{constant} \times x_1^{\alpha_1-1} \cdots x_n^{\alpha_n-1} \\[10pt] & \text{for } x_1+\cdots + x_n=1,\quad x_1\ge0,\ldots,x_n\ge0. \end{align} The "constant" would be $\dfrac{\Gamma(\alpha_1+\cdots+\alpha_n)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_n)}.$ I haven't thought about how to do that simulation.
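For what it's worth, standard libraries can do that simulation directly; a minimal sketch with NumPy, using arbitrary illustrative values for $\alpha$:

```python
import numpy as np

rng = np.random.default_rng(2)

alpha = np.array([2.0, 3.0, 1.0, 4.0])   # illustrative alpha_1, ..., alpha_n
X = rng.dirichlet(alpha, size=1000)      # each row is >= 0 and sums to 1

# Equivalent construction: draw y_i ~ Gamma(alpha_i, 1) independently
# and normalize; y / sum(y) is Dirichlet(alpha).
Y = rng.gamma(alpha, size=(1000, len(alpha)))
X2 = Y / Y.sum(axis=1, keepdims=True)
```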