Solved – Expectation of the softmax transform for Gaussian multivariate variables

Tags: approximation, expected value, logistic, softmax

Prelims

In the article "Sequential updating of conditional probabilities on directed graphical structures", Spiegelhalter and Lauritzen give an approximation to the expectation of a logistic-transformed Gaussian random variable $\theta \sim N(\mu, \sigma^2)$. It uses the Gaussian CDF $\Phi$ in the approximation

$$ \exp(\theta)/(1 + \exp(\theta)) \approx \Phi(\theta \epsilon) $$

for an appropriately chosen $\epsilon$ (in their case they chose $\epsilon = 0.607$). Hence

$$ \mathbb{E} \left [ \exp(\theta)/(1 + \exp(\theta))\right ] \approx \int_{- \infty}^{\infty} \Phi(\theta \epsilon) \phi(\theta | \mu, \sigma^2) d \theta$$

where $\phi$ is a Gaussian pdf function. The integral can be written as

$$ \int_{-\infty}^{\infty} \Pr(U < 0 \mid \theta)\, \phi(\theta \mid \mu, \sigma^2)\, d\theta $$

where $U \mid \theta \sim N(-\theta, \epsilon^{-2})$, so that $\Pr(U < 0 \mid \theta) = \Phi(\theta \epsilon)$, and the integral is then simply the marginal probability $\Pr(U < 0)$. Note that as $\theta \sim N(\mu, \sigma^2)$, marginally $U \sim N(-\mu, \sigma^2 + \epsilon^{-2})$. Hence

$$ \mathbb{E} \left [ \exp(\theta)/(1 + \exp(\theta))\right ] \approx \Pr(U < 0) = \Phi(\frac{\mu}{\sqrt{\sigma^2 + \epsilon^{-2}}})$$

We can then use the initial approximation in the reverse direction to get

$$ \mathbb{E} \left [ \exp(\theta)/(1 + \exp(\theta))\right ] \approx \exp(c \mu)/(1 + \exp(c \mu)) $$

where $c = (1 + \epsilon^2 \sigma^2)^{-1/2}$.
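
A minimal numerical sketch of this scalar approximation (assuming NumPy and SciPy are available; the values of $\mu$ and $\sigma^2$ below are arbitrary examples), comparing the probit form, the logistic closed form, and a Monte Carlo reference:

```python
import numpy as np
from scipy.stats import norm
from scipy.special import expit  # logistic sigmoid

eps = 0.607            # constant used by Spiegelhalter & Lauritzen
mu, sigma2 = 1.2, 2.0  # arbitrary example values

# Monte Carlo reference for E[sigma(theta)], theta ~ N(mu, sigma2)
rng = np.random.default_rng(0)
theta = rng.normal(mu, np.sqrt(sigma2), size=1_000_000)
mc = expit(theta).mean()

# Probit form: Phi(mu / sqrt(sigma2 + eps^-2))
probit = norm.cdf(mu / np.sqrt(sigma2 + eps**-2))

# Logistic form: sigma(c * mu) with c = (1 + eps^2 * sigma2)^(-1/2)
c = (1 + eps**2 * sigma2) ** -0.5
logistic = expit(c * mu)

print(mc, probit, logistic)  # all three should be close
```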

Question

My question is: are there any approximations to the expectation of a softmax transformation of multivariate Gaussian variables? In particular, let

$$ \boldsymbol{Z} \sim MVN(\boldsymbol{\mu}, \Sigma) \in \mathbb{R}^{n} $$

Define the $k$ activations, one for each discrete outcome, as

$$ f_i(\boldsymbol{Z}, \boldsymbol{w}_i) = \boldsymbol{w}_i^T \boldsymbol{Z} $$

Finally define our softmax transformed activations as
$$ P_i(\boldsymbol{Z}) = \frac{\exp(f_i(\boldsymbol{Z}, \boldsymbol{w}_i))}{\sum_{j=1}^k \exp(f_j(\boldsymbol{Z}, \boldsymbol{w}_j))} $$

What I want is an approximation to the expectation
$$ \mathbb{E}[P_i(\boldsymbol{Z})] $$
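
For reference, the quantity to be approximated can always be estimated by Monte Carlo; a minimal sketch (assuming NumPy and SciPy; the dimensions, $\boldsymbol{\mu}$, $\Sigma$ and weights below are arbitrary placeholders):

```python
import numpy as np
from scipy.special import softmax

rng = np.random.default_rng(0)

# Arbitrary example problem: n-dimensional Gaussian, k outcomes
n, k = 3, 4
mu = rng.normal(size=n)
A = rng.normal(size=(n, n))
Sigma = A @ A.T                    # a valid covariance matrix
W = rng.normal(size=(k, n))        # rows are the weight vectors w_i

# Monte Carlo estimate of E[P_i(Z)] as a reference
Z = rng.multivariate_normal(mu, Sigma, size=500_000)  # shape (samples, n)
F = Z @ W.T                                           # activations f_i = w_i^T Z
P_mc = softmax(F, axis=1).mean(axis=0)
print(P_mc)
```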

Note that in the case $k=2$, we have

$$ P_1(\boldsymbol{Z}) = \frac{\exp(f_1(\boldsymbol{Z}, \boldsymbol{w}_1))}{ \exp(f_1(\boldsymbol{Z}, \boldsymbol{w}_1)) + \exp(f_2(\boldsymbol{Z}, \boldsymbol{w}_2))} $$

Therefore

$$ P_1(\boldsymbol{Z}) = \frac{\exp(f_1(\boldsymbol{Z}, \boldsymbol{w}_1) - f_2(\boldsymbol{Z}, \boldsymbol{w}_2))}{ \exp(f_1(\boldsymbol{Z}, \boldsymbol{w}_1) - f_2(\boldsymbol{Z}, \boldsymbol{w}_2)) + 1} $$

and as $f_1(\boldsymbol{Z}, \boldsymbol{w}_1) - f_2(\boldsymbol{Z}, \boldsymbol{w}_2) = (\boldsymbol{w}_1 - \boldsymbol{w}_2)^T \boldsymbol{Z}$ is a linear combination of jointly Gaussian random variables, it is itself Gaussian distributed. Hence we can use the initial approximation.
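
A sketch of this $k=2$ reduction (assuming NumPy and SciPy; `expected_sigmoid`, `expected_P1_binary` and the example numbers are illustrative, not from any library):

```python
import numpy as np
from scipy.special import expit

def expected_sigmoid(mu, var, eps=0.607):
    """Scalar approximation E[sigma(theta)] ~ sigma(c * mu) for theta ~ N(mu, var)."""
    c = (1.0 + eps**2 * var) ** -0.5
    return expit(c * mu)

def expected_P1_binary(mu, Sigma, w1, w2):
    """k = 2 case: E[P_1(Z)] via the Gaussian difference d^T Z, d = w1 - w2."""
    d = w1 - w2
    mu_d = d @ mu              # mean of f_1 - f_2
    var_d = d @ Sigma @ d      # variance of f_1 - f_2
    return expected_sigmoid(mu_d, var_d)

# Example with arbitrary numbers: two outcomes in R^3
mu = np.array([0.5, -0.2, 1.0])
Sigma = np.array([[1.0, 0.3, 0.0],
                  [0.3, 2.0, 0.5],
                  [0.0, 0.5, 1.5]])
w1 = np.array([1.0, 0.0, 0.5])
w2 = np.array([0.2, -1.0, 0.0])
print(expected_P1_binary(mu, Sigma, w1, w2))
```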

Can we generalise for $k > 2$?

Best Answer

I am sorry to resurrect a fairly old question, but I was facing a very similar problem recently and stumbled upon a paper that might offer some help. The article is "Semi-analytical approximations to statistical moments of sigmoid and softmax mappings of normal variables", available at https://arxiv.org/pdf/1703.00091.pdf

Approximation to the expectation of the softmax

To compute the expectation of a softmax mapping $\pi \left( \mathbf{\mathsf{x}} \right)$ of multivariate normal variables $\mathbf{\mathsf{x}} \sim \mathcal{N}_D \left( \boldsymbol{\mu}, \boldsymbol{\Sigma} \right)$, the author provides the following approximation:

$$ \mathbb{E} \left[ \pi^k (\mathbf{\mathsf{x}}) \right] \simeq \frac{1}{2 - D + \sum_{k' \neq k} \frac{1}{\mathbb{E} \left[ \sigma \left( x^k - x^{k'} \right) \right]}} $$

where $x^k$ denotes the $k$-th component of the $D$-dimensional vector $\mathbf{\mathsf{x}}$ and $\sigma \left( x \right)$ denotes the one-dimensional sigmoid (logistic) function. To evaluate this formula one needs the expectation $\mathbb{E} \left[ \sigma (x) \right]$ of a sigmoid of a Gaussian, for which you could use your own approximation above (a very similar approximation is again provided in the aforementioned article).
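
A sketch of how this could be wired up for your setting (assuming NumPy and SciPy; `expected_sigmoid` and `expected_softmax` are illustrative helper names, and the scalar step uses your logistic approximation with $\epsilon = 0.607$ rather than the paper's variant). The activations $\boldsymbol{f} = W\boldsymbol{Z}$ are themselves multivariate normal with mean $W\boldsymbol{\mu}$ and covariance $W \Sigma W^T$, so the formula can be applied to them directly:

```python
import numpy as np
from scipy.special import expit

def expected_sigmoid(mu, var, eps=0.607):
    """Approximation E[sigma(theta)] ~ sigma(mu / sqrt(1 + eps^2 var)), theta ~ N(mu, var)."""
    return expit(mu / np.sqrt(1.0 + eps**2 * var))

def expected_softmax(mu_f, Sigma_f, eps=0.607):
    """Approximate E[softmax(f)_i] for f ~ N(mu_f, Sigma_f) via the sigmoid
    re-writing: E[pi^k] ~ 1 / (2 - D + sum_{k'!=k} 1 / E[sigma(f_k - f_k')])."""
    D = len(mu_f)
    out = np.empty(D)
    for i in range(D):
        s = 0.0
        for j in range(D):
            if j == i:
                continue
            mu_d = mu_f[i] - mu_f[j]                                  # mean of f_i - f_j
            var_d = Sigma_f[i, i] + Sigma_f[j, j] - 2.0 * Sigma_f[i, j]  # variance of f_i - f_j
            s += 1.0 / expected_sigmoid(mu_d, var_d, eps)
        out[i] = 1.0 / (2.0 - D + s)
    return out

# For the question's setup with activations f = W Z, Z ~ N(mu, Sigma):
#   mu_f    = W @ mu
#   Sigma_f = W @ Sigma @ W.T
# and expected_softmax(mu_f, Sigma_f) approximates E[P_i(Z)] for each i.
```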

This formula is based on a re-writing of the softmax in terms of sigmoids. It starts from the $D=2$ case you mentioned, where the expression is "exact" (as exact as the underlying sigmoid approximation allows), and postulates its validity for $D>2$. The author validates the proposal numerically.
