Neural Networks – Expected Value Notation in GAN Loss

expected valueganneural networksnotation

I am reading Goodfellow's original paper on GANs.

What I struggle to understand is his notation of the subscript in expected values.

$$
\mathbb{E}_{\boldsymbol{x} \sim p_{data}(\boldsymbol{x})}\ldots
$$

If I understand it correctly then $\boldsymbol{x}$ is a realization of some random variable $\mathbf{x}$, but how can there be an expectation wrt. $\boldsymbol{x}$?

Or is $\boldsymbol{x}$ a random variable?

Thank you.

Edit: I do not think this is duplicate, as the referenced question does not answer what $\boldsymbol{x}$ means.

Best Answer

$E_{x\sim p(x)}[f(X)]$ means the expected value of $f(X)$ if its assumed to be distributed wrt $p(x)$, e.g. for a continuous distribution we have: $$E_{x\sim p(x)}[f(X)]=\int f(x)p(x)dx$$

It's used when the distribution of $x$ subject to change in an optimization problem. Specifically, in the paper, authors have two distributions (in page 5) $p_g$ and $p_{data}$.

Edit: And, the $x$ in the subscript of the expected value notation is not a realization. It's the random variable; or more specifically, in the paper it is the random vector, $\mathbf{x}$ (It's also in bold in Page 5).

Example

See for instance in the example below where we have a beta distributed variable

$$f_X(x) \propto x^2 \quad \qquad \qquad \text{for $\quad 0 \leq x \leq 1$}$$

And we wish to estimate the expectation value for $log(X)$.

Because this is a simple example we know that $E_X[log(X)] = -1/3$ by calculating it analytically. But here we are gonna use self-normalized importance sampling and sampling with another beta distribution $f_Y(y) \propto (1-y)^2$ to illustrate the difference.

In one case we compute it with an exact normalization factor. We can do this because we know $log(Z)$, as for a beta distribution it is not so difficult.

$$E_X[log(X)] \approx \frac{\sum_{\forall y_i} log(y_i) \frac{y_i^2}{(1-y_i)^2}}{1}$$
In the other case we compute it with self-normalization

$$E_X[log(X)] \approx \frac{\sum_{\forall y_i} log(y_i) \frac{y_i^2}{(1-y_i)^2}}{\sum_{\forall y_i} \frac{y_i^2}{(1-y_i)^2}}$$

So the difference is whether this factor in the denominator is a constant based on the partition function $\log(Z)$ (or actually ratio of partition functions for X and Y), or a random variable $\sum_{\forall y_i} {y_i^2}/{(1-y_i)^2}$.

Intuitively you may guess that this latter will increase bias and variance of the estimate.

The image below gives the histograms for estimates with samples of size 100.

ns <- 100
nt <- 10^3

mt <- rep(0,nt)
zt <- rep(0,nt)

for (i in 1:nt) {
  y <- rbeta(ns,1,3)
  t <- log(y)*y^2/(1-y)^2
  z <- y^2/(1-y)^2
  mt[i] <- mean(t)
  zt[i] <- mean(z)
}

h1 <- hist(mt, breaks = seq(-1,0,0.01), main = "using known parition function")
h2 <- hist(mt/zt , breaks = seq(-1,0,0.01), main = "using self-normalization")

Best Answer

Related Solutions

Solved – Intuition for why the (log) partition function matters

Example

Related Question