Solved – Intuition behind gradient of expected value and logarithm of probabilities

I recently came across the following curious identity:

$$\nabla_\theta \mathbb{E}_{x \sim D_\theta}[f(x)]
= \mathbb{E}_{x \sim D_\theta} [ \nabla_\theta \log(D_\theta(x)) f(x)],$$

where $D_\theta$ represents a probability distribution parametrized by $\theta$, $D_\theta(x)$ represents the probability that this distribution assigns to outcome $x$, $\nabla_\theta$ represents the gradient with respect to $\theta$, and $f$ represents some arbitrary function.

I can prove algebraically why this identity holds. However, I lack intuition. Is there any intuition for why this should hold? Perhaps something to understand why it is natural for the logarithm of the probability to appear inside the expected value, or an interpretation of these quantities that makes it natural why the identity would hold?

Here's the algebraic derivation:

$$\nabla_\theta \log(D_\theta(x)) f(x) = {\nabla_\theta D_\theta(x) \over D_\theta(x)} f(x),$$

$$\begin{align*}
\mathbb{E}_{x \sim D_\theta} [ \nabla_\theta \log(D_\theta(x)) f(x)]
&= \sum_x D_\theta(x) \nabla_\theta \log(D_\theta(x)) f(x)\\
&= \sum_x D_\theta(x) {\nabla_\theta D_\theta(x) \over D_\theta(x)} f(x)\\
&= \sum_x \nabla_\theta D_\theta(x) f(x)\\
&= \nabla_\theta \sum_x D_\theta(x) f(x)\\
&= \nabla_\theta \mathbb{E}_{x \sim D_\theta}[f(x)].
\end{align*}$$
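The derivation above can also be checked numerically. The following is a minimal sketch, not part of the original question: it assumes $D_\theta$ is a categorical distribution with softmax probabilities over four outcomes, and the names `softmax`, `expected_f`, `grad_finite_diff`, `grad_score_exact` and `grad_score_sampled` are illustrative. For this choice, $\nabla_\theta \log D_\theta(x) = e_x - D_\theta$, where $e_x$ is the one-hot vector for outcome $x$.

```python
import numpy as np

def softmax(theta):
    # Numerically stable softmax; this is the assumed form of D_theta.
    z = np.exp(theta - theta.max())
    return z / z.sum()

def expected_f(theta, f):
    # E_{x ~ D_theta}[f(x)] as an explicit sum over outcomes.
    return softmax(theta) @ f

def grad_finite_diff(theta, f, eps=1e-6):
    # Left-hand side: gradient of the expectation, by central differences.
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (expected_f(theta + step, f) - expected_f(theta - step, f)) / (2 * eps)
    return grad

def grad_score_exact(theta, f):
    # Right-hand side: sum_x D_theta(x) * grad log D_theta(x) * f(x).
    p = softmax(theta)
    score = np.eye(theta.size) - p  # row x holds e_x - p
    return (p * f) @ score

def grad_score_sampled(theta, f, n_samples, rng):
    # Same right-hand side, estimated from samples x ~ D_theta.
    p = softmax(theta)
    xs = rng.choice(theta.size, size=n_samples, p=p)
    score = np.eye(theta.size)[xs] - p
    return (score * f[xs, None]).mean(axis=0)

theta = np.array([0.5, -1.0, 2.0, 0.0])  # arbitrary parameters
f = np.array([1.0, 4.0, -2.0, 3.0])      # arbitrary function values f(x)
rng = np.random.default_rng(0)

lhs = grad_finite_diff(theta, f)
rhs = grad_score_exact(theta, f)
mc = grad_score_sampled(theta, f, 200_000, rng)

print("finite differences:", lhs)
print("score form (exact):", rhs)
print("score form (sampled):", mc)
```

The sampled version shows why the identity is useful in practice: the right-hand side can be estimated from draws of $x$ alone, without differentiating through the sampling step.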

Best Answer

It is the outcome of two tricks often used in analysis:

1. the log-derivative identity (with $p(x; \theta)$ denoting the density of $D_\theta$) $$ \nabla_\theta \log p (x; \theta) = \frac{\nabla_\theta p (x; \theta)}{p(x; \theta)} $$
2. exchanging the order of differentiation and integration (or summation, in the discrete case).

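The logarithm appears only because $\nabla_\theta \log p$ is a compact way to write the ratio $\nabla_\theta p / p$. That ratio arises when $\nabla_\theta p$ is multiplied and divided by $p$, so that the sum or integral can again be read as an expectation under $D_\theta$.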
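As a sketch of how the two steps combine in the continuous case, assume $p(x; \theta) > 0$ wherever $f(x) \neq 0$ and that $p$ is regular enough for the gradient to pass under the integral sign:

$$\begin{align*}
\nabla_\theta \mathbb{E}_{x \sim D_\theta}[f(x)]
&= \nabla_\theta \int p(x; \theta) f(x) \, dx\\
&= \int \nabla_\theta p(x; \theta) f(x) \, dx\\
&= \int p(x; \theta) \frac{\nabla_\theta p(x; \theta)}{p(x; \theta)} f(x) \, dx\\
&= \int p(x; \theta) \nabla_\theta \log p(x; \theta) f(x) \, dx\\
&= \mathbb{E}_{x \sim D_\theta}[\nabla_\theta \log p(x; \theta) f(x)].
\end{align*}$$

The second line uses the exchange of derivative and integral, and the fourth uses the log-derivative identity.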
