Solved – Computing Empirical Fisher Information matrix for natural gradient

fisher-information, policy-gradient, reinforcement-learning

I would like to implement the natural gradient for reinforcement learning as described in the following paper: https://arxiv.org/pdf/1703.02660.pdf

However, I do not know how to compute the empirical Fisher information matrix needed to implement gradient ascent with the parameter update $\theta_{t+1} := \theta_t + F^{-1}\nabla_\theta J(\pi_\theta)$, where $\nabla_\theta J(\pi_\theta)$ is the regular policy gradient weighted by the advantages.

When I compute the empirical Fisher information as the average of the outer products of the log-policy gradients, $F = \frac{1}{T}\sum_{t=1}^T \nabla_{\theta} \log \pi_\theta(a_t \mid s_t)\, \nabla_{\theta} \log \pi_\theta(a_t \mid s_t)^\intercal$ (taken over all trajectories/samples), the resulting Fisher matrix is not positive semidefinite.

I do not see any reason why the log-policy gradient $\nabla_{\theta} \log \pi_\theta$ should not have negative components.

What is the right way to compute the empirical Fisher information from the gradients in practice? Is it correct to directly use the outer product of the gradients (e.g., with numpy, F = np.outer(grad, grad))?
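For concreteness, here is a minimal numpy sketch of the estimator above and the corresponding update, assuming a hypothetical array grad_log_probs whose rows are the flattened per-timestep gradients $\nabla_\theta \log \pi_\theta(a_t \mid s_t)$; the step size and damping values are illustrative, not taken from the paper:

```python
import numpy as np

def empirical_fisher(grad_log_probs):
    """Average of the per-sample outer products g g^T.

    grad_log_probs: array of shape [T, n_params], one flattened
    gradient of log pi_theta(a_t | s_t) per row.
    """
    G = np.asarray(grad_log_probs)
    return G.T @ G / G.shape[0]  # same as averaging np.outer(g, g) over the rows

def natural_gradient_step(theta, policy_grad, grad_log_probs,
                          step_size=1.0, damping=1e-3):
    """One update theta <- theta + step_size * F^{-1} grad J."""
    F = empirical_fisher(grad_log_probs)
    # F is positive semidefinite by construction; the (illustrative) damping
    # term makes it positive definite so the linear solve is well posed.
    nat_grad = np.linalg.solve(F + damping * np.eye(F.shape[0]), policy_grad)
    return theta + step_size * nat_grad
```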

Best Answer

I found an answer to my question here: http://www.telesens.co/2018/06/09/efficiently-computing-the-fisher-vector-product-in-trpo/

The second derivative (Hessian) of the KL divergence can be used to obtain the Fisher matrix, but there are better ways to approximate the natural gradient, such as TRPO, which only computes Fisher-vector products and solves for $F^{-1}\nabla_\theta J$ with conjugate gradient instead of forming the full matrix.
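For reference, a minimal PyTorch-style sketch of that trick, in the spirit of the linked post: the Fisher-vector product is obtained as the Hessian-vector product of the KL divergence between the current policy and a detached copy of itself, so the full matrix is never built. The policy is assumed here to be a categorical policy network mapping states to action logits; the names (policy, states, v, damping) are illustrative.

```python
import torch

def fisher_vector_product(policy, states, v, damping=1e-2):
    """Compute (F + damping * I) @ v without building F explicitly.

    policy: network mapping a batch of states to action logits.
    v:      flat vector with one entry per policy parameter.
    """
    logits = policy(states)
    log_p = torch.log_softmax(logits, dim=-1)
    p_old = torch.softmax(logits, dim=-1).detach()  # "old" policy, held fixed
    # KL(pi_old || pi_theta); its Hessian w.r.t. theta, evaluated at the
    # current parameters, is the Fisher matrix.
    kl = (p_old * (p_old.log() - log_p)).sum(dim=-1).mean()

    params = [p for p in policy.parameters() if p.requires_grad]
    grads = torch.autograd.grad(kl, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    hvp = torch.autograd.grad((flat_grad * v).sum(), params)
    flat_hvp = torch.cat([h.reshape(-1) for h in hvp])
    return flat_hvp + damping * v  # damping keeps the implicit matrix well conditioned
```

Plugging this into a conjugate-gradient solver for $Fx = \nabla_\theta J(\pi_\theta)$ yields the natural gradient direction from a handful of such products, which is how TRPO-style implementations avoid forming or inverting $F$.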
