Fisher-Rao Distance – Upper Bound Using KL-Divergence

gaussian, information-geometry, it.information-theory, pr.probability

Let $\mu$ and $\nu$ be two multivariate Gaussian measures on $\mathbb{R}^d$ with non-singular covariance matrices. Can the Fisher-Rao distance $d(\mu,\nu)$, computed on the information manifold of non-degenerate $d$-dimensional Gaussian measures equipped with the Fisher-Rao metric, be bounded by the (symmetrized) KL divergence (relative entropy) between $\mu$ and $\nu$?

Best Answer

Since relative entropy behaves locally like a squared distance, we might expect the squared Fisher-Rao metric to be comparable to the symmetrized KL divergence. This is indeed the case.

Let $d_F$ denote the Fisher-Rao metric on the manifold of non-degenerate multivariate Gaussians, and let $D(\mu,\nu):= D_{KL}(\mu\|\nu) + D_{KL}(\nu\|\mu)$ denote the symmetrized KL divergence between measures $\mu,\nu$.

Claim: For multivariate Gaussian measures $\mu_1,\mu_2$ with nonsingular covariance matrices, we have $$ d_F(\mu_1,\mu_2)^2 \leq 2 D(\mu_1,\mu_2). $$

Proof: By the triangle inequality, followed by the elementary bound $(a+b)^2 \leq 2a^2+2b^2$, we have \begin{align*} d_F\big(N(\theta_1,\Sigma_1),N(\theta_2,\Sigma_2)\big)^2 &\leq \big(d_F(N(\theta_1,\Sigma_1),N(\theta_1,\Sigma_2) )+d_F(N(\theta_1,\Sigma_2),N(\theta_2,\Sigma_2) ) \big)^2\\ &\leq 2 d_F\big(N(\theta_1,\Sigma_1),N(\theta_1,\Sigma_2)\big)^2 + 2 d_F\big(N(\theta_1,\Sigma_2),N(\theta_2,\Sigma_2)\big)^2. \end{align*}

On the submanifold of Gaussians with common mean, the (squared) Fisher-Rao distance is equal to $$ d_F\big(N(\theta_1,\Sigma_1),N(\theta_1,\Sigma_2)\big)^2 = \frac{1}{2}\sum_{i}(\log \lambda_i)^2, $$
where $(\lambda_i)$ denote the eigenvalues of the matrix $\Sigma_2^{-1/2}\Sigma_1\Sigma_2^{-1/2}$. On the submanifold of Gaussians with common covariance, the (squared) Fisher-Rao distance is equal to $$ d_F\big(N(\theta_1,\Sigma_2),N(\theta_2,\Sigma_2)\big)^2 = (\theta_1-\theta_2)^T \Sigma_2^{-1} (\theta_1-\theta_2). $$ Since any path inside a submanifold is also a path in the full manifold, these submanifold distances are upper bounds for the corresponding Fisher-Rao distances on the full manifold, which is all that is needed above.
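For concreteness, here is a minimal NumPy sketch of these two closed-form expressions (the function names are my own; the code simply transcribes the formulas above):

```python
import numpy as np

def fisher_rao_common_mean(Sigma1, Sigma2):
    """d_F(N(theta, Sigma1), N(theta, Sigma2)) on the common-mean submanifold:
    sqrt(0.5 * sum_i (log lambda_i)^2), where lambda_i are the eigenvalues of
    Sigma2^{-1/2} Sigma1 Sigma2^{-1/2}."""
    w2, V2 = np.linalg.eigh(Sigma2)
    S2_inv_sqrt = V2 @ np.diag(1.0 / np.sqrt(w2)) @ V2.T
    lam = np.linalg.eigvalsh(S2_inv_sqrt @ Sigma1 @ S2_inv_sqrt)
    return np.sqrt(0.5 * np.sum(np.log(lam) ** 2))

def fisher_rao_common_cov(theta1, theta2, Sigma):
    """d_F(N(theta1, Sigma), N(theta2, Sigma)) on the common-covariance
    submanifold: the Mahalanobis distance induced by Sigma^{-1}."""
    delta = np.asarray(theta1) - np.asarray(theta2)
    return np.sqrt(delta @ np.linalg.solve(Sigma, delta))
```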

The symmetrized KL divergence is given by $$ D(N(\theta_1,\Sigma_1),N(\theta_2,\Sigma_2)) = \frac{1}{2}\sum_{i}\left(\lambda_i +\frac{1}{\lambda_i}- 2 \right) +\frac{1}{2} (\theta_1-\theta_2)^T (\Sigma_1^{-1}+\Sigma_2^{-1}) (\theta_1-\theta_2). $$
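This expression can be sanity-checked against the standard closed form of the Gaussian KL divergence (the log-determinant terms cancel under symmetrization). A rough numerical sketch, with helper names of my own choosing:

```python
import numpy as np

def kl_gauss(theta1, Sigma1, theta2, Sigma2):
    """D_KL(N(theta1, Sigma1) || N(theta2, Sigma2)), standard closed form."""
    d = len(theta1)
    delta = np.asarray(theta2) - np.asarray(theta1)
    S2_inv = np.linalg.inv(Sigma2)
    return 0.5 * (np.trace(S2_inv @ Sigma1) + delta @ S2_inv @ delta - d
                  + np.log(np.linalg.det(Sigma2) / np.linalg.det(Sigma1)))

def symmetrized_kl_eigen(theta1, Sigma1, theta2, Sigma2):
    """The eigenvalue/Mahalanobis form of D(mu_1, mu_2) displayed above."""
    delta = np.asarray(theta1) - np.asarray(theta2)
    w2, V2 = np.linalg.eigh(Sigma2)
    S2_inv_sqrt = V2 @ np.diag(1.0 / np.sqrt(w2)) @ V2.T
    lam = np.linalg.eigvalsh(S2_inv_sqrt @ Sigma1 @ S2_inv_sqrt)
    mean_term = delta @ (np.linalg.inv(Sigma1) + np.linalg.inv(Sigma2)) @ delta
    return 0.5 * np.sum(lam + 1.0 / lam - 2.0) + 0.5 * mean_term

# quick check on random non-singular parameters
rng = np.random.default_rng(0)
d = 3
A1, A2 = rng.standard_normal((d, d)), rng.standard_normal((d, d))
Sigma1, Sigma2 = A1 @ A1.T + np.eye(d), A2 @ A2.T + np.eye(d)
theta1, theta2 = rng.standard_normal(d), rng.standard_normal(d)

lhs = (kl_gauss(theta1, Sigma1, theta2, Sigma2)
       + kl_gauss(theta2, Sigma2, theta1, Sigma1))
rhs = symmetrized_kl_eigen(theta1, Sigma1, theta2, Sigma2)
assert np.isclose(lhs, rhs)
```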

Now, the claim follows on account of the inequality $$ (\log x)^2 \leq x + \frac{1}{x}-2, ~~~x>0, $$ which holds because $x + \frac{1}{x} - 2 = \big(\sqrt{x}-\tfrac{1}{\sqrt{x}}\big)^2 = 4\sinh^2\big(\tfrac{1}{2}\log x\big)$ and $|\sinh t|\geq |t|$ for all $t$.
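A quick numerical sketch of this scalar inequality on a grid (a sanity check, not a proof):

```python
import numpy as np

# check (log x)^2 <= x + 1/x - 2 over several orders of magnitude;
# a small tolerance absorbs floating-point cancellation near x = 1
x = np.logspace(-8, 8, 100001)
assert np.all(np.log(x) ** 2 <= x + 1.0 / x - 2.0 + 1e-9)
```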

Remark: The closed-form expressions for the special cases of the Fisher-Rao metric used above can be found in Section 2.1 (and references therein) of:

Pinele, Julianna, João E. Strapasson, and Sueli I. R. Costa. "The Fisher-Rao distance between multivariate normal distributions: Special cases, bounds and applications." Entropy 22.4 (2020): 404.
