Let $\mu$ and $\nu$ be two multivariate Gaussian measures on $\mathbb{R}^d$ with non-singular covariance matrices. Can the Fisher-Rao distance $d(\mu,\nu)$, computed on the information manifold of non-degenerate $d$-dimensional Gaussian measures equipped with the Fisher-Rao metric, be bounded by the (symmetrized) KL divergence/relative entropy between $\mu$ and $\nu$?
Fisher-Rao Distance – Upper Bound Using KL-Divergence
Tags: gaussian, information-geometry, it.information-theory, pr.probability
Related Solutions
Explicit upper and lower bounds are obtained in Theorem 1.2 and Proposition 2.1 of "The total variation distance between high-dimensional Gaussians."
I'm not sure if this is still of interest to you, but I think it is possible to get some reasonable bounds if you are okay with dropping the factor of $\frac{1}{2}$. Here's my work, which can be strengthened and refined.
We start by taking two probability mass functions $p$ and $q$, with entries $p_i$ and $q_i$. We define the function $f$ by $f_i = q_i - p_i$. Instead of doing anything fancy, we consider the line segment $p_i(t) = p_i + t f_i$. Since $f$ has total mass zero, the $p_i(t)$ are well-defined probability distributions for $t \in [0,1]$ that form a straight line in the probability simplex, with $p_i(0) = p_i$ and $p_i(1) = q_i$.
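A minimal numerical sketch of this construction (the variable names are mine, not from the text): sample two pmfs, form $f = q - p$, and check that $p(t)$ stays a valid pmf along the segment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two random pmfs p and q on a 5-point alphabet, and f = q - p (total mass zero).
n = 5
p = rng.random(n); p /= p.sum()
q = rng.random(n); q /= q.sum()
f = q - p

# Along the segment p(t) = p + t*f, each point is a convex combination of p and q,
# hence a valid pmf for t in [0, 1].
for t in np.linspace(0.0, 1.0, 11):
    pt = p + t * f
    assert np.all(pt >= 0) and np.isclose(pt.sum(), 1.0)

print(np.allclose(p + 0 * f, p), np.allclose(p + 1 * f, q))  # True True
```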
Now we take the Taylor series of the Kullback-Leibler divergence, expanded at $t=0$. This will involve the Fisher metric, but we should expand further to get better results.
When we expand out $(p_i + t f_i)\log\left( \frac{p_i + t f_i}{p_i} \right)$, we get the following:
$$f_i t+\frac{f_i^2 t^2}{2 p_i}-\frac{f_i^3 t^3}{6 p_i^2}+\frac{f_i^4 t^4}{12 p_i^3}-\frac{f_i^5 t^5}{20 p_i^4}+\frac{f_i^6 t^6}{30 p_i^5}+O\left(t^7\right)$$
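If useful, the expansion above can be checked symbolically; here is a quick sketch using sympy (the symbol names are mine).

```python
import sympy as sp

p, f, t = sp.symbols('p f t', positive=True)

# Expand (p + t*f) * log((p + t*f)/p) around t = 0 up to (and including) t**6.
expr = (p + t * f) * sp.log((p + t * f) / p)
series = sp.series(expr, t, 0, 7).removeO()

expected = (f*t + f**2*t**2/(2*p) - f**3*t**3/(6*p**2)
            + f**4*t**4/(12*p**3) - f**5*t**5/(20*p**4) + f**6*t**6/(30*p**5))

print(sp.simplify(series - expected))  # expected output: 0
```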
When we sum over $i$, the first term will vanish, and we can factor out a Fisher metric term from all of the others. I will use an integral sign to sum over $i$, as it is suggestive of what should happen in the continuous case.
$$\int f_i t+\frac{f_i^2 t^2}{2 p_i}-\frac{f_i^3 t^3}{6 p_i^2}+\frac{f_i^4 t^4}{12 p_i^3} - \cdots \,di = \int \frac{f_i^2 t^2}{ p_i} \left( \frac{1}{2} - \frac{f_i t}{6 p_i} + \frac{f_i^2 t^2}{12 p_i^2} - \cdots \right) di $$
We find that the series in parentheses on the right-hand side can be simplified. We set $x_i = \frac{f_i t}{p_i}$ and derive the following:
$$\left( \frac{1}{2} - \frac{x_i}{6} + \frac{x_i^2}{12} - \cdots \right) = \sum_{k=0}^\infty \frac{(-1)^k x_i^k}{(k+1)(k+2)} = \frac{(x_i +1)\log(x_i+1)-x_i}{x_i^2}$$
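Here is a small numerical sanity check (the helper names are mine) that partial sums of this series agree with the closed form on $|x|<1$, where the series converges.

```python
import numpy as np

def g_closed(x):
    # ((x+1)*log(x+1) - x) / x**2, with the limiting value 1/2 at x = 0.
    x = np.asarray(x, dtype=float)
    out = np.full_like(x, 0.5)
    nz = np.abs(x) > 1e-8
    out[nz] = ((x[nz] + 1.0) * np.log1p(x[nz]) - x[nz]) / x[nz]**2
    return out

def g_series(x, terms=200):
    # Partial sum of sum_{k>=0} (-1)^k x^k / ((k+1)(k+2)); converges for |x| < 1.
    return sum((-1.0)**k * x**k / ((k + 1) * (k + 2)) for k in range(terms))

xs = np.linspace(-0.9, 0.9, 19)
print(np.max(np.abs(g_closed(xs) - g_series(xs))))  # tiny: the two expressions agree
```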
This should not be surprising; it is closely related to the original formula for the Kullback-Leibler divergence. In fact, we didn't need the Taylor series except to know to subtract off the pesky $t f_i$ term, so we don't need to worry about convergence: the manipulation is valid without the series. Therefore,
$$KL(p(t), p) = \int \frac{f_i^2 t^2}{ p_i} \left( \frac{( x_i +1)\log(x_i+1)-x_i}{x_i^2} \right) di $$
In order for this to make sense, we need to make sure that $x_i= \frac{f_i t}{p_i} \geq -1$. This holds because $\frac{f_i}{p_i} = \frac{q_i}{p_i} - 1 \geq -1$ and $t \in [0,1]$. Even better, it turns out that $ \frac{( x_i +1)\log(x_i+1)-x_i}{x_i^2} \leq 1$ on its domain. With this, we are done, because setting $t=1$ gives $$KL(q,p)\leq I_p(f,f),$$ where $I_p(f,f) = \int \frac{f_i^2}{p_i}\,di$ is the Fisher metric at $p$ applied to the tangent vector $f$.
This means we can bound the Kullback-Leibler divergence by the Fisher information metric evaluated on a particular tangent vector $f$. Since the KL divergence can blow up, it is worth seeing what happens in that case: whenever it does, the tangent vector $f$ at $p$ must be large in the Fisher metric.
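For concreteness, here is a short numerical check (the helper names are mine) of the resulting bound $KL(q,p) \leq I_p(f,f)$ on random discrete distributions.

```python
import numpy as np

rng = np.random.default_rng(1)

def kl(q, p):
    # KL(q, p) = sum_i q_i * log(q_i / p_i), with the convention 0 * log 0 = 0.
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

def fisher_quadratic(p, f):
    # I_p(f, f) = sum_i f_i**2 / p_i, the Fisher metric at p applied to f = q - p.
    return float(np.sum(f**2 / p))

for _ in range(1000):
    n = int(rng.integers(2, 10))
    p = rng.random(n); p /= p.sum()
    q = rng.random(n); q /= q.sum()
    assert kl(q, p) <= fisher_quadratic(p, q - p) + 1e-12

print("KL(q, p) <= I_p(f, f) held in all trials")
```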
Best Answer
Since relative entropy behaves locally like a squared distance, we might expect the squared Fisher-Rao distance to be comparable to the symmetrized KL divergence. This is indeed the case.
Let $d_F$ denote the Fisher-Rao distance on the manifold of non-degenerate multivariate Gaussians, and let $D(\mu,\nu):= D_{KL}(\mu\|\nu) + D_{KL}(\nu\|\mu)$ denote the symmetrized KL divergence between measures $\mu,\nu$.
Claim: For multivariate Gaussian measures $\mu_1,\mu_2$ with nonsingular covariance matrices, we have $$ d_F(\mu_1,\mu_2)^2 \leq 2 D(\mu_1,\mu_2). $$
Proof: By the triangle inequality and the elementary bound $(a+b)^2 \leq 2a^2+2b^2$, we have \begin{align*} d_F\big(N(\theta_1,\Sigma_1),N(\theta_2,\Sigma_2)\big)^2 &\leq \big(d_F(N(\theta_1,\Sigma_1),N(\theta_1,\Sigma_2) )+d_F(N(\theta_1,\Sigma_2),N(\theta_2,\Sigma_2) ) \big)^2\\ &\leq 2 d_F\big(N(\theta_1,\Sigma_1),N(\theta_1,\Sigma_2)\big)^2 + 2 d_F\big(N(\theta_1,\Sigma_2),N(\theta_2,\Sigma_2)\big)^2. \end{align*} (By symmetry we could instead pass through the intermediate point $N(\theta_2,\Sigma_1)$; below we use whichever choice gives the smaller bound.)
On the submanifold of Gaussians with common mean, the (squared) Fisher-Rao distance is equal to $$ d_F\big(N(\theta_1,\Sigma_1),N(\theta_1,\Sigma_2)\big)^2 = \frac{1}{2}\sum_{i}(\log \lambda_i)^2, $$
where $(\lambda_i)$ denote the eigenvalues of the matrix $\Sigma_2^{-1/2}\Sigma_1\Sigma_2^{-1/2}$. On the submanifold of Gaussians with common covariance, the (squared) Fisher-Rao distance is equal to $$ d_F\big(N(\theta_1,\Sigma_2),N(\theta_2,\Sigma_2)\big)^2 = (\theta_1-\theta_2)^T \Sigma_2^{-1} (\theta_1-\theta_2). $$
The symmetrized KL divergence is given by $$ D(N(\theta_1,\Sigma_1),N(\theta_2,\Sigma_2)) = \frac{1}{2}\sum_{i}\left(\lambda_i +\frac{1}{\lambda_i}- 2 \right) +\frac{1}{2} (\theta_1-\theta_2)^T (\Sigma_1^{-1}+\Sigma_2^{-1}) (\theta_1-\theta_2). $$
Now, the claim follows: the covariance terms are handled by the inequality $$ (\log x)^2 \leq x + \frac{1}{x}-2, ~~~x>0, $$ while for the mean terms, the smaller of $(\theta_1-\theta_2)^T \Sigma_1^{-1} (\theta_1-\theta_2)$ and $(\theta_1-\theta_2)^T \Sigma_2^{-1} (\theta_1-\theta_2)$ is at most their average, which is exactly the mean term of $D$.
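As a sanity check of the claim and of the closed-form expressions above, here is a small numerical sketch (the function names and test setup are mine): it evaluates the two-leg upper bound on $d_F^2$ from the proof, routing through whichever intermediate point gives the smaller mean term, and compares it with $2D$.

```python
import numpy as np

rng = np.random.default_rng(2)

def random_spd(d):
    a = rng.standard_normal((d, d))
    return a @ a.T + d * np.eye(d)

def two_leg_bound(th1, S1, th2, S2):
    # Squared length of the two-leg path in the proof: covariance leg at a fixed
    # mean plus mean leg at a fixed covariance, combined via (a+b)^2 <= 2a^2 + 2b^2.
    # The mean leg uses whichever intermediate point gives the smaller value.
    lam = np.real(np.linalg.eigvals(np.linalg.solve(S2, S1)))  # eigenvalues of S2^{-1} S1
    cov_leg = 0.5 * np.sum(np.log(lam)**2)
    dth = th1 - th2
    mean_leg = min(dth @ np.linalg.solve(S1, dth), dth @ np.linalg.solve(S2, dth))
    return 2.0 * cov_leg + 2.0 * mean_leg

def sym_kl(th1, S1, th2, S2):
    # Closed-form symmetrized KL divergence between the two Gaussians.
    lam = np.real(np.linalg.eigvals(np.linalg.solve(S2, S1)))
    dth = th1 - th2
    return (0.5 * np.sum(lam + 1.0/lam - 2.0)
            + 0.5 * dth @ (np.linalg.inv(S1) + np.linalg.inv(S2)) @ dth)

for _ in range(500):
    d = int(rng.integers(1, 6))
    th1, th2 = rng.standard_normal(d), rng.standard_normal(d)
    S1, S2 = random_spd(d), random_spd(d)
    # The two-leg bound dominates d_F^2, so this also verifies d_F^2 <= 2 D.
    assert two_leg_bound(th1, S1, th2, S2) <= 2.0 * sym_kl(th1, S1, th2, S2) + 1e-9

print("d_F(mu_1, mu_2)^2 <= 2 D(mu_1, mu_2) held in all trials")
```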
Remark: The closed-form expressions for the special cases of the Fisher-Rao distance used above can be found in Section 2.1 (and references therein) of:
Pinele, Julianna, João E. Strapasson, and Sueli IR Costa. "The Fisher-Rao distance between multivariate normal distributions: Special cases, bounds and applications." Entropy 22.4 (2020): 404.