As far as I understand, you are solving the following problem:
there are two analytical distributions $p(x)$ and $q(x)$, and you want to compute a distance $D(p, q)$ between them.
There are plenty of measures of distance between two distributions, for example the Kullback-Leibler divergence, the Jensen-Shannon divergence, the Hellinger distance, total variation, and the Wasserstein distance.
I suggest trying a few of them, as all are rather easy to implement. In most applications, numerical experiments are what give you the key to success. You can then select the one that suits you best (since your question does not state any requirements for this distance, I cannot suggest anything more specific).
As $p(x)$ and $q(x)$ are complex, it is unlikely that an analytical expression exists for any of these $D(p, q)$, so you will need to compute the distances numerically. Note that computing almost all of these distances involves numerical integration, so they will be rather imprecise when the dimension of $x$ is high.
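For concreteness, here is a minimal Python sketch (assuming, purely for illustration, that $p$ and $q$ are one-dimensional Gaussians from `scipy.stats`) of estimating $D_{KL}(p\|q)$ both by quadrature and by Monte Carlo. The Monte Carlo route generalizes more gracefully to higher dimensions, at the cost of sampling noise.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

# Two "analytical" densities used purely as an illustration; in practice
# replace these with your own (possibly complicated) p(x) and q(x).
p = stats.norm(loc=0.0, scale=1.0)
q = stats.norm(loc=1.0, scale=2.0)

# KL(p || q) by 1-D numerical quadrature of p(x) * log(p(x) / q(x)).
kl_quad, _ = quad(lambda x: p.pdf(x) * np.log(p.pdf(x) / q.pdf(x)), -30, 30)

# The same quantity by Monte Carlo: an average of log(p/q) over samples from p.
# This avoids an explicit grid, which matters when x is high-dimensional.
xs = p.rvs(size=200_000, random_state=0)
kl_mc = np.mean(p.logpdf(xs) - q.logpdf(xs))

print(f"KL(p||q) quadrature ~ {kl_quad:.4f}, Monte Carlo ~ {kl_mc:.4f}")
```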
This answer attempts to explain (1) the information-theoretic interpretation of KL divergence, and (2) how that interpretation lends itself to Bayesian analysis. What follows is quoted directly from pp. 148-150, Section 6.6 of Stone's Information Theory, a very good book which I recommend.
Kullback-Leibler divergence (KL-divergence) is a general measure of the difference between two distributions, and is also known as the relative entropy. Given two distributions $p(X)$ and $q(X)$ of the same variable $X$, the KL-divergence between these distributions is $$ D_{KL}(p(X)||q(X))=\int_x p(x) \log\frac{p(x)}{q(x)}dx \,.$$ KL-divergence is not a true measure of distance because, usually $$D_{KL}(p(X)||q(X)) \not= D_{KL}(q(X)||p(X)) \,.$$ Note that $D_{KL}(p(X)||q(X))>0$, unless $p=q$, in which case it is equal to zero.
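As a quick illustration of the asymmetry, here is a small sketch with made-up discrete distributions; `scipy.stats.entropy` returns the relative entropy $\sum_i p_i \log(p_i/q_i)$ when given two arguments.

```python
import numpy as np
from scipy.stats import entropy

# Two discrete distributions over the same three outcomes (illustrative numbers).
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.1, 0.4, 0.5])

# entropy(p, q) computes the relative entropy sum_i p_i * log(p_i / q_i).
print(entropy(p, q))   # D_KL(p || q)
print(entropy(q, p))   # D_KL(q || p): generally a different number
```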
The KL-divergence between the joint distribution $p(X,Y)$ and the joint distribution $[p(X)p(Y)]$ obtained from the outer product of the marginal distributions $p(X)$ and $p(Y)$ is $$D_{KL}(p(X,Y)||[p(X)p(Y)])=\int_x \int_y p(x,y) \log\frac{p(x,y)}{p(x)p(y)} dy dx $$ which we can recognize from Equation 6.25 $$I(X,Y) = \int_y\int_x p(x,y)\log\frac{p(x,y)}{p(x)p(y)}dx dy $$ as the mutual information between $X$ and $Y$.
Thus the mutual information between $X$ and $Y$ is the KL-divergence between the joint distribution $p(X,Y)$ and the joint distribution $[p(X)p(Y)]$ obtained by evaluating the outer product of the marginal distributions of $p(X)$ and $p(Y)$.
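A small numerical illustration of this identity, using a made-up $2 \times 2$ joint distribution:

```python
import numpy as np

# A toy discrete joint distribution p(x, y) (rows: x, columns: y); made-up numbers.
p_xy = np.array([[0.30, 0.10],
                 [0.15, 0.45]])

p_x = p_xy.sum(axis=1)           # marginal p(x)
p_y = p_xy.sum(axis=0)           # marginal p(y)
p_x_p_y = np.outer(p_x, p_y)     # outer product [p(x)p(y)]

# Mutual information as the KL divergence between the joint and the product of marginals.
mi = np.sum(p_xy * np.log(p_xy / p_x_p_y))
print(mi)
```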
Bayes' Rule
We can express the KL-divergence between two variables in terms of Bayes' rule (see Stone (2013)$^{52}$ and Appendix F). Given that $p(x,y)=p(x|y)p(y)$, mutual information can be expressed as $$I(X,Y) = \int_y p(y) \int_x p(x|y)\log\frac{p(x|y)}{p(x)}dx dy \,, $$ where the inner integral can be recognized as the KL-divergence between the distributions $p(X|y)$ and $p(X)$, $$D_{KL}(p(X|y)||p(X))=\int_x p(x|y)\log\frac{p(x|y)}{p(x)}dx \,, $$ where $p(X|y)$ is the posterior distribution and $p(X)$ is the prior distribution. Thus, the mutual information between $X$ and $Y$ is $$I(X,Y) = \int_y p(y) D_{KL}(p(X|y)||p(X))dy \,, $$ which is the expected KL-divergence between the posterior and the prior, $$I(X,Y) = \mathbb{E}_y [D_{KL}(p(X|y)||p(X))]\,, $$ where the expectation is taken over values of $Y$.
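The same kind of toy joint distribution can be used to check this decomposition numerically, treating $x$ as the parameter and $y$ as the data:

```python
import numpy as np

# Toy joint distribution p(x, y); x plays the role of the parameter,
# y the role of the data (rows: x, columns: y).
p_xy = np.array([[0.30, 0.10],
                 [0.15, 0.45]])
p_x = p_xy.sum(axis=1)     # "prior" p(x)
p_y = p_xy.sum(axis=0)     # evidence p(y)

mi = 0.0
for j, py in enumerate(p_y):
    post = p_xy[:, j] / py                     # "posterior" p(x | y = y_j)
    kl = np.sum(post * np.log(post / p_x))     # D_KL(p(X|y) || p(X))
    mi += py * kl                              # expectation over y, weighted by p(y)
print(mi)   # matches the mutual information computed directly from the joint
```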
The application to Bayesian analysis can be found in Appendix H, pp. 157-158 of Stone's also very good book Bayes' Rule.
Reference Priors
The question of what constitutes an unbiased or fair prior has several answers. Here, we provide a brief account of the answer given by Bernardo (1979)$^3$, who called them reference priors.
Reference priors rely on the idea of mutual information. In essence, the mutual information between two variables is a measure of how tightly coupled they are, and can be considered to be a general measure of the correlation between variables. More formally, it is the average amount of Shannon information conveyed about one variable by the other variable. For our purposes, we note that the mutual information $I(x,\theta)$ between $x$ and $\theta$ is also the average difference between the posterior $p(\theta|x)$ and the prior $p(\theta)$, where this difference is measured as the Kullback-Leibler divergence. A reference prior is defined as that particular prior which makes the mutual information between $x$ and $\theta$ as large as possible, and (equivalently) maximizes the average Kullback-Leibler divergence between the posterior and the prior.
What has this to do with fair priors? A defining, and useful, feature of mutual information is that it is immune or invariant to the effects of transformations of variables. For example, if a measurement device adds a constant amount $k$ to each reading, so that we measure $x$ as $y=x+k$, then the mean $\theta$ becomes $\phi=\theta+k$, where $\theta$ and $\phi$ are location parameters. Despite the addition of $k$ to measured values, the mutual information between $\phi$ and $y$ remains the same as the mutual information between $\theta$ and $x$; that is, $I(y,\phi) = I(x,\theta)$. Thus, the fairness of a prior (defined in terms of transformation invariance) is guaranteed if we choose a common prior for $\theta$ and $\phi$ which ensures that $I(y,\phi) = I(x,\theta)$. Indeed, it is possible to harness this equality to derive priors which have precisely the desired invariance. It can be shown that the only prior that satisfies this equality for a location parameter (such as the mean) is the uniform prior...
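A small sanity check of this invariance, under an assumed Gaussian location model: for jointly Gaussian variables the mutual information is $-\tfrac{1}{2}\log(1-\rho^2)$, which depends only on the correlation $\rho$ and is therefore unchanged by adding a constant to both variables.

```python
import numpy as np

rng = np.random.default_rng(0)
tau, sigma, k = 2.0, 1.0, 5.0

theta = rng.normal(0.0, tau, size=100_000)             # location parameter theta
x = theta + rng.normal(0.0, sigma, size=theta.size)    # measurement x given theta

def gaussian_mi(a, b):
    """MI of two jointly Gaussian variables, from their sample correlation."""
    r = np.corrcoef(a, b)[0, 1]
    return -0.5 * np.log(1.0 - r**2)

print(gaussian_mi(x, theta))          # I(x, theta)
print(gaussian_mi(x + k, theta + k))  # I(y, phi) with y = x + k, phi = theta + k: the same
```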
Best Answer
Mutual information is not a metric. A metric $d$ satisfies the identity of indiscernibles: $d(x, y) = 0$ if and only if $x = y$. This is not true of mutual information, which behaves in the opposite manner: zero mutual information implies that the two random variables are independent (as far from identical as you can get), while two identical random variables have maximal mutual information (as far from zero as you can get).
You're correct that KL divergence is not a metric. It's not symmetric and doesn't satisfy the triangle inequality.
Mutual information and KL divergence are not equivalent. However, the mutual information $I(X, Y)$ between random variables $X$ and $Y$ is given by the KL divergence between the joint distribution $p_{XY}$ and the product of the marginal distributions $p_X \otimes p_Y$ (what the joint distribution would be if $X$ and $Y$ were independent).
$$I(X, Y) = D_{KL}(p_{XY} \parallel p_X \otimes p_Y)$$
Although mutual information is not itself a metric, there are metrics based on it. For example, the variation of information:
$$VI(X, Y) = H(X, Y) - I(X, Y) = H(X) + H(Y) - 2 I(X, Y)$$
where $H(X)$ and $H(Y)$ are the marginal entropies and $H(X, Y)$ is the joint entropy.
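For a toy discrete joint distribution (the same kind of made-up table as above), the variation of information can be computed directly from these entropies:

```python
import numpy as np

# Toy discrete joint distribution p(x, y) (illustrative numbers).
p_xy = np.array([[0.30, 0.10],
                 [0.15, 0.45]])
p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)

def H(p):
    """Shannon entropy of a discrete distribution (natural log)."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

h_xy = H(p_xy.ravel())              # joint entropy H(X, Y)
mi = H(p_x) + H(p_y) - h_xy         # mutual information I(X, Y)
vi = H(p_x) + H(p_y) - 2 * mi       # variation of information
print(vi, h_xy - mi)                # the two expressions for VI agree
```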