The KL divergence is a difference of integrals of the form
$$\begin{aligned}
I(a,b,c,d)&=\int_0^{\infty} \log\left(\frac{e^{-x/a}x^{b-1}}{a^b\Gamma(b)}\right) \frac{e^{-x/c}x^{d-1}}{c^d \Gamma(d)}\, \mathrm dx \\
&=-\frac{1}{a}\int_0^\infty \frac{x^d e^{-x/c}}{c^d\Gamma(d)}\, \mathrm dx
- \log(a^b\Gamma(b))\int_0^\infty \frac{e^{-x/c}x^{d-1}}{c^d\Gamma(d)}\, \mathrm dx\\
&\quad+ (b-1)\int_0^\infty \log(x) \frac{e^{-x/c}x^{d-1}}{c^d\Gamma(d)}\, \mathrm dx\\
&=-\frac{cd}{a}
- \log(a^b\Gamma(b))
+ (b-1)\int_0^\infty \log(x) \frac{e^{-x/c}x^{d-1}}{c^d\Gamma(d)}\,\mathrm dx
\end{aligned}$$
Only the integral on the right remains; its value is obtained by observing
$$\begin{aligned}
\frac{\partial}{\partial d}\Gamma(d) &= \frac{\partial}{\partial d}\int_0^{\infty}e^{-x/c}\frac{x^{d-1}}{c^d}\, \mathrm dx\\
&= \frac{\partial}{\partial d} \int_0^\infty e^{-x/c} \frac{(x/c)^{d-1}}{c}\, \mathrm dx\\
&=\int_0^\infty e^{-x/c}\frac{x^{d-1}}{c^d} \log\frac{x}{c} \, \mathrm dx\\
&=\int_0^{\infty}\log(x)e^{-x/c}\frac{x^{d-1}}{c^d}\, \mathrm dx - \log(c)\Gamma(d).
\end{aligned}$$
Whence
$$\frac{b-1}{\Gamma(d)}\int_0^{\infty} \log(x)e^{-x/c}\frac{x^{d-1}}{c^d}\, \mathrm dx = (b-1)\frac{\Gamma'(d)}{\Gamma(d)} + (b-1)\log(c).$$
Plugging into the preceding yields
$$I(a,b,c,d)=\frac{-cd}{a} -\log(a^b\Gamma(b))+(b-1)\frac{\Gamma'(d)}{\Gamma(d)} + (b-1)\log(c).$$
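To guard against algebra slips, the closed form for $I$ can be checked against direct numerical integration of its defining integral. Here is a standard-library Python sketch (the parameter values, integration grid, and cutoff are arbitrary choices, and the digamma value is approximated by a central difference of log-Gamma):

```python
# Numerical sanity check of the closed form for I(a, b, c, d).
# Standard library only: math.lgamma for log-Gamma, and a central
# difference of lgamma to approximate the digamma function psi.
import math

def psi(x, h=1e-5):
    # psi(x) = d/dx log Gamma(x), approximated by a central difference
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2 * h)

def I_closed(a, b, c, d):
    # I(a,b,c,d) = -cd/a - log(a^b Gamma(b)) + (b-1)(psi(d) + log(c))
    return (-c * d / a - b * math.log(a) - math.lgamma(b)
            + (b - 1) * (psi(d) + math.log(c)))

def I_quad(a, b, c, d, n=200000, upper=60.0):
    # Simple Riemann sum of the defining integral on (0, upper]:
    # the integrand is log(p(x)) * q(x), where p is the Gamma(scale a,
    # shape b) density and q is the Gamma(scale c, shape d) density.
    total = 0.0
    dx = upper / n
    for i in range(1, n + 1):
        x = i * dx
        log_p = -x / a + (b - 1) * math.log(x) - b * math.log(a) - math.lgamma(b)
        log_q = -x / c + (d - 1) * math.log(x) - d * math.log(c) - math.lgamma(d)
        total += log_p * math.exp(log_q) * dx
    return total

a, b, c, d = 2.0, 3.0, 1.5, 4.0
print(I_closed(a, b, c, d), I_quad(a, b, c, d))
```

The two printed values should agree to a few decimal places; the residual discrepancy comes from the Riemann sum and the truncated upper limit.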
The KL divergence between $\Gamma(c,d)$ and $\Gamma(a,b)$ equals $I(c,d,c,d) - I(a,b,c,d)$, which is straightforward to assemble.
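For reference, carrying out that assembly and collecting the $\psi(d)$ and $\log(c)$ terms gives the simplified form (a worked reduction of the expression above, worth double-checking term by term):
$$D_{\mathrm{KL}}\bigl(\Gamma(c,d)\,\|\,\Gamma(a,b)\bigr) = (d-b)\,\psi(d) + \log\frac{\Gamma(b)}{\Gamma(d)} + b\log\frac{a}{c} + \frac{d(c-a)}{a},$$
where $\psi = \Gamma'/\Gamma$ is the digamma function discussed below.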
Implementation Details
Gamma functions grow rapidly, so to avoid overflow do not compute Gamma and then take its logarithm: instead use the log-Gamma function available in any statistical computing platform (including Excel, for that matter).
The ratio $\Gamma^\prime(d)/\Gamma(d)$ is the logarithmic derivative of $\Gamma,$ generally called $\psi,$ the digamma function. If it's not available to you, there are relatively simple ways to approximate it, as described in the Wikipedia article.
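For instance, one common recipe shifts the argument upward with the recurrence $\psi(x)=\psi(x+1)-1/x$ and then applies the asymptotic expansion. Here is a stand-alone Python sketch (the cutoff of 6 and the number of series terms are arbitrary choices; more terms give more accuracy):

```python
# A rough stand-alone digamma, for platforms without one.
# Strategy: use psi(x) = psi(x + 1) - 1/x to push x above 6, then apply
# psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6).
import math

def digamma(x):
    result = 0.0
    while x < 6:
        result -= 1 / x
        x += 1
    inv2 = 1 / (x * x)
    return (result + math.log(x) - 1 / (2 * x)
            - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252)))

# psi(1) = -EulerGamma, about -0.5772156649
print(digamma(1.0))
```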
Here, to illustrate, is a direct R implementation of the formula in terms of $I$. This does not exploit an opportunity to simplify the result algebraically, which would make it a little more efficient (by eliminating a redundant calculation of $\psi$).
#
# `b` and `d` are Gamma shape parameters and
# `a` and `c` are scale parameters.
# (All, therefore, must be positive.)
#
KL.gamma <- function(a,b,c,d) {
i <- function(a,b,c,d)
- c * d / a - b * log(a) - lgamma(b) + (b-1)*(psigamma(d) + log(c))
i(c,d,c,d) - i(a,b,c,d)
}
print(KL.gamma(1/114186.3, 202, 1/119237.3, 195), digits=12)
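For cross-checking on a platform without R, here is a Python port of the same formula (a sketch: Python's standard library has `math.lgamma` but no digamma, so $\psi$ is approximated by a central difference of log-Gamma; `scipy.special.psigamma` equivalents would be more accurate if available):

```python
# Python port of KL.gamma. Parameterization matches the R code:
# b, d are Gamma shape parameters; a, c are scale parameters.
import math

def psi(x, h=1e-5):
    # digamma = d/dx log Gamma(x), via central difference of lgamma
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2 * h)

def kl_gamma(a, b, c, d):
    # KL divergence from Gamma(shape d, scale c) to Gamma(shape b, scale a),
    # assembled as i(c,d,c,d) - i(a,b,c,d) exactly as in the R code above.
    def i(a, b, c, d):
        return (-c * d / a - b * math.log(a) - math.lgamma(b)
                + (b - 1) * (psi(d) + math.log(c)))
    return i(c, d, c, d) - i(a, b, c, d)

print(kl_gamma(1 / 114186.3, 202, 1 / 119237.3, 195))
```

As a basic check, the divergence of any Gamma distribution from itself is zero, and the divergence between distinct distributions is positive.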
Best Answer
The moment generating function $M(t)$ of $Y=\ln X$ is helpful in this case, since it has a simple algebraic form. By the definition of m.g.f., we have $$\begin{aligned}M(t)&=\operatorname{E}[e^{t\ln X}]=\operatorname{E}[X^t]\\ &=\frac{1}{\Gamma(\alpha)\theta^\alpha}\int_0^\infty x^{\alpha+t-1}e^{-x/\theta}\,dx\\ &=\frac{\theta^{t}}{\Gamma(\alpha)}\int_0^\infty y^{\alpha+t-1}e^{-y}\,dy\\ &=\frac{\theta^t\Gamma(\alpha+t)}{\Gamma(\alpha)}.\end{aligned}$$
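The identity $\operatorname{E}[X^t]=\theta^t\Gamma(\alpha+t)/\Gamma(\alpha)$ is easy to spot-check by simulation; this Python sketch uses only the standard library (the values of $\alpha$, $\theta$, $t$ and the sample size are arbitrary choices):

```python
# Monte Carlo sanity check of E[X^t] = theta^t Gamma(alpha+t) / Gamma(alpha)
# for X ~ Gamma(shape alpha, scale theta).
import math
import random

random.seed(1)
alpha, theta, t = 3.0, 2.0, 0.5

# random.gammavariate(alpha, beta) samples Gamma(shape=alpha, scale=beta)
samples = [random.gammavariate(alpha, theta) for _ in range(200000)]
mc = sum(x ** t for x in samples) / len(samples)
exact = theta ** t * math.gamma(alpha + t) / math.gamma(alpha)
print(mc, exact)  # the two should agree to roughly two decimal places
```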
Let's verify the expectation and the variance you gave. Taking derivatives, we have $$M'(t)=\frac{\Gamma'(\alpha+t)}{\Gamma(\alpha)}\theta^t+\frac{\Gamma(\alpha+t)}{\Gamma(\alpha)}\theta^t\ln(\theta)$$ and $$M''(t)=\frac{\Gamma''(\alpha+t)}{\Gamma(\alpha)}\theta^t+\frac{2\Gamma'(\alpha+t)}{\Gamma(\alpha)}\theta^t\ln(\theta)+\frac{\Gamma(\alpha+t)}{\Gamma(\alpha)}\theta^t\ln^2(\theta).$$ Hence, $$\operatorname{E}[Y]=\psi^{(0)}(\alpha)+\ln(\theta),\qquad\operatorname{E}[Y^2]=\frac{\Gamma''(\alpha)}{\Gamma(\alpha)}+2\psi^{(0)}(\alpha)\ln(\theta)+\ln^2(\theta).$$ It follows then $$\operatorname{Var}(Y)=\operatorname{E}[Y^2]-\operatorname{E}[Y]^2=\frac{\Gamma''(\alpha)}{\Gamma(\alpha)}-\left(\frac{\Gamma'(\alpha)}{\Gamma(\alpha)}\right)^2=\psi^{(1)}(\alpha).$$
To find the skewness, note the cumulant generating function (thanks @probabilityislogic for the tip) is $$K(t)=\ln M(t)=t\ln\theta+\ln\Gamma(\alpha+t)-\ln\Gamma(\alpha).$$ The first cumulant is thus simply $K'(0)=\psi^{(0)}(\alpha)+\ln(\theta)$. Recall that $\psi^{(n)}(x)=d^{n+1}\ln\Gamma(x)/dx^{n+1}$, so the subsequent cumulants are $K^{(n)}(0)=\psi^{(n-1)}(\alpha)$, $n\geq2$. The skewness is therefore $$\frac{\operatorname{E}[(Y-\operatorname{E}[Y])^3]}{\operatorname{Var}(Y)^{3/2}}=\frac{\psi^{(2)}(\alpha)}{[\psi^{(1)}(\alpha)]^{3/2}}.$$
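The mean, variance, and skewness formulas for $Y=\ln X$ can likewise be spot-checked by simulation. In this Python sketch the polygamma values $\psi^{(n)}(\alpha)$ are approximated by finite differences of `math.lgamma` (crude but standard-library-only), and all numeric settings are arbitrary choices:

```python
# Monte Carlo check of E[Y], Var(Y), and the skewness of Y = ln X
# for X ~ Gamma(shape alpha, scale theta).
import math
import random

def psi(n, x, h=1e-3):
    # n-th polygamma psi^{(n)}(x) = (n+1)-th derivative of ln Gamma(x),
    # via nested central differences of lgamma
    if n == 0:
        return (math.lgamma(x + h) - math.lgamma(x - h)) / (2 * h)
    return (psi(n - 1, x + h) - psi(n - 1, x - h)) / (2 * h)

random.seed(2)
alpha, theta = 4.0, 1.5
ys = [math.log(random.gammavariate(alpha, theta)) for _ in range(300000)]
n = len(ys)
mean = sum(ys) / n
var = sum((y - mean) ** 2 for y in ys) / n
skew = sum((y - mean) ** 3 for y in ys) / n / var ** 1.5

print(mean, psi(0, alpha) + math.log(theta))       # E[Y] = psi(alpha) + ln(theta)
print(var, psi(1, alpha))                          # Var(Y) = psi'(alpha)
print(skew, psi(2, alpha) / psi(1, alpha) ** 1.5)  # skewness formula
```

Each printed pair should agree to within simulation error.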
As a side note, this particular distribution appears to have been thoroughly studied by A. C. Olshen in his Transformations of the Pearson Type III Distribution; Johnson et al.'s Continuous Univariate Distributions also has a short section on it. Check those out.