Ok. $\Sigma$ is the $3\times 3$ covariance matrix of the joint distribution of your three variables (just look at the article). It is then partitioned into sub-matrices of unequal sizes. Denoting by $v_{ij}$ the elements of $\Sigma$, we have, following the notation of the article,
$$\Sigma_{11} = v_{11}=\sigma^2_1\;,\; \Sigma_{12} = [v_{12}\;\; v_{13}]\;,\; \Sigma_{21} = \left[ \begin{matrix} v_{21}\\ v_{31}\end{matrix} \right]$$
$$\Sigma_{22} =\left[\begin{matrix} v_{22} & v_{23} \\ v_{32} & v_{33} \end{matrix}\right] = \left[\begin{matrix} \sigma^2_{2} & v_{23} \\ v_{32} & \sigma^2_{3} \end{matrix}\right]$$
Then the conditional expectation function $E(X_1\mid X_2,X_3)$ is
$$E(X_1\mid X_2,X_3) = \mu_1 + \Sigma_{12}\Sigma^{-1}_{22}\left[\begin{matrix} X_2-\mu_2 \\ X_3-\mu_3\end{matrix}\right]$$
I guess now you can work out the expression for the conditional variance.
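If you want to verify the algebra numerically, here is a minimal R sketch with an illustrative (made-up) mean vector and covariance matrix; the conditional variance uses the analogous formula $\Sigma_{11} - \Sigma_{12}\Sigma^{-1}_{22}\Sigma_{21}$.

mu    <- c(1, 2, 3)                       # illustrative means
Sigma <- matrix(c(4.0, 1.2, 0.8,
                  1.2, 9.0, 2.0,
                  0.8, 2.0, 1.0), 3, 3)   # illustrative covariance matrix
S12 <- Sigma[1, 2:3, drop = FALSE]        # Sigma_12, 1 x 2
S21 <- Sigma[2:3, 1, drop = FALSE]        # Sigma_21, 2 x 1
S22 <- Sigma[2:3, 2:3]                    # Sigma_22, 2 x 2
x23 <- c(2.5, 3.5)                        # observed values of (X2, X3)
mu[1] + S12 %*% solve(S22) %*% (x23 - mu[2:3])  # conditional mean
Sigma[1, 1] - S12 %*% solve(S22) %*% S21        # conditional variance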
Let's start with a very simple example of Bayesian inference that touches on some of the issues you raise. Then you will have a framework for follow-up questions and for raising additional issues.
A political consultant is hired to advise one candidate in an upcoming election. From prior experience with other elections and some knowledge of the candidate, the consultant has the prior distribution $\mathsf{Beta}(330, 270)$ for the probability $\theta$ that the candidate will win. That is, the consultant thinks the probability the candidate will win is roughly 0.55 and likely between 0.51 and 0.59. Computation in R:
330/(330+270)
[1] 0.55 # mean of Beta(330, 270)
qbeta(c(.025, .975), 330, 270)
[1] 0.5100824 0.5896018
The prior density is
$p(\theta) \propto \theta^{330-1}(1-\theta)^{270-1}.$
Choosing the prior distribution is often at least partially a matter of opinion. The consultant might have been just as happy with another similar beta distribution as her prior.
Results of a public opinion poll by a reputable pollster show that $x = 620$ out of $n = 1000$ randomly chosen likely voters favor the candidate. Thus the binomial likelihood is
$L(x\mid\theta) \propto \theta^{620}(1-\theta)^{1000-620}.$
Then by Bayes' Theorem, the posterior distribution is proportional to the product of prior and likelihood:
$$g(\theta\mid x) \propto \theta^{330-1}(1-\theta)^{270-1}\times\theta^{620}(1-\theta)^{1000-620} \\
= \theta^{330 + 620 - 1}(1-\theta)^{270 + 1000 - 620 - 1}\\
= \theta^{950-1}(1-\theta)^{650-1},$$
where we recognize the last term as proportional to the density function of $\mathsf{Beta}(950, 650).$ Information in this posterior distribution is a melding of information in the prior distribution and in the data.
In this case, it is easy to find the posterior distribution because the binomial likelihood is 'conjugate to' (mathematically compatible with) the beta density of the prior distribution.
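In R, the conjugate update just adds successes and failures to the prior parameters; a minimal sketch:

a <- 330; b <- 270   # prior Beta(a, b)
x <- 620; n <- 1000  # poll data: x successes in n trials
a + x                # first posterior parameter: 950
b + n - x            # second posterior parameter: 650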
A 95% Bayesian probability interval $(0.570, 0.618)$ for $\theta$ can be found by cutting 2.5% of the probability from each tail of the posterior distribution. Possible point estimates are the mean, median, or mode (in this case, all about 0.594).
qbeta(c(.025, .975), 950, 650)
[1] 0.5695848 0.6176932
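The point estimates can be checked directly:

950/(950 + 650)           # posterior mean
qbeta(.5, 950, 650)       # posterior median
(950 - 1)/(950 + 650 - 2) # posterior mode; all approximately 0.594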
Here is a plot of the prior and posterior distributions. The 95% posterior probability interval is shown by dashed lines.
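A sketch of R code that would draw such a plot (the styling choices here are illustrative, not necessarily those of the original figure):

theta <- seq(0.40, 0.75, by = 0.001)
plot(theta, dbeta(theta, 950, 650), type = "l", lwd = 2,
     xlab = expression(theta), ylab = "Density")      # posterior
lines(theta, dbeta(theta, 330, 270), col = "gray")    # prior
abline(v = qbeta(c(.025, .975), 950, 650), lty = 2)   # 95% interval, dashed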
So data from the poll together with the prior distribution show a slightly more favorable standing of the candidate than did the prior distribution.
Notes: (1) If the prior distribution in this example had been the 'noninformative' Jeffreys prior $\mathsf{Beta}(.5,.5),$ then the 95% Bayesian posterior interval would have been nearly the same (numerically) as a frequentist 95% confidence interval (but Bayesians and frequentists interpret interval estimates somewhat differently).
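One way to check this numerically (the Wald interval below is just one common frequentist choice):

qbeta(c(.025, .975), .5 + 620, .5 + 380)  # posterior interval under the Jeffreys prior
p <- 620/1000
p + c(-1.96, 1.96)*sqrt(p*(1 - p)/1000)   # Wald 95% confidence interval
# both are roughly (0.590, 0.650)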
(2) A conjugate prior distribution for a Poisson likelihood function is a gamma distribution. Similarly, normal prior distributions are conjugate to normal likelihood functions (for the mean, with known variance).
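A minimal sketch of the Poisson-gamma update, with made-up counts:

a <- 2; b <- 1         # illustrative prior Gamma(a, b) for the rate lambda
x <- c(3, 5, 4, 6, 2)  # illustrative Poisson counts
a + sum(x)             # posterior shape
b + length(x)          # posterior rate: posterior is Gamma(a + sum(x), b + n)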
(3) Reference: Suess & Trumbo (2010), Introduction to Probability Simulation and Gibbs Sampling with R, Springer. The example shown above is similar to one found in Chapter 8 of this book.
There is a mistake in the fourth formula, the one you are trying to understand (this is apparent from the last formula, where the mistake disappears). Specifically, it should read $$\hat{f} \left( x \right) = \frac{\hat{f}_{\text{trans}} \left( T_{\hat{\alpha}, \hat{M}, \hat{c}} \left( x \right) \right)}{\left| \left( T^{- 1}_{\hat{\alpha}, \hat{M}, \hat{c}} \right)' \left( x \right) \right|}$$ and not $$\hat{f} \left( x \right) = \frac{\hat{f}_{\text{trans}} \left( T_{\hat{\alpha}, \hat{M}, \hat{c}} \left( x \right) \right)}{\left| \left( T^{- 1}_{\hat{\alpha}, \hat{M}, \hat{c}} \right)' \left( T_{\hat{\alpha}, \hat{M}, \hat{c}} \left( x \right) \right) \right|}$$
The notation in these formulas is clumsy and not very intuitive, but I will explain how the formula is derived and where the mistake occurs.
The relation between the two random variables $Y$ and $X$ is $Y = T(X)$ (I will write $T$ for $T_{\hat{\alpha}, \hat{M}, \hat{c}}$ to simplify notation). The transformation $T$ is the cumulative distribution function of an absolutely continuous random variable, so it is strictly monotonically increasing with a unique inverse $T^{-1}$. Let $t(x) = T'(x) = \frac{\partial T(x)}{\partial x}$ be the density corresponding to $T$, and denote by $f_X$ and $f_Y$ the densities of $X$ and $Y$. The relation between the two densities is
$$f_X(x) = f_Y(T(x))\, t(x)$$
$$f_Y(y) = f_X(T^{-1}(y)) \frac{1}{t(T^{-1}(y))}$$
This is clear because the Jacobian of the transformation $X = T^{-1}(Y)$ is $t(x)$ (and not $t(T(x))$, as is assumed in the fourth formula). The term $\left| \frac{1}{(T^{-1})'(T(x))} \right|$ appears mistakenly in the fourth formula because
$$t(T(x)) = \left| t(T(x)) \right| = \left| T'(T(x)) \right| = \left| \frac{1}{(T^{-1})'(T(x))} \right|.$$
(Notice that the derivative of the inverse function is the reciprocal of the derivative of the original function. Also, there is no need for the absolute values here, because densities are positive. To quickly check the error, notice that $T\colon (0,\infty) \rightarrow (0,1)$ while $t\colon (0,\infty) \rightarrow (0,\infty)$.) In the last formula, the last term is $t(x)$ and not the incorrect $t(T(x))$.
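A quick numerical check in R of the $t(x)$-versus-$t(T(x))$ distinction, using an illustrative transformation and an illustrative density for $X$ (both made up for the check):

T_fun <- function(x) 1 - exp(-x)                    # illustrative cdf: (0, Inf) -> (0, 1)
t_fun <- function(x) exp(-x)                        # its derivative, the density t
Tinv  <- function(y) -log(1 - y)                    # inverse transformation
f_X   <- function(x) dgamma(x, shape = 2, rate = 1) # a known density for X
f_Y   <- function(y) f_X(Tinv(y)) / t_fun(Tinv(y))  # density of Y = T(X)
x <- 1.7
all.equal(f_X(x), f_Y(T_fun(x)) * t_fun(x))         # TRUE: the Jacobian is t(x)
isTRUE(all.equal(f_X(x), f_Y(T_fun(x)) * t_fun(T_fun(x))))  # FALSE: t(T(x)) is wrong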