How to Obtain the Box-Cox Log Likelihood Using the Jacobian in Mathematical Statistics

Can someone please demonstrate how to derive the log likelihood below from the Box-Cox transformation using the Jacobian? I was told in lectures that the Jacobian is needed here, but I can't manipulate it to get the result.

The Box-Cox transformation is defined as:

$$y^{(\lambda)} = \begin{cases}\frac{y^{\lambda}-1}{\lambda}&\text{ when }\lambda \neq 0 \\[5pt]
\log(y)&\text{ when }\lambda = 0\end{cases}$$
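
As a quick illustration of the definition, here is a minimal Python sketch of the transformation for a single positive value (the name `boxcox` is an illustrative choice, not a library function). It also shows numerically that the $\lambda=0$ branch is the limit of the $\lambda\neq 0$ branch as $\lambda\to 0$, which is why the two cases fit together.

```python
import math

def boxcox(y, lam):
    """Box-Cox transform y^(lambda) of a single positive value y."""
    if y <= 0:
        raise ValueError("the Box-Cox transform requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam

# (y^lam - 1) / lam approaches log(y) as lam -> 0
for lam in (0.1, 0.01, 0.001):
    print(lam, boxcox(5.0, lam), math.log(5.0))
```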

The log likelihood is:

$$l(\lambda,\boldsymbol{\beta},\sigma^{2}) = -\frac{n}{2}\log(2\pi\sigma^{2})-\frac{1}{2\sigma^{2}}(\mathbf{y}^{(\lambda)}-\mathbf{X}\boldsymbol{\beta})^{T}(\mathbf{y}^{(\lambda)}-\mathbf{X}\boldsymbol{\beta})+(\lambda-1)\sum_{i=1}^{n}\log(y_{i}) $$
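
A minimal sketch of evaluating this log likelihood directly, assuming `y` is a list of positive responses and `X` is a list of design-matrix rows (plain Python lists, no libraries; the names `boxcox` and `boxcox_loglik` are hypothetical). The Gaussian part and the extra $(\lambda-1)\sum\log(y_i)$ term are computed separately so it is easy to see where each comes from.

```python
import math

def boxcox(y, lam):
    # Box-Cox transform of a single positive value y
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam

def boxcox_loglik(lam, beta, sigma2, y, X):
    """l(lambda, beta, sigma^2) for y^(lambda) = X beta + e, e ~ N(0, sigma^2 I).

    y: list of positive responses; X: list of rows, one per observation.
    """
    n = len(y)
    # residuals r_i = y_i^(lambda) - x_i^T beta
    resid = [boxcox(yi, lam) - sum(xij * bj for xij, bj in zip(xi, beta))
             for yi, xi in zip(y, X)]
    rss = sum(r * r for r in resid)  # (y^(lambda) - X beta)^T (y^(lambda) - X beta)
    gaussian_part = -n / 2 * math.log(2 * math.pi * sigma2) - rss / (2 * sigma2)
    jacobian_part = (lam - 1) * sum(math.log(yi) for yi in y)
    return gaussian_part + jacobian_part

# Usage with illustrative values: intercept + slope design, lambda = 0.5
y = [1.5, 2.7, 4.2]
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
print(boxcox_loglik(0.5, [0.1, 0.9], 0.2, y, X))
```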

How do I get from one to the other?

Best Answer

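To ensure that the likelihoods for different values of $\lambda$ are comparable, we need the log likelihood based on the untransformed, non-Gaussian data $y_1,\dots,y_n$, that is, $$ l(\lambda,\boldsymbol{\beta},\sigma^2)=\ln f(y_1,\dots,y_n).\tag{1} $$ According to the model, the pdf of the transformed data, say $f_{(\lambda)}$, is Gaussian, so that $$ \ln f_{(\lambda)}(y_1^{(\lambda)},\dots,y_n^{(\lambda)})= -\frac{n}{2}\log(2\pi\sigma^{2})-\frac{1}{2\sigma^{2}}(\mathbf{y}^{(\lambda)}-\mathbf{X}\boldsymbol{\beta})^{T}(\mathbf{y}^{(\lambda)}-\mathbf{X}\boldsymbol{\beta}). \tag{2} $$

By the change-of-variables formula, the two pdfs are related by $$ f(y_1,\dots,y_n)=f_{(\lambda)}(y_1^{(\lambda)},\dots,y_n^{(\lambda)})|\mathbf{J}|,\tag{3} $$ where $\mathbf{J}$ is the Jacobian matrix of the map $(y_1,\dots,y_n)\mapsto(y_1^{(\lambda)},\dots,y_n^{(\lambda)})$. Because each $y_i^{(\lambda)}$ depends only on $y_i$, the diagonal elements of $\mathbf{J}$ are $\partial y_i^{(\lambda)}/\partial y_i = y_i^{\lambda-1}$ (this also holds for $\lambda=0$, since $\partial\log(y_i)/\partial y_i = 1/y_i$) and the off-diagonal elements $\partial y_i^{(\lambda)}/\partial y_j$, $i\neq j$, are all zero. Hence the determinant is the product of the diagonal elements, $$|\mathbf{J}|=\prod_{i=1}^n y_i^{\lambda-1},\tag{4} $$ which is positive since all $y_i>0$.

Taking logs of (3), substituting (2), and using $\ln|\mathbf{J}| = \sum_{i=1}^n \ln y_i^{\lambda-1} = (\lambda-1)\sum_{i=1}^n\log(y_i)$ from (4) gives $$ \ln f(y_1,\dots,y_n) = -\frac{n}{2}\log(2\pi\sigma^{2})-\frac{1}{2\sigma^{2}}(\mathbf{y}^{(\lambda)}-\mathbf{X}\boldsymbol{\beta})^{T}(\mathbf{y}^{(\lambda)}-\mathbf{X}\boldsymbol{\beta})+(\lambda-1)\sum_{i=1}^{n}\log(y_{i}), $$ which by (1) is exactly the log likelihood $l$ in your question.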
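
The role of $|\mathbf{J}|$ can be checked numerically in the one-observation case ($n=1$, so $\mathbf{J}$ is just $y^{\lambda-1}$). The sketch below assumes $y^{(\lambda)}\sim N(\mu,\sigma^2)$ with illustrative values $\lambda=0.5$, $\mu=5$, $\sigma=0.5$, and uses hypothetical helper names. It (i) compares a central finite difference of $y^{(\lambda)}$ with $y^{\lambda-1}$ and (ii) integrates the implied density of $y$ with a simple trapezoidal rule. With the Jacobian factor the density integrates to about 1; without it, it does not, so it would not be a valid pdf for $y$. Note that for $\lambda>0$ the transform satisfies $y^{(\lambda)}>-1/\lambda$, so $\mu$ and $\sigma$ are chosen to make the Gaussian mass cut off by that bound negligible.

```python
import math

def boxcox(y, lam):
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam

def jacobian_diag(y, lam):
    # d y^(lambda) / d y = y^(lambda - 1); for lambda = 0 this is 1 / y
    return y ** (lam - 1)

def normal_pdf(z, mu, sigma):
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def density_y(y, lam, mu, sigma, with_jacobian=True):
    # pdf of y implied by y^(lambda) ~ N(mu, sigma^2), i.e. (3) with n = 1
    f = normal_pdf(boxcox(y, lam), mu, sigma)
    return f * jacobian_diag(y, lam) if with_jacobian else f

def trapezoid(g, a, b, steps=20000):
    # composite trapezoidal rule on [a, b]
    h = (b - a) / steps
    total = 0.5 * (g(a) + g(b))
    for k in range(1, steps):
        total += g(a + k * h)
    return total * h

lam, mu, sigma = 0.5, 5.0, 0.5

# (i) finite-difference check of the diagonal Jacobian entry at y = 3
h = 1e-5
fd = (boxcox(3 + h, lam) - boxcox(3 - h, lam)) / (2 * h)
print(fd, jacobian_diag(3, lam))

# (ii) [0.5, 60] covers essentially all of the mass for these parameters
with_j = trapezoid(lambda t: density_y(t, lam, mu, sigma), 0.5, 60.0)
without_j = trapezoid(lambda t: density_y(t, lam, mu, sigma, with_jacobian=False), 0.5, 60.0)
print(with_j)     # approximately 1.0
print(without_j)  # approximately 3.5 (= E[y**(1 - lam)]), not 1
```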

EDIT: Instead of including the additional term from the Jacobian in (3) in the log likelihood (e.g. before maximising it with respect to $\lambda$), an alternative is to fit the model to $y_i'=y_i^{(\lambda)}/\tilde y^{\lambda-1}$, where $\tilde y=(\prod_{i=1}^n y_i)^{1/n}$ is the geometric mean of the original $y_i$'s. Treating $\tilde y$ as a fixed constant, the $y_i'$ have pdf \begin{align} f'(y_1',\dots,y_n')&=f_{(\lambda)}(y_1^{(\lambda)},\dots,y_n^{(\lambda)})(\tilde y^{\lambda-1})^n \\&=f_{(\lambda)}(y_1^{(\lambda)},\dots,y_n^{(\lambda)})\prod_{i=1}^n y_i^{\lambda-1} \\&=f(y_1,\dots,y_n). \end{align} The likelihood obtained by fitting the model to this normalized transformation is thus identical to the likelihood (1) that we seek, with no extra correction term needed.
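
The equivalence in the EDIT can be checked numerically for the profile likelihood, i.e. with $\boldsymbol\beta$ and $\sigma^2$ replaced by their maximum likelihood estimates (so $\hat\sigma^2=\text{RSS}/n$). The sketch below assumes a simple linear regression with an intercept and one covariate, illustrative data `x` and `y` (not from the question), and hypothetical helper names. It compares the maximized log likelihood with the Jacobian term against the maximized Gaussian log likelihood of the normalized data $y_i'$. The two should agree up to rounding for every $\lambda$, because dividing the response by $\tilde y^{\lambda-1}$ rescales the RSS by $\tilde y^{-2(\lambda-1)}$, and $-\frac n2\log\tilde y^{-2(\lambda-1)} = (\lambda-1)\sum_i\log(y_i)$.

```python
import math

def boxcox(y, lam):
    # Box-Cox transform of a positive value
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam

def ols_rss(x, z):
    # residual sum of squares of the least-squares fit z ~ a + b * x
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    b = sxz / sxx
    a = mz - b * mx
    return sum((zi - a - b * xi) ** 2 for xi, zi in zip(x, z))

def max_gaussian_loglik(rss, n):
    # Gaussian log likelihood maximised over beta and sigma^2 (sigma^2_hat = RSS / n)
    return -n / 2 * math.log(2 * math.pi * rss / n) - n / 2

def profile_with_jacobian(x, y, lam):
    # fit to y^(lambda), then add the Jacobian term (lambda - 1) * sum(log y_i)
    z = [boxcox(v, lam) for v in y]
    jac = (lam - 1) * sum(math.log(v) for v in y)
    return max_gaussian_loglik(ols_rss(x, z), len(y)) + jac

def profile_normalized(x, y, lam):
    # fit to y_i' = y_i^(lambda) / gm^(lambda - 1) with no correction term
    gm = math.exp(sum(math.log(v) for v in y) / len(y))  # geometric mean
    z = [boxcox(v, lam) / gm ** (lam - 1) for v in y]
    return max_gaussian_loglik(ols_rss(x, z), len(y))

# illustrative data, not from the question
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 2.9, 4.1, 7.8, 11.5, 15.0]
for lam in (-0.5, 0.0, 0.5, 1.0, 2.0):
    print(lam, profile_with_jacobian(x, y, lam), profile_normalized(x, y, lam))
```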

