How to Obtain the Box-Cox Log Likelihood Using the Jacobian

Tags: data-transformation, likelihood, mathematical-statistics

Can someone please demonstrate how to derive the log likelihood below from the Box-Cox transformation using the Jacobian? I know from lectures that the Jacobian is meant to be used here, but I can't manipulate it to get the result.

The Box-Cox transformation is defined as:

$$y^{(\lambda)} = \begin{cases}\frac{y^{\lambda}-1}{\lambda}&\text{ when }\lambda \neq 0 \\[5pt]
\log(y)&\text{ when }\lambda = 0\end{cases}$$
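
For concreteness, a minimal NumPy sketch of this transformation (the function name `boxcox_transform` is my own, not a standard API):

```python
import numpy as np

def boxcox_transform(y, lam):
    """Box-Cox transform of a positive array y for a given lambda."""
    y = np.asarray(y, dtype=float)
    if lam == 0:
        return np.log(y)            # the lambda = 0 branch
    return (y**lam - 1.0) / lam     # the lambda != 0 branch
```

Note that the two branches fit together: as $\lambda \to 0$, $(y^{\lambda}-1)/\lambda \to \log(y)$.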

The log likelihood is:

$$l(\lambda,\boldsymbol{\beta},\sigma^{2}) = -\frac{n}{2}\log(2\pi\sigma^{2})-\frac{1}{2\sigma^{2}}(\mathbf{y}^{(\lambda)}-\mathbf{X}\boldsymbol{\beta})^{T}(\mathbf{y}^{(\lambda)}-\mathbf{X}\boldsymbol{\beta})+(\lambda-1)\sum_{i=1}^{n}\log(y_{i}) $$

How do I get from one to the other?
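
(For reference, a minimal sketch of how this log likelihood can be evaluated numerically; the function `boxcox_loglik` and its argument names are my own, assuming a design matrix `X` and positive responses `y`.)

```python
import numpy as np

def boxcox_loglik(lam, y, X, beta, sigma2):
    """Evaluate l(lambda, beta, sigma^2): the Gaussian log density of
    y^(lambda) given X @ beta, plus the (lam - 1) * sum(log y) term."""
    ylam = np.log(y) if lam == 0 else (y**lam - 1.0) / lam
    resid = ylam - X @ beta
    n = y.size
    return (-0.5 * n * np.log(2 * np.pi * sigma2)
            - resid @ resid / (2 * sigma2)
            + (lam - 1.0) * np.log(y).sum())
```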

Best Answer

To ensure that the likelihoods for different values of $\lambda$ are comparable, we need the log likelihood based on the untransformed non-Gaussian data $y_1,\dots,y_n$, that is, $$ l(\lambda,\boldsymbol{\beta},\sigma^2)=\ln f(y_1,\dots,y_n).\tag{1} $$

According to the model, the pdf of the transformed data, say $f_{(\lambda)}$, is Gaussian, so that $$ \ln f_{(\lambda)}(y_1^{(\lambda)},\dots,y_n^{(\lambda)})= -\frac{n}{2}\log(2\pi\sigma^{2})-\frac{1}{2\sigma^{2}}(\mathbf{y}^{(\lambda)}-\mathbf{X}\boldsymbol{\beta})^{T}(\mathbf{y}^{(\lambda)}-\mathbf{X}\boldsymbol{\beta}). \tag{2} $$

The relationship between the two pdfs is $$ f(y_1,\dots,y_n)=f_{(\lambda)}(y_1^{(\lambda)},\dots,y_n^{(\lambda)})|\mathbf{J}|,\tag{3} $$ where the diagonal elements of the Jacobian are $\partial y_i^{(\lambda)}/\partial y_i = y_i^{\lambda-1}$ and the off-diagonal elements $\partial y_i^{(\lambda)}/\partial y_j$, $i\neq j$, are all zero. Hence, the determinant is $$|\mathbf{J}|=\prod_{i=1}^n y_i^{\lambda-1}.\tag{4} $$

Taking logs of (3) and substituting (2) and (4) gives $$ \ln f(y_1,\dots,y_n) = \ln f_{(\lambda)}(y_1^{(\lambda)},\dots,y_n^{(\lambda)}) + (\lambda-1)\sum_{i=1}^{n}\log(y_i), $$ which is exactly the log likelihood function $l$ in your question.
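
As a quick numerical sanity check on (3) and (4) (the values below are arbitrary, chosen only for illustration), a central finite difference recovers the diagonal entries $y_i^{\lambda-1}$, and the log determinant equals $(\lambda-1)\sum_i\log(y_i)$:

```python
import numpy as np

lam = 0.5
y = np.array([0.7, 1.3, 2.0, 3.1])   # arbitrary positive responses
eps = 1e-6

# Central finite difference of the transform y -> (y**lam - 1) / lam
fd = (((y + eps)**lam - 1) / lam - ((y - eps)**lam - 1) / lam) / (2 * eps)
print(np.allclose(fd, y**(lam - 1)))  # True: diagonal of the Jacobian

# log|J| = sum of logs of the diagonal = (lam - 1) * sum(log y)
print(np.isclose(np.log(y**(lam - 1)).sum(),
                 (lam - 1) * np.log(y).sum()))  # True
```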

EDIT: Instead of including the additional term from the Jacobian (3) in the log likelihood (before maximising it with respect to $\lambda$), an alternative is to fit the model to the rescaled responses $y_i'=y_i^{(\lambda)}/\tilde y^{\lambda-1}$, where $\tilde y=(\prod_{i=1}^n y_i)^{1/n}$ is the geometric mean of the original $y_i$'s. These have pdf \begin{align} f'(y_1',\dots,y_n')&=f_{(\lambda)}(y_1^{(\lambda)},\dots,y_n^{(\lambda)})(\tilde y^{\lambda-1})^n \\&=f_{(\lambda)}(y_1^{(\lambda)},\dots,y_n^{(\lambda)})\prod_{i=1}^n y_i^{\lambda-1} \\&=f(y_1,\dots,y_n), \end{align} where the second equality uses $\tilde y^n=\prod_{i=1}^n y_i$. The likelihood obtained from fitting the model to this transformation is thus identical to the likelihood (1) that we seek, without the inclusion of any extra correction term.
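
To illustrate this equivalence numerically, here is a sketch on simulated data (the data-generating choices and all names are my own, not part of the answer above): maximum likelihood fitting on the rescaled scale $y_i'$ reproduces the Jacobian-corrected log likelihood computed on the $y_i^{(\lambda)}$ scale.

```python
import numpy as np

rng = np.random.default_rng(0)
n, lam = 50, 0.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = rng.lognormal(mean=1.0, sigma=0.3, size=n)   # arbitrary positive responses

def gauss_loglik(resid, sigma2):
    """Gaussian log likelihood of a residual vector with variance sigma2."""
    return (-0.5 * resid.size * np.log(2 * np.pi * sigma2)
            - resid @ resid / (2 * sigma2))

# (a) Fit on the Box-Cox scale, then add the Jacobian term (lam - 1) * sum(log y)
ylam = (y**lam - 1.0) / lam
b = np.linalg.lstsq(X, ylam, rcond=None)[0]
r = ylam - X @ b
ll_corrected = gauss_loglik(r, r @ r / n) + (lam - 1.0) * np.log(y).sum()

# (b) Fit on the rescaled scale y' = y^(lam) / gmean**(lam - 1); no correction needed
gmean = np.exp(np.log(y).mean())          # geometric mean of the original y_i
yprime = ylam / gmean**(lam - 1.0)
bp = np.linalg.lstsq(X, yprime, rcond=None)[0]
rp = yprime - X @ bp
ll_rescaled = gauss_loglik(rp, rp @ rp / n)

print(np.isclose(ll_corrected, ll_rescaled))  # True: the two likelihoods agree
```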
