The approach adopted by the forecast package is to fit a model to the transformed data and to back-transform both the point forecasts and the prediction intervals. This is not "wrong"; it is a legitimate choice. It preserves the coverage of the prediction intervals, and the back-transformed point forecast can be interpreted as the median of the forecast density (assuming the forecast density on the transformed scale is symmetric). For many purposes this is acceptable, but occasionally the mean forecast is required. For example, in hierarchical forecasting the forecasts need to be aggregated, and means aggregate but medians do not.
It is easy enough to derive the mean forecast using a Taylor series expansion. Suppose $f(x)$ represents the back-transformation function, $\mu$ is the mean on the transformed scale and $\sigma^2$ is the variance on the transformed scale. Then using the first three terms of a Taylor expansion around $\mu$, the mean on the original scale is given by
$$f(\mu) + \frac{1}{2}\sigma^2f''(\mu).$$
For a Box-Cox transformation,
$$f(x) = \left\{\begin{array}{ll}
(\lambda x+1)^{1/\lambda} & \text{if $\lambda\ne0$;}\\
e^x & \text{if $\lambda=0$.}\end{array}\right.
$$
So
$$f''(x) = \left\{\begin{array}{ll}
(1-\lambda)(\lambda x+1)^{1/\lambda-2} & \text{if $\lambda\ne0$;}\\
e^x & \text{if $\lambda=0$.}\end{array}\right.
$$
and the back-transformed mean is given by
$$\left\{\begin{array}{ll}
(\lambda \mu+1)^{1/\lambda}\left[1 + \frac{\sigma^2(1-\lambda)}{2(\lambda \mu+1)^{2}}\right] & \text{if $\lambda\ne0$;}\\
e^\mu\left[1 + \frac{\sigma^2}{2}\right] & \text{if $\lambda=0$.}\end{array}\right.
$$
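For the log transformation ($\lambda = 0$) the exact back-transformed mean is known in closed form, $e^{\mu + \sigma^2/2}$ (the lognormal mean), which makes it easy to check the quality of the second-order Taylor approximation. A minimal sketch in R:

```r
# Compare the exact lognormal mean with the Taylor approximation e^mu * (1 + sigma^2/2)
mu <- 2
sigma2 <- 0.1
exact  <- exp(mu + sigma2 / 2)        # exact mean of exp(X), X ~ N(mu, sigma2)
taylor <- exp(mu) * (1 + sigma2 / 2)  # second-order Taylor approximation
c(exact = exact, taylor = taylor, rel_error = abs(exact - taylor) / exact)
```

For small $\sigma^2$ the two agree closely; the approximation degrades as the forecast variance grows.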
Therefore, to adjust the back-transformed mean obtained in R, the following code can be used.

```r
library(forecast)

# lambda = 0 (log transformation)
fit <- auto.arima(AirPassengers, lambda = 0)
fc <- forecast(fit, h = 50, level = 95)
# forecast variance on the transformed scale, recovered from the 95% interval
fvar <- ((BoxCox(fc$upper, fit$lambda) - BoxCox(fc$lower, fit$lambda)) /
           qnorm(0.975) / 2)^2
plot(fc)
fc$mean <- fc$mean * (1 + 0.5 * fvar)   # e^mu * (1 + sigma^2/2)
lines(fc$mean, col = 'red')

# lambda != 0
fit <- auto.arima(AirPassengers, lambda = 0.2)
fc <- forecast(fit, h = 50, level = 95)
fvar <- ((BoxCox(fc$upper, fit$lambda) - BoxCox(fc$lower, fit$lambda)) /
           qnorm(0.975) / 2)^2
plot(fc)
# here (lambda*mu + 1) = fc$mean^lambda, so the adjustment factor becomes:
fc$mean <- fc$mean * (1 + 0.5 * fvar * (1 - fit$lambda) / (fc$mean)^(2 * fit$lambda))
lines(fc$mean, col = 'red')
```
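Note that recent versions of the forecast package can perform this bias adjustment for you via the biasadj argument (assuming your installed version implements it; see the help for forecast and InvBoxCox):

```r
library(forecast)
fit <- auto.arima(AirPassengers, lambda = 0)
# biasadj = TRUE returns (approximate) mean forecasts instead of median forecasts
fc <- forecast(fit, h = 50, level = 95, biasadj = TRUE)
```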
An extended version of this answer is given on my blog.
Bantis, L. E., Nakas, C. T., and Reiser, B. (2014), "Construction of confidence regions in the ROC space after the estimation of the optimal Youden index-based cut-off point", Biometrics, 70: 212–223. doi:10.1111/biom.12107 implements the method and gives the following formula:
We see that they use the transformed values when calculating the standard deviations (in the square brackets) but the untransformed values in the third row of the formula.
Again, I implemented this in R. This time my only quibble is that $-\frac{n_X}{2}$ and $-\frac{n_Y}{2}$ apparently should be replaced by $-n_X$ and $-n_Y$, which as I understand it are the numbers of observations in the two groups; otherwise the results are way off. The R code thus is:

```r
library(forecast)

profile_loglik <- function(lambda, x, y) {
  m <- length(x)
  n <- length(y)
  x_t <- BoxCox(x, lambda)  # Box-Cox transform (forecast::BoxCox)
  y_t <- BoxCox(y, lambda)
  -m * log(sd(x_t)) - n * log(sd(y_t)) +
    (lambda - 1) * (sum(log(x)) + sum(log(y)))
}

calc_lambda_hat <- function(x, y) {
  # minimise the negative profile log-likelihood over lambda
  optim_func <- function(lambda) -profile_loglik(lambda = lambda, x = x, y = y)
  optim(0.5, optim_func, method = "L-BFGS-B", lower = -4, upper = 10)
}
```
I tested `calc_lambda_hat` against concatenating both groups and then using `forecast::BoxCox.lambda(c(x, y), "loglik")`. The correlation of the returned values of $\lambda$ over 1000 repetitions in various scenarios with lognormal distributions is about 85%, so the implementation seems to be OK.
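As a quick sanity check (a sketch assuming the functions above are in scope and the forecast package is installed), `calc_lambda_hat` can be run on two simulated lognormal samples, for which the true $\lambda$ is 0:

```r
set.seed(1)
x <- rlnorm(200, meanlog = 0,   sdlog = 0.5)  # lognormal group 1
y <- rlnorm(200, meanlog = 0.3, sdlog = 0.5)  # lognormal group 2
res <- calc_lambda_hat(x, y)
res$par          # estimated lambda, should be near 0 for lognormal data
res$convergence  # 0 indicates successful convergence
```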
Best Answer
To ensure that the likelihoods for different values of $\lambda$ are comparable, we need the log likelihood based on the untransformed non-Gaussian data $y_1,\dots,y_n$, that is, $$ l(\lambda,\boldsymbol{\beta},\sigma^2)=\ln f(y_1,\dots,y_n).\tag{1} $$ According to the model, the pdf of the transformed data, say $f_{(\lambda)}$, is Gaussian, such that, $$ \ln f_{(\lambda)}(y_1^{(\lambda)},\dots,y_n^{(\lambda)})= -\frac{n}{2}\log(2\pi\sigma^{2})-\frac{1}{2\sigma^{2}}(\mathbf{y}^{(\lambda)}-\mathbf{X}\boldsymbol{\beta})^{T}(\mathbf{y}^{(\lambda)}-\mathbf{X}\boldsymbol{\beta}). \tag{2} $$ The relationship between the two pdfs is $$ f(y_1,\dots,y_n)=f_{(\lambda)}(y_1^{(\lambda)},\dots,y_n^{(\lambda)})|\mathbf{J}|,\tag{3} $$ where the diagonal elements of the Jacobian are $\partial y_i^{(\lambda)}/\partial y_i = y_i^{\lambda-1}$ and the off-diagonal elements $\partial y_i^{(\lambda)}/\partial y_j$ are all zero. Hence, the determinant $$|\mathbf{J}|=\prod_{i=1}^n y_i^{\lambda-1}.\tag{4} $$ Combining (1) to (4) leads to the log likelihood function $l$ in your question.
EDIT: Instead of including the additional term from the Jacobian (3) in the log likelihood (e.g. before maximising this with respect to $\lambda$), an alternative is to fit the model to $y_i'=y_i^{(\lambda)}/\tilde y^{\lambda-1}$ where $\tilde y=(\prod_{i=1}^n y_i)^{1/n}$ (the geometric mean of the original $y_i$'s). This has pdf \begin{align} f'(y_1',\dots,y_n')&=f_{(\lambda)}(y_1^{(\lambda)},\dots,y_n^{(\lambda)})(\tilde y^{\lambda-1})^n \\&=f_{(\lambda)}(y_1^{(\lambda)},\dots,y_n^{(\lambda)})\prod_{i=1}^n y_i^{\lambda-1} \\&=f(y_1,\dots,y_n). \end{align} The likelihood you obtain from fitting the model to this transformation is thus identical to the likelihood (1) that we seek without the inclusion of any extra correction term.
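The equivalence of the two approaches is easy to verify numerically. A minimal sketch for the intercept-only case (the helper names `bc` and `gauss_ll` are my own, and the Gaussian log-likelihood is evaluated at the MLEs):

```r
# Show that loglik(transformed data) + Jacobian term equals
# loglik(data rescaled by the geometric mean), intercept-only model
y <- c(1, 2, 3, 5, 8)
lambda <- 0.5
bc <- function(y, l) if (l == 0) log(y) else (y^l - 1) / l  # Box-Cox transform
yt <- bc(y, lambda)

# Gaussian log-likelihood of a sample at its MLEs (mean and biased variance)
gauss_ll <- function(z) {
  s2 <- mean((z - mean(z))^2)
  sum(dnorm(z, mean(z), sqrt(s2), log = TRUE))
}

l_jacobian <- gauss_ll(yt) + (lambda - 1) * sum(log(y))  # likelihood (1) via (3)-(4)
gm <- exp(mean(log(y)))                                  # geometric mean of y
l_rescaled <- gauss_ll(yt / gm^(lambda - 1))             # fit to the rescaled y'
all.equal(l_jacobian, l_rescaled)  # TRUE
```

Rescaling by the constant $\tilde y^{\lambda-1}$ shifts the Gaussian log-likelihood by exactly $n(\lambda-1)\ln\tilde y = (\lambda-1)\sum_i \ln y_i$, which is the Jacobian term, so the two quantities agree for any $\lambda$.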