Solved – How to estimate theta for the inverse hyperbolic sine transformation

I would like help with R code to estimate theta for the Inverse Hyperbolic Sine Transformation. This transformation is useful to transform skewed data that contain negative values or zeros.

There are a couple of related posts that discuss the IHS and suggest there is a maximum likelihood approach to estimating theta but I can't work out how to apply it. These related posts are:

Please see example code below. I've made skewed data by using the inverse of the IHS. Now, given the skewed data, and no prior knowledge of theta, how can I work out what theta should be? I would be most grateful for R code to undertake this analysis.

# Define the IHS transformation and its inverse
IHS <- function(x, theta){  # Inverse IHS transformation
  (1/theta)*asinh(theta * x)
}

Inv.IHS <- function(x, theta){  # IHS transformation
  (1/theta)*sinh(theta * x)
}

set.seed(1)
# generate some normal data
x <- rnorm(1000)
hist(x, breaks='FD')

# skew it by applying the Inverse of the IHS transformation
xt <- Inv.IHS(x, theta=2)
hist(xt, breaks='FD') # yep this is skewed.  How could we estimate theta?

usng inverse of the IHS to skew data

set.seed(1) x <- rnorm(1000) xt <- Inv.IHS(x, theta=2) Shapiro.test.pvalue <- function(theta, x){ x <- IHS(x, theta) shapiro.test(x)$p.value } optimise(Shapiro.test.pvalue, lower=0.001, upper=50, x=xt, maximum=TRUE) # 2.069838

IHS.loglik <- function(theta,x){ IHS <- function(x, theta){ # function to IHS transform asinh(theta * x)/theta } n <- length(x) xt <- IHS(x, theta) log.lik <- -n*log(sum((xt - mean(xt))^2))- sum(log(1+theta^2*x^2)) return(log.lik) } # try this on our data optimise(IHS.loglik, lower=0.001, upper=50, x=xt, maximum=TRUE) # 2.0407

Best Answer

After a couple more days of thinking about the problem, I have two tentative answers.

Select theta so that the transformed data is close to normal as measured by goodness of fit. For example choose theta to maximize the p-value of the Shapiro-Wik test.

Maximum likelihood estimation of theta.

Looking at the paper by Burbidge et al. I think the likelihood function for a single variable can be expressed as follows.

In both cases we get close to the expected theta, which is encouraging. But when I try the maximum likelihood approach on my real data it doesn't seem to give reasonable answers.

Best Answer

Related Solutions

Related Question