Solved – How to estimate theta for the inverse hyperbolic sine transformation

data transformationr

I would like help with R code to estimate theta for the Inverse Hyperbolic Sine Transformation. This transformation is useful to transform skewed data that contain negative values or zeros.

There are a couple of related posts that discuss the IHS and suggest there is a maximum likelihood approach to estimating theta but I can't work out how to apply it. These related posts are:

Please see example code below. I've made skewed data by using the inverse of the IHS. Now, given the skewed data, and no prior knowledge of theta, how can I work out what theta should be? I would be most grateful for R code to undertake this analysis.

# Define the IHS transformation and its inverse
IHS <- function(x, theta){  # Inverse IHS transformation
  (1/theta)*asinh(theta * x)
}

Inv.IHS <- function(x, theta){  # IHS transformation
  (1/theta)*sinh(theta * x)
}

set.seed(1)
# generate some normal data
x <- rnorm(1000)
hist(x, breaks='FD')

# skew it by applying the Inverse of the IHS transformation
xt <- Inv.IHS(x, theta=2)
hist(xt, breaks='FD') # yep this is skewed.  How could we estimate theta?

usng inverse of the IHS to skew data

Best Answer

After a couple more days of thinking about the problem, I have two tentative answers.

  1. Select theta so that the transformed data is close to normal as measured by goodness of fit. For example choose theta to maximize the p-value of the Shapiro-Wik test.

    set.seed(1)
    x <- rnorm(1000)
    xt <- Inv.IHS(x, theta=2)
    
    Shapiro.test.pvalue <- function(theta, x){ 
      x <- IHS(x, theta) 
      shapiro.test(x)$p.value 
    }
    
    optimise(Shapiro.test.pvalue, lower=0.001, upper=50, x=xt, maximum=TRUE)  # 2.069838
    
  2. Maximum likelihood estimation of theta.

    Looking at the paper by Burbidge et al. I think the likelihood function for a single variable can be expressed as follows.

    IHS.loglik <- function(theta,x){
    
      IHS <- function(x, theta){  # function to IHS transform
      asinh(theta * x)/theta
    }
    
    n <- length(x)
    xt <- IHS(x, theta)
    
    log.lik <- -n*log(sum((xt - mean(xt))^2))- sum(log(1+theta^2*x^2))
    return(log.lik)
    }     
    
    # try this on our data
    optimise(IHS.loglik, lower=0.001, upper=50, x=xt, maximum=TRUE) # 2.0407
    

In both cases we get close to the expected theta, which is encouraging. But when I try the maximum likelihood approach on my real data it doesn't seem to give reasonable answers.

Related Question