Solved – Back-transformed confidence intervals

back-transformation, confidence interval, data transformation

Having come across this discussion, I'd like to raise a question about the conventions for back-transformed confidence intervals.

According to this article, the back-transformed CI with nominal coverage for the mean of a log-normal random variable is:

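$UCL(X) = \exp\left(Y+\frac{\text{var}(Y)}{2}+z\sqrt{\frac{\text{var}(Y)}{n}+\frac{\text{var}(Y)^2}{2(n-1)}}\right)$
$LCL(X) = \exp\left(Y+\frac{\text{var}(Y)}{2}-z\sqrt{\frac{\text{var}(Y)}{n}+\frac{\text{var}(Y)^2}{2(n-1)}}\right)$

where $Y$ is the sample mean of the log-transformed data, $\text{var}(Y)$ their sample variance, $n$ the sample size and $z$ the standard normal quantile for the chosen confidence level.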

(and not the naive $\exp\left(Y \pm z\sqrt{\text{var}(Y)/n}\right)$)
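For concreteness, here is a minimal Python sketch that computes both intervals from raw positive data. It assumes, as the formula implies, that $Y$ is the mean of the log data and $\text{var}(Y)$ their sample variance (divisor $n-1$). The naive version uses the standard error $\sqrt{\text{var}(Y)/n}$ of the log-scale mean. The helper names `cox_ci`, `naive_ci` and `_log_summary` are hypothetical, and only the standard library is used.

```python
import math
from statistics import NormalDist, fmean, variance


def _log_summary(x):
    """Return n, the mean and the sample variance (divisor n - 1) of log(x)."""
    logs = [math.log(v) for v in x]
    return len(logs), fmean(logs), variance(logs)


def cox_ci(x, level=0.95):
    """CI for the mean of a log-normal variable using the formula in the question."""
    n, y, s2 = _log_summary(x)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    centre = y + s2 / 2
    half = z * math.sqrt(s2 / n + s2 ** 2 / (2 * (n - 1)))
    return math.exp(centre - half), math.exp(centre + half)


def naive_ci(x, level=0.95):
    """Naive CI: exponentiate the endpoints of the CI for the log-scale mean."""
    n, y, s2 = _log_summary(x)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * math.sqrt(s2 / n)
    return math.exp(y - half), math.exp(y + half)


sample = [1.2, 0.8, 2.5, 1.9, 0.6, 3.1, 1.4, 0.9]  # made-up illustrative data
print("naive:", naive_ci(sample))
print("Cox:  ", cox_ci(sample))
```

On the log scale the naive interval is centred on $Y$, while the Cox interval is centred on $Y + \text{var}(Y)/2$. The Cox interval therefore always sits higher.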

Now, what are the corresponding CIs for the following transformations?

  1. $\sqrt{x}$ and $x^{1/3}$
  2. $\arcsin(\sqrt{x})$
  3. $\log(\frac{x}{1-x})$
  4. $1/x$

What about the tolerance interval for the random variable itself, i.e., for a single value drawn at random from the population? Do back-transformed intervals of this kind have the same problem, or do they keep their nominal coverage?
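One way to probe the single-value question is a small simulation. The sketch below assumes normally distributed logs with hypothetical parameters $\mu = 0$ and $\sigma = 1$. For each simulated sample it builds a prediction interval for one new log value, $Y \pm z\sqrt{\text{var}(Y)(1 + 1/n)}$, using $z$ in place of the $t$ quantile, which is an approximation that is reasonable for large $n$. It then counts how often a fresh draw falls inside that interval on each scale. Because $\exp$ is strictly increasing, $a < \log X < b$ holds exactly when $e^a < X < e^b$, so the two counts should agree.

```python
import math
import random
from statistics import NormalDist, fmean, variance


def prediction_interval_coverage(mu=0.0, sigma=1.0, n=100, reps=2000,
                                 level=0.95, seed=1):
    """Coverage of a prediction interval for one new value, on the log and raw scales.

    mu, sigma, n, reps and seed are illustrative assumptions, not values from the thread.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    hits_log = hits_raw = 0
    for _ in range(reps):
        logs = [rng.gauss(mu, sigma) for _ in range(n)]
        y_bar, s2 = fmean(logs), variance(logs)
        # z stands in for the t quantile (assumption: n is large)
        half = z * math.sqrt(s2 * (1 + 1 / n))
        lo, hi = y_bar - half, y_bar + half
        x_new = math.exp(rng.gauss(mu, sigma))  # one fresh draw from the population
        hits_log += lo < math.log(x_new) < hi
        hits_raw += math.exp(lo) < x_new < math.exp(hi)
    return hits_log / reps, hits_raw / reps


print(prediction_interval_coverage())
```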

Best Answer

Why are you doing back transformations at all? That's critical to answering your question, because in some cases the naive back transform is the right answer. In fact, I'll argue that if the naive back transform isn't the right answer, then you shouldn't back transform at all.

I find the general issue of back transformation highly problematic and often muddled. Looking at the article you cited, why do the authors consider it a problem that the back-transformed CI doesn't capture the original mean? That complaint rests on a mistaken interpretation of back-transformed values: they expect the coverage to be that of a direct analysis in the back-transformed space. They then construct a new back transform to fix that mistake, instead of fixing their interpretation.
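To make this concrete for the log case, assume $\log X \sim N(\mu, \sigma^2)$. A standard result is that the median of $X$ is $e^{\mu}$, while its mean is $e^{\mu + \sigma^2/2}$. Exponentiation is strictly increasing, so an interval $(L, U)$ for $\mu$ maps to $(e^L, e^U)$ with exactly the same coverage for $e^{\mu}$:

```latex
\begin{aligned}
\log X &\sim N(\mu, \sigma^2) \\
\operatorname{median}(X) &= e^{\mu}, \qquad E[X] = e^{\mu + \sigma^2/2} > e^{\mu} \\
P\left(L < \mu < U\right) &= P\left(e^{L} < e^{\mu} < e^{U}\right)
\end{aligned}
```

The naive back-transformed interval is therefore an interval for the median (the geometric mean), not for $E[X]$.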

If you do your analyses on log values, then your estimates and inferences apply to those log values. As long as you treat any back transform as a representation of how that log-scale analysis looks in the exponentiated space, and only as that, the naive approach is fine. In fact, it's exact: a monotone back transform preserves the coverage of the interval for the transformed parameter. That's true of any monotone transform.
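Here is a minimal sketch of the naive approach for the transformations listed in the question. It assumes each transform is used only on a domain where it is strictly monotone. The method applies the inverse transform to both endpoints and reorders them when the transform is decreasing, as $1/x$ is. The dictionary keys are hypothetical labels chosen for this example. The result is an interval for the back-transformed parameter (for example $g^{-1}$ of the mean of $g(X)$), with the same coverage as the original interval. It is not an interval for the mean of $X$.

```python
import math

# Inverse of each transform; keys are hypothetical labels for this sketch.
INVERSES = {
    "log": math.exp,
    "sqrt": lambda y: y ** 2,                   # valid for y >= 0
    "cbrt": lambda y: y ** 3,
    "arcsin_sqrt": lambda y: math.sin(y) ** 2,  # valid for 0 <= y <= pi/2
    "logit": lambda y: 1 / (1 + math.exp(-y)),
    "reciprocal": lambda y: 1 / y,              # decreasing; interval must not contain 0
}


def back_transform_ci(lo, hi, transform):
    """Naive back-transform of a CI computed on the transformed scale.

    Each endpoint goes through the inverse; the pair is reordered so that
    decreasing transforms such as 1/x still return (lower, upper).
    """
    inv = INVERSES[transform]
    a, b = inv(lo), inv(hi)
    return (a, b) if a <= b else (b, a)


print(back_transform_ci(0.25, 0.5, "reciprocal"))
```

For example, `back_transform_ci(0.25, 0.5, "reciprocal")` returns `(2.0, 4.0)`. The endpoints swap because $1/x$ reverses order.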

What they're doing is an attempt to turn the CI into something it's not: a CI for the mean on the original, untransformed scale. That is fraught with problems. Consider the bind you're in now: the two possible CIs, one in the transformed space where you do your analyses and one back-transformed, make very different statements about where the likely $\mu$ is in the other space. The recommended back transform creates more problems than it solves.
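A rough simulation can make the "very different statements" concrete. The sketch below uses assumed settings: $\mu = 0$, $\sigma = 1$, $n = 50$ and 2000 replicates, all hypothetical choices. From each simulated log-normal sample it builds the naive back-transformed interval and the Cox-method interval from the question. It then records how often each interval covers the median $e^{\mu}$ and the mean $e^{\mu + \sigma^2/2}$.

```python
import math
import random
from statistics import NormalDist, fmean, variance


def coverage(mu=0.0, sigma=1.0, n=50, reps=2000, level=0.95, seed=7):
    """Coverage of the naive and Cox intervals for the median and the mean of X.

    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    median, mean = math.exp(mu), math.exp(mu + sigma ** 2 / 2)
    hits = {"naive_median": 0, "naive_mean": 0, "cox_median": 0, "cox_mean": 0}
    for _ in range(reps):
        logs = [rng.gauss(mu, sigma) for _ in range(n)]  # log of a log-normal sample
        y_bar, s2 = fmean(logs), variance(logs)
        # Naive: CI for the log-scale mean, endpoints exponentiated
        h = z * math.sqrt(s2 / n)
        nlo, nhi = math.exp(y_bar - h), math.exp(y_bar + h)
        # Cox method, following the formula in the question
        c = y_bar + s2 / 2
        h2 = z * math.sqrt(s2 / n + s2 ** 2 / (2 * (n - 1)))
        clo, chi = math.exp(c - h2), math.exp(c + h2)
        hits["naive_median"] += nlo < median < nhi
        hits["naive_mean"] += nlo < mean < nhi
        hits["cox_median"] += clo < median < chi
        hits["cox_mean"] += clo < mean < chi
    return {k: v / reps for k, v in hits.items()}


print(coverage())
```

With these settings, the naive interval should cover the median close to 95% of the time but rarely cover the mean. The Cox interval should do roughly the reverse. The exact figures depend on the seed and settings.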

The best thing to take from that paper is that deciding to transform the data has deeper consequences than you might expect for the meaning of your estimates and inferences.
