I have data for insulate material and fail times and need to create a 95% confidence interval for the mean failure of each material. However the data is log-transformed because the residuals were not normal with the original data. How do I use this data to find intervals for the original data?
Solved – How to create a 95% confidence interval with log-transformed data
confidence intervalrregression
Related Solutions
There are several ways for calculating confidence intervals for the mean of a lognormal distribution. I am going to present two methods: Bootstrap and Profile likelihood. I will also present a discussion on the Jeffreys prior.
Bootstrap
For the MLE
In this case, the MLE of $(\mu,\sigma)$ for a sample $(x_1,...,x_n)$ are
$$\hat\mu= \dfrac{1}{n}\sum_{j=1}^n\log(x_j);\,\,\,\hat\sigma^2=\dfrac{1}{n}\sum_{j=1}^n(\log(x_j)-\hat\mu)^2.$$
Then, the MLE of the mean is $\hat\delta=\exp(\hat\mu+\hat\sigma^2/2)$. By resampling we can obtain a bootstrap sample of $\hat\delta$ and, using this, we can calculate several bootstrap confidence intervals. The following R
codes shows how to obtain these.
rm(list=ls())
library(boot)
set.seed(1)
# Simulated data
data0 = exp(rnorm(100))
# Statistic (MLE)
mle = function(dat){
m = mean(log(dat))
s = mean((log(dat)-m)^2)
return(exp(m+s/2))
}
# Bootstrap
boots.out = boot(data=data0, statistic=function(d, ind){mle(d[ind])}, R = 10000)
plot(density(boots.out$t))
# 4 types of Bootstrap confidence intervals
boot.ci(boots.out, conf = 0.95, type = "all")
For the sample mean
Now, considering the estimator $\tilde{\delta}=\bar{x}$ instead of the MLE. Other type of estimators might be considered as well.
rm(list=ls())
library(boot)
set.seed(1)
# Simulated data
data0 = exp(rnorm(100))
# Statistic (MLE)
samp.mean = function(dat) return(mean(dat))
# Bootstrap
boots.out = boot(data=data0, statistic=function(d, ind){samp.mean(d[ind])}, R = 10000)
plot(density(boots.out$t))
# 4 types of Bootstrap confidence intervals
boot.ci(boots.out, conf = 0.95, type = "all")
Profile likelihood
For the definition of likelihood and profile likelihood functions, see. Using the invariance property of the likelihood we can reparameterise as follows $(\mu,\sigma)\rightarrow(\delta,\sigma)$, where $\delta=\exp(\mu+\sigma^2/2)$ and then calculate numerically the profile likelihood of $\delta$.
$$R_p(\delta)=\dfrac{\sup_{\sigma}{\mathcal L}(\delta,\sigma)}{\sup_{\delta,\sigma}{\mathcal L}(\delta,\sigma)}.$$
This function takes values in $(0,1]$; an interval of level $0.147$ has an approximate confidence of $95\%$. We are going to use this property for constructing a confidence interval for $\delta$. The following R
codes shows how to obtain this interval.
set.seed(1)
# Simulated data
data0 = exp(rnorm(100))
# Log likelihood
ll = function(mu,sigma) return( sum(log(dlnorm(data0,mu,sigma))))
# Profile likelihood
Rp = function(delta){
temp = function(sigma) return( sum(log(dlnorm(data0,log(delta)-0.5*sigma^2,sigma)) ))
max=exp(optimize(temp,c(0.25,1.5),maximum=TRUE)$objective -ll(mean(log(data0)),sqrt(mean((log(data0)-mean(log(data0)))^2))))
return(max)
}
vec = seq(1.2,2.5,0.001)
rvec = lapply(vec,Rp)
plot(vec,rvec,type="l")
# Profile confidence intervals
tr = function(delta) return(Rp(delta)-0.147)
c(uniroot(tr,c(1.2,1.6))$root,uniroot(tr,c(2,2.3))$root)
$\star$ Bayesian
In this section, an alternative algorithm, based on Metropolis-Hastings sampling and the use of the Jeffreys prior, for calculating a credibility interval for $\delta$ is presented.
Recall that the Jeffreys prior for $(\mu,\sigma)$ in a lognormal model is
$$\pi(\mu,\sigma)\propto \sigma^{-2},$$
and that this prior is invariant under reparameterisations. This prior is improper, but the posterior of the parameters is proper if the sample size $n\geq 2$. The following R
code shows how to obtain a 95% credibility interval using this Bayesian model.
library(mcmc)
set.seed(1)
# Simulated data
data0 = exp(rnorm(100))
# Log posterior
lp = function(par){
if(par[2]>0) return( sum(log(dlnorm(data0,par[1],par[2]))) - 2*log(par[2]))
else return(-Inf)
}
# Metropolis-Hastings
NMH = 260000
out = metrop(lp, scale = 0.175, initial = c(0.1,0.8), nbatch = NMH)
#Acceptance rate
out$acc
deltap = exp( out$batch[,1][seq(10000,NMH,25)] + 0.5*(out$batch[,2][seq(10000,NMH,25)])^2 )
plot(density(deltap))
# 95% credibility interval
c(quantile(deltap,0.025),quantile(deltap,0.975))
Note that they are very similar.
There are a few things beyond simple algebra that go into the derivation:
The linear confidence interval you list relies on the fact that the sampling distribution for the estimator $\hat{S}(t)$ is asymptotically normal (so your confidence interval really should be produced from a data set of reasonable size). This, in turn, can be shown by showing that $\hat{S}(t)$ is a maximum likelihood estimator, and it is a general fact that MLEs are asymptotically normally distributed.
Then you must know the invariance property of MLEs, which tells you that $ln(\hat{S}(x))$ is also an MLE. It is therefore asymptotically normally distributed, and its expected value is $ln(S(x))$. Thus,
$$ \frac{\ln (\hat{S}(x))}{\ln (S(x))} $$
is approximately normal with mean 1.
Now, as you say, we take another logarithm and use the invariance property again to determine that
$$ \ln \left( \frac{\ln (\hat{S}(x))}{\ln (S(x))} \right) $$
is approximately normal with mean 0.
The construction of the log-transformed confidence interval now follows by simple algebra if you can determine the variance of the above estimator. This is probably nigh on impossible to do exactly, but once more we can get an approximate value, this time using the delta-method.
Best Answer
In the same way that you would compute and other confidence interval:
I might add that you dont need the residuals (from regression I assume?) to be normal, in order to calculate the confidence band. Assuming you have a large sample.