Solved – Prediction interval for a fitted log-normal distribution

confidence intervallognormal distributionprediction intervalr

What I am trying to do is to fit a log-normal distribution to a data-set, and then determine confidence and prediction intervals for the fitted distribution – not just for the mean and sd estimates.

My final goal is to be able to say that if we repeated a set of measurements, then 95 % of the values would fall below some specific value. I can obviously do that from the fitted distribution itself (or rather the cumulative version), but I'm thinking that will under-estimate the probability because I need to include the variability in the estimates and/or fit itself first?

If I use the following code:

require(MASS)
set.seed(123)
x<-rlnorm(100)
fit<-fitdistr(x,"lognormal")

then R will calculate a log-normal distribution fitted to my data. The fitdistr function will return the estimated mean and sd (along with standard errors for these estimates).

   meanlog       sdlog   
 0.09040591   0.90824033 
(0.09082403) (0.06422229)

I understand that these will then allow me to plot the fitted distribution (and histogram) using ggplot2 with the following code:

meanlog<-fit$estimate[[1]]
    sdlog<-fit$estimate[[2]]
binwidth<-abs(max(x)-min(x))/20

qplot(x,geom="blank")+geom_histogram(binwidth=binwidth,aes(y= ..density..))+stat_function(fun=dlnorm,arg=list(meanlog=meanlog,sdlog=sdlog),colour="red")

However, what I really want to do is to plot the confidence interval and/or prediction interval of this fitted distribution. Something similar to how ggplot2 does with stat_smooth, something like:

x<-seq(1,100)
y<-x+rnorm(x,sd=10)
qplot(x,y,geom="point")+stat_smooth(method='lm',se=T)

I can use confint(fit) to extract the confidence intervals for the estimated mean and sd, but I think I misunderstand the maths because I can't for the life of me work out how to use those in order to be able to calculate the confidence interval for the actual distribution. So neither can I work out the prediction interval. I've tried writing my own function for the log-normal distribution to input various combinations of the confidence intervals from confint manually – but that doesn't work. Obviously a confidence interval of the estimate does not directly give you the confidence interval of the line. And, therefore, neither the prediction interval.

I would really appreciate anyone who can walk me through this please!

Best Answer

Here is one simple approach:

> x.logmod = lm(log(x) ~ 1)
> exp(predict(x.logmod, newdata = data.frame(junk = 0), interval = "predict"))
       fit       lwr      upr
1 1.094619 0.1773106 6.757576

The linear model obtains the mean of $\log(x)$. The predict statement can compute a prediction interval for a new dataset, so if we un-transform it, we get a prediction interval for $x$ itself. The newdata argument may be skipped if you want 100 copies of the same interval! Instead, I provided a dataset that has just one row; since we are predicting the intercept, it doesn't matter what's in it.

Related Question