For positive data $x_1, x_2, \ldots, x_n$ let $y_i = \log(x_i)$ be their natural logarithms. Set
$$\bar{y} = \frac{1}{n}(y_1+y_2+\cdots + y_n)$$
and
$$s^2 = \frac{1}{n-1}\left((y_1 - \bar{y})^2 + \cdots + (y_n - \bar{y})^2\right);$$
these are the mean and variance of the logs, respectively. When the $x_i$ are assumed to be independent and identically distributed with a common lognormal distribution, the UMVUE (uniformly minimum variance unbiased estimator) of their arithmetic mean is
$$m(x) = \exp(\bar{y}) g_n\left(\frac{s^2}{2}\right)$$
where $g_n$ is Finney's function
$$g_n(t) = 1 + \frac{(n-1)t}{n} + \frac{(n-1)^3t^2}{2!n^2(n+1)} + \frac{(n-1)^5t^3}{3!n^3(n+1)(n+3)}+\frac{(n-1)^7t^4}{4!n^4(n+1)(n+3)(n+5)} + \cdots.$$
For the data in the question, $s^2 = 1.23594$, $g_4(s^2/2) = 1.532355$, and the UMVUE is $m(x) = 0.084519.$
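For instance, the whole computation takes only a few lines in R. The following is a minimal sketch (the name lognormal.mean.umvue is invented here, and it assumes a function finney(t, n) evaluating $g_n(t)$, such as the one in the Appendix below):

# Sketch: UMVUE of a lognormal mean from a vector `x` of positive data.
# Assumes finney(t, n) evaluates g_n(t), as in the Appendix.
lognormal.mean.umvue <- function(x) {
  y <- log(x)                                 # logs of the data
  n <- length(y)
  s2 <- var(y)                                # sample variance of the logs (n-1 denominator)
  as.numeric(exp(mean(y)) * finney(s2/2, n))  # exp(ybar) * g_n(s^2/2)
}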
Because the series can take a while to converge when $s^2/2 \gg 1$, it is best implemented as an Excel macro. Such power series are straightforward to program efficiently: maintain the current term, update it to the next term at each step, and add it to a cumulative sum. The terms typically rise and then fall again; stop once they have fallen below a small positive threshold. (To reduce floating-point error, first compute all the terms and then sum them from smallest to largest in absolute value, as sketched below.)
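Here, for concreteness, is a small R sketch of that sorted-summation variant (the names finney.sorted and k.max are invented for this example; the macro below uses the simpler running-sum form):

# Sketch: evaluate g_n(t) by generating terms with the running-product
# recurrence and summing them from smallest to largest magnitude.
# Assumes n > 1; k.max caps the number of terms generated.
finney.sorted <- function(t, n, k.max = 200) {
  m <- n - 1
  x <- t * m^2 / (m + 1)                 # common factor in the term recurrence
  terms <- numeric(k.max)
  a <- 1                                 # current term (the leading 1)
  for (k in 1:k.max) {
    a <- a * x / (m + 2 * (k - 1)) / k   # update to the next term
    terms[k] <- a
  }
  1 + sum(terms[order(abs(terms))])      # accumulate smallest terms first
}
finney.sorted(1.23594 / 2, 4)            # about 1.532355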
My version of this macro (in very plain vanilla VBA) follows.
'
' Finney's G (Psi) function as in Millard & Neerchal, formula 5.57
' or equivalently in Gilbert, formula 13.4 (m here = n-1 there).
'
' Typically, m is a positive integer. Z can be positive or negative.
'
' Programmed by WAH @ QD 5 March 2001
'
' This algorithm will be less accurate for large m*z. It could be replaced by
' one that separately computes the descending half of the terms,
' iterating backward over i.
'
' It can be badly inaccurate for very negative m*z.
'
' This function returns 0 (an impossible value) upon encountering
' an input error.
'
Public Function Finney(m As Integer, z As Double) As Double
    Dim i As Integer      ' Index variable
    Dim g As Double       ' Result
    Dim x As Double       ' z * m * m / (m+1)
    Dim a As Double       ' Power series coefficient
    Dim iMax As Integer   ' Maximum iteration count
    Const aTol As Double = 0.0000000001  ' Convergence threshold
    Const iterMax As Integer = 1000      ' Limits execution time

    If (m <= -1) Then
        ' Issue an error
        Finney = 0#
        Exit Function
    End If
    x = z * m * m / (m + 1)
    If (Abs(x) < aTol) Then
        Finney = 1#  ' This is the correct answer.
        Exit Function
    End If
    iMax = Abs(Int(z) + 1) + 20
    If (iMax > iterMax) Then
        ' Issue an error
        Finney = 0#
        Exit Function
    End If
    '
    ' Initialize
    '
    a = 1#
    g = a  ' Leading term
    For i = 1 To iMax
        '
        ' Test for convergence
        '
        If (Abs(a) <= aTol * Abs(g)) Then
            Exit For
        End If
        '
        ' Compute the next term
        '
        a = a * x / (m + 2 * (i - 1)) / i
        '
        ' Accumulate terms
        '
        g = g + a
    Next
    Finney = g
End Function
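For the data above ($n = 4$, so m = 3), calling Finney(3, 1.23594 / 2) returns approximately 1.532355, agreeing with the value of $g_4(s^2/2)$ quoted earlier.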
References
Gilbert, Richard O. Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, 1987.
Millard, Steven P. and Nagaraj K. Neerchal. Environmental Statistics with S-PLUS. CRC Press, 2001.
Appendix
For those using a vectorized implementation it pays to precompute the coefficients of $g_n$ for a given value of $n$. This also makes it possible to determine in advance how many coefficients will be needed, thereby avoiding almost all of the comparison operations. Here, as an example, is an R implementation. (It uses the equivalent Gamma-function formula of http://www.unc.edu/~haipeng/publication/lnmean.pdf after correcting a typographical error there: the power-series argument should be $(n-1)^2t/(2n)$ rather than $(n-1)t/(2n)$ as written.)
finney <- function(t, n, eps=1.0e-20) {
  u <- t * (n-1)^2 / (2*n)                             # Argument of the power series
  tau <- max(u)                                        # Largest argument among the inputs
  i.max <- ceiling(max(1, -log(eps), 1 + log(tau)/2))  # Upper bound on the number of terms
  a <- lgamma((n-1)/2) - (lgamma(1:i.max + 1) + lgamma((n-1)/2 + 1:i.max))  # Log coefficients
  b <- exp(a[a + log(tau) * (1:i.max) > log(eps)])     # Retain only terms larger than eps
  x <- outer(u, 1:length(b), function(z,i) z^i)        # Compute powers of u
  return(x %*% b + 1)                                  # Sum the power series
}
For example, finney(1.2359357/2, 4) produces the value $1.532355$. This implementation can compute a million values per second for $n=3$ and about $400,000$ values per second for $n=300$. As another example of its use, here is a plot of $g_4, g_8, g_{16}, g_{32}$. (The higher graphs correspond to larger values of $n$.)
par(mfrow=c(1,1))
curve(finney(x/2, 32), 0, 2, lwd=2, main="Finney g(t/2)", xlab="t", ylab="")
curve(finney(x/2, 16), add=TRUE, lwd=2, col="#2040c0")
curve(finney(x/2, 8), add=TRUE, lwd=2, col="#c02040")
curve(finney(x/2, 4), add=TRUE, lwd=2, col="#40c020")
The 95% in your 95% CI (assuming it is a 95% CI) refers to its long-run "coverage" rate of the parameter you're estimating--in this case, the regression intercept and slopes. Kristoffer Magnusson has a nice intuitive visualization of this property of CIs, which is often misinterpreted, as @whuber points out above. In essence, if you were to repeatedly draw random samples of the same size from the same population and fit the same model to each, 95% of the CIs would contain the population value. If you watch the coverage rate over time in Magnusson's visualization (i.e., the proportion of randomly sampled CIs from a simulated distribution that actually contain the population mean), you'll see that it approaches the specified confidence level in the long run.
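To see this numerically, here is a minimal simulation sketch (the model, slope value, and sample size are invented for illustration); the empirical coverage of the slope's 95% CI settles near 0.95:

# Sketch: long-run coverage of a 95% CI for a regression slope.
# All numbers here are illustrative, not from the question's data.
set.seed(1)
beta <- 0.5                                   # "population" slope
covered <- replicate(10000, {
  x <- rnorm(50)                              # a fresh random sample ...
  y <- 1 + beta * x + rnorm(50)               # ... from a known model
  ci <- confint(lm(y ~ x))["x", ]             # 95% CI for the slope
  ci[1] <= beta && beta <= ci[2]              # does it cover the true slope?
})
mean(covered)                                 # approaches 0.95 in the long run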
For example, with your CI for sd_qty, you can't be sure (e.g., you cannot be "95% confident") that the true value of its slope is between -0.034 and 0.011, but you can be confident that if you were to re-fit this model with 100 random samples of the same size, from the same population, approximately 95 of the 100 CIs for sd_qty would contain the population value of that slope.
In terms of interpreting the CI for the purposes of null-hypothesis significance testing, look to see whether the expected null value lies within the CI (for slopes this null value is often, but does not have to be, 0). If it does not, you can reject the null hypothesis at the corresponding level of $\alpha$ (e.g., $\alpha = .05$ for a 95% CI); your 95% CI for ROC_DSRS_5 does not contain 0, for example, so you could reject the null for that slope. Otherwise, if the expected null value falls within the CI, you fail to reject the null: the interval suggests the estimated value (or one more extreme) would not be all that unusual if the null were true, as is the case for the 95% CI of sd_qty, which straddles 0.
This report describes the Land method on page 10.
[Figure: the values for step 3 in Gilbert's paper]