[Math] Confidence Interval and Variance of Coefficient of Variation

confidence interval, statistics, variance

I'm currently struggling to find the confidence interval of a statistic.

I'm calculating the coefficient of variation for a specific sample. The coefficient of variation is

$$\frac{\sigma}{\mu}$$

I would like to construct a normal confidence interval around the estimator.

I'm stuck trying to get the variance of my estimator.

$$ VAR[\frac{\sigma}{\mu}] = VAR[\frac{\sum(x_i-\bar{x})^2}{\sum(x_i)}] $$

I start expanding it and a lot of it resolves by itself.

However, I'm very confused about the following quantities:

$$ VAR[\frac{\sum(x_i)^2}{\sum(x_i)}] $$ and $$ COV[\frac{\sum(x_i)^2}{\sum(x_i)},X] $$

Can anyone lend me a hand here?
Thank you very much!

Best Answer

Estimating CVs. The coefficient of variation (CV) is $\kappa = \sigma/\mu.$ It can be estimated by $\hat \kappa = K = S/\bar X,$ where $\bar X$ and $S$ are the sample mean and SD, respectively. For small $n,$ this estimate is biased on the low side, but for moderate and large samples the bias is small. Methods of finding confidence intervals (CIs) for the CV depend on the nature of the underlying distribution.

Because the type of population distribution may be unknown, it may be useful to use a nonparametric bootstrap CI for $\kappa.$ Because the population may be skewed (especially right-skewed) in practice, the bootstrap must anticipate skewness.

Because I found the literature on CIs for the CV to be partly hidden behind paywalls and partly poorly explained, I'm wondering if bootstrap CIs may be the best solution for your application. I give two examples of bootstrap CIs below, one using a sample from a normal population and one using a sample from a gamma population. At the least, you can compare these results with results from formulas you may find in your Internet searches.

Bootstrap CIs. If we knew the distribution of $V = K - \kappa,$ we could find bounds $L$ and $U$ cutting 2.5% from its lower and upper tails, respectively, to get $P(L < K - \kappa < U) = 0.95,$ from which we would obtain the 95% CI $(K - U, K - L)$ for $\kappa.$
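
The rearrangement behind that interval, written out with the same symbols (subtract $K$ throughout, then multiply by $-1$, which reverses the inequalities):

$$P(L < K - \kappa < U) = P(-U < \kappa - K < -L) = P(K - U < \kappa < K - L) = 0.95.$$

This is why the upper cutoff $U$ sets the lower endpoint of the CI, and the lower cutoff $L$ sets the upper endpoint.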

Not knowing the distribution of $V,$ we re-sample from our data $X = (X_1, X_2, \dots, X_n).$ Iteratively, we take re-samples of size $n$ with replacement from $X,$ find $K^* = S^*/\bar X^*$ and then $V^* = K^* - \kappa^*$ for each re-sample, where the observed CV $K_{obs}$ from the original sample $X$ is used for $\kappa^*.$ Finally, we get $L^*$ and $U^*$ by cutting 2.5% from each tail of the $V^*$'s, the 'bootstrapped' values of $V$, and use these estimated bounds to get a 95% bootstrap CI.

Examples of Bootstrap CIs. As a demonstration, I use a sample $X$ of size $n = 100$ from $\mathsf{Norm}(\mu = 200, \sigma=25)$ with $\kappa = 0.125.$ In the outline above of the bootstrap procedure, $*$'s represented quantities based on re-sampling. In the R program below we use .re for the same purpose.

Note: It is important to understand that re-sampling does not create additional information. Re-sampling exploits information in existing data to do statistical analysis.

Normal. For the particular normal sample we used, $K_{obs} = 0.118,$ and the 95% nonparametric bootstrap CI obtained is $(0.102, 0.135).$ Because bootstrap procedures involve random re-sampling, each run of the program may give a slightly different CI, but not much different with as many as $B = 10^5 = 100,000$ iterations.

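x = rnorm(100,  200, 25)              # sample of n = 100 from Norm(200, 25)
k.obs = sd(x)/mean(x);  k.obs         # observed CV
## 0.1180088
B = 10^5;  v.re = numeric(B)
for(i in 1:B) {
  x.re = sample(x, 100, replace=TRUE) # re-sample with replacement
  k.re = sd(x.re)/mean(x.re)
  v.re[i] = k.re - k.obs }
UL = quantile(v.re, c(.975,.025))     # U* and L*, in that order
k.obs - UL                            # 95% bootstrap CI (K - U*, K - L*)
##     97.5%      2.5% 
## 0.1018754 0.1350186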
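
As a point of comparison with a formula-based interval, here is a minimal sketch that adds the assumption that the data are normal. For normal samples $\bar X$ and $S$ are independent, with $Var(\bar X) = \sigma^2/n$ and $Var(S) \approx \sigma^2/(2n),$ so a first-order delta-method approximation gives $Var(K) \approx \kappa^2(\frac12 + \kappa^2)/n.$ This formula is not part of the bootstrap procedure above, and the names `se.delta` and `ci.delta` are illustrative.

```r
# Observed CV and sample size, copied from the normal example above
k.obs <- 0.1180088
n <- 100

# Delta-method standard error, ASSUMING normal data:
#   Var(K) ~ kappa^2 * (1/2 + kappa^2) / n, with k.obs plugged in for kappa
se.delta <- k.obs * sqrt((0.5 + k.obs^2) / n)

# Symmetric normal-theory 95% interval around the estimate
ci.delta <- k.obs + qnorm(c(.025, .975)) * se.delta
ci.delta
```

For this sample the result is roughly $(0.101, 0.135),$ close to the bootstrap interval. Such agreement should not be expected for strongly skewed data, where the normality assumption fails.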

Gamma. This bootstrap procedure is called 'nonparametric' because it does not assume any particular type of distribution for the data. A second sample of size $n = 100$ was taken from the distribution $\mathsf{Gamma}(shape=\alpha = 4, rate=\lambda=.1)$ with $\kappa = \sqrt{\alpha}/\alpha = 1/2.$ This sample has $K = 0.507$ and the 95% nonparametric bootstrap CI is $(0.442, 0.579).$ A second run of the bootstrap program with the same data gave the CI $(0.442, 0.580).$
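
The gamma run can be sketched by wrapping the same bootstrap steps in a function. The helper name `boot_cv_ci`, its arguments, and the seed are illustrative choices, not part of the original program. Because the sample here is freshly simulated, its interval will not reproduce the $(0.442, 0.579)$ quoted above.

```r
# Hypothetical helper: nonparametric bootstrap CI for the CV,
# following the V* = K* - K_obs procedure described above.
boot_cv_ci <- function(x, B = 10^4, conf = 0.95) {
  n <- length(x)
  k_obs <- sd(x) / mean(x)
  # Bootstrapped values of V: re-sampled CV minus observed CV
  v_re <- replicate(B, {
    x_re <- sample(x, n, replace = TRUE)
    sd(x_re) / mean(x_re) - k_obs
  })
  alpha <- 1 - conf
  # U* first, then L*, so that k_obs - UL is (lower, upper)
  UL <- quantile(v_re, c(1 - alpha / 2, alpha / 2), names = FALSE)
  c(estimate = k_obs, lower = k_obs - UL[1], upper = k_obs - UL[2])
}

set.seed(2024)                             # arbitrary seed, for repeatability only
x <- rgamma(100, shape = 4, rate = 0.1)    # population CV is 1/sqrt(4) = 0.5
ci_gamma <- boot_cv_ci(x)
ci_gamma
```

A smaller default `B` is used here only to keep the sketch quick to run; increasing it reduces run-to-run variation in the endpoints.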

Related Solutions

[Math] Confidence interval; exponential distribution (normal or student approximation?)

I do not recommend using the Student approximation. Instead, it is better to observe that $X_1 + \dots + X_n \sim \Gamma(n,\theta)$ with $\theta = 1/\lambda$ (reading the second gamma parameter as a rate, so that $\lambda$ is the mean of each $X_i$). Therefore, $\frac{2}{\lambda}(X_1 + \dots + X_n) \sim \Gamma(n,\frac12)=\chi^2_{2n}$. Hence you can construct the required interval from the quantiles of $\chi^2_{2n}$.
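
A minimal sketch of that construction, assuming $\lambda$ denotes the mean of the exponential distribution (the reading under which $\frac{2}{\lambda}\sum X_i \sim \chi^2_{2n}$ holds). The helper name `exp_mean_ci` and the simulated data are illustrative.

```r
# Hypothetical helper: exact CI for the exponential MEAN (lambda in the text),
# from the pivot 2 * sum(x) / lambda ~ Chisq(2n).
exp_mean_ci <- function(x, conf = 0.95) {
  n <- length(x)
  alpha <- 1 - conf
  # Upper quantile first, so dividing gives (lower, upper)
  q <- qchisq(c(1 - alpha / 2, alpha / 2), df = 2 * n)
  2 * sum(x) / q
}

set.seed(1)                     # arbitrary seed, for repeatability only
x <- rexp(30, rate = 1 / 5)     # simulated data with mean 5
exp_mean_ci(x)
```

If the rate $1/\lambda$ is the parameter of interest instead, taking reciprocals of the two endpoints (and swapping their order) gives the corresponding interval.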

[Math] Constructing a confidence interval for population variance

First, let's get the notation and definitions right. The sample mean is $\bar X = \frac 1n\sum_{i=1}^n X_i.$ If the population mean $\mu$ is unknown and estimated by $\bar X,$ then the population variance $\sigma^2$ is estimated by the sample variance $S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2.$ Then, for a random sample from a normal population, $$\frac{(n-1)S^2}{\sigma^2} = \frac{\sum_{i=1}^n(X_i - \bar X)^2}{\sigma^2} \sim \mathsf{Chisq}(df = n-1).$$

For your dataset the statistics are:

x = c(22.2, 24.7, 20.9, 26.0, 27.0, 24.8, 26.5, 23.8, 25.6, 23.9)
n = length(x);  a = mean(x);  s = sd(x)
n;  a;  s
## 10           # sample size
## 24.54        # sample mean
## 1.912648     # sample SD

Then a 95% confidence interval for the population variance $\sigma^2$ is obtained as $$((n-1)S^2/U,\, (n-1)S^2/L),$$ where $L$ and $U$ cut 2.5% of the probability from the lower and upper tails, respectively, of $\mathsf{Chisq}(n-1).$ Computations of CIs for $\sigma^2$ and $\sigma$ in R statistical software follow:

UL = qchisq(c(.975, .025), n - 1);  UL
##  19.022768  2.700389
CI = (n-1)*s^2 / UL;  CI
##  1.730768 12.192315   95% CI for pop var
sqrt(CI)
##  1.315587 3.491750    95% CI for pop SD

Notice that $S = 1.913$ is contained in the CI for $\sigma$ as it must be, but that $S$ is not at the center of the CI, because the chi-squared distribution is skewed.

I assume you can use the appropriate quantiles of $\mathsf{Chisq}(9)$ to get 99% confidence intervals.

Addendum per Comments for 99% CIs: Of course, 99% confidence intervals have to be longer than 95% CIs.

 UL = qchisq(c(.995, .005), n - 1);  UL
 ##  23.589351  1.734933  # same as you showed in your question
 CI = (n-1)*s^2 / UL;  CI
 ##  1.395715 18.977103   # using correct numerator, this is different
 sqrt(CI)
 ## 1.181404 4.356272
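
The 95% and 99% computations above differ only in the quantiles used, so they can be sketched as one function of the confidence level. The helper name `var_ci` is illustrative, and the interval relies on the same normal-population assumption as the chi-squared result.

```r
# Hypothetical helper: chi-squared CI for a normal population variance
var_ci <- function(x, conf = 0.95) {
  n <- length(x)
  alpha <- 1 - conf
  # Upper quantile first, so dividing gives (lower, upper)
  UL <- qchisq(c(1 - alpha / 2, alpha / 2), df = n - 1)
  (n - 1) * var(x) / UL
}

x <- c(22.2, 24.7, 20.9, 26.0, 27.0, 24.8, 26.5, 23.8, 25.6, 23.9)
var_ci(x, 0.95)         # CI for the population variance
var_ci(x, 0.99)
sqrt(var_ci(x, 0.99))   # CI for the population SD
```

Called on the dataset above, it should return the same intervals as the step-by-step code.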
