Apologies if this is confusing at all; I'm very unfamiliar with geometric means. For context, my data set is 35 month-end portfolio values. I computed the month-to-month growth rate, [Month(N)/Month(N-1)] – 1, which gives me 34 observations, and I would like to estimate a month-end value from the known previous month-end value. For example, if I know the ending value of the portfolio last month, I would multiply it by a growth rate to get an estimate of this month's ending value +/- the margin of error.
I initially used the arithmetic mean of the growth rates, found the sample standard deviation and calculated a confidence interval to get my lower / upper bound growth rates.
I'm now doubting the accuracy of this method and have tried to use the geometric mean instead. I took my set of 34 growth rates, except that I did not subtract 1, so that all values are positive. I then calculated the geometric mean and, for the standard deviation, used this Wikipedia formula:
$$
\sigma_g = \exp\left(\sqrt{\frac{\sum_{i=1}^n \left(\ln\frac{x_i}{\mu_g}\right)^2}{n}}\right)
$$
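For reference, that formula can be sketched in Python as follows. This is only an illustration: the function name `geometric_sd` and the ratios are made up, not taken from the actual portfolio. It also checks that the formula is the same thing as exponentiating the population standard deviation of the logged ratios.

```python
import math
import statistics


def geometric_sd(x):
    """Geometric standard deviation with an n denominator, as in the formula above."""
    n = len(x)
    # Geometric mean mu_g = exp(mean of the logs)
    mu_g = math.exp(sum(math.log(v) for v in x) / n)
    # exp( sqrt( sum( ln(x_i / mu_g)^2 ) / n ) )
    return math.exp(math.sqrt(sum(math.log(v / mu_g) ** 2 for v in x) / n))


# Hypothetical month-over-month ratios (growth rates without subtracting 1)
ratios = [1.02, 0.99, 1.035, 1.014, 0.992]

gsd = geometric_sd(ratios)
# Same quantity: exp of the population SD of the log ratios
gsd_via_logs = math.exp(statistics.pstdev([math.log(v) for v in ratios]))

print(f"geometric SD = {gsd:.6f}, via logs = {gsd_via_logs:.6f}")
```

Note that the result is a multiplicative factor (always at least 1), not an additive spread.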
I'm now at a loss as to how to calculate a 95% CI. I've looked through similar questions on this site and searched the internet more generally, and I'm seeing different opinions on methods and formulas (I admittedly am also getting a bit lost in the underlying math).
Currently I'm using the formulas for a normal distribution to calculate a confidence interval based off the geometric standard deviation minus 1 (to get it back to a percentage), such that:
- Standard Error = [(Geometric Stdev-1)/Sqrt(N)],
- Margin of Error = [Standard Error * 1.96], and
- CI = [Geometric Mean +/- Margin of Error]
Is this a reasonable approximation or should I be using a different method to calculate the CI?
Best Answer
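You can compute the arithmetic mean of the log growth rate:
Let $V_t$ be the value of your portfolio at month-end $t$, let $R_t = \frac{V_t}{V_{t-1}}$ be the gross growth rate (your ratio without subtracting 1), and let $r_t = \log R_t$ be the log growth rate. The basic idea is to take logs and then do your standard stuff, because taking logs transforms multiplication into a sum. With $T$ observations (here $T = 34$), the sample mean and sample standard deviation of $r_t$ are: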
$$\bar{r} = \frac{1}{T} \sum_{t=1}^T r_t \quad \quad s_r = \sqrt{\frac{1}{T-1} \sum_{t=1}^T \left( r_t - \bar{r}\right)^2}$$
Then your standard error $\mathit{SE}_{\bar{r}}$ for your sample mean $\bar{r}$ is given by:
$$ \mathit{SE}_{\bar{r}} = \frac{s_r}{\sqrt{T}}$$
The 95 percent confidence interval for $\mu_r = {\operatorname{E}[r_t]}$ would be approximately: $$\left( \bar{r} - 2 \mathit{SE}_{\bar{r}} , \bar{r} + 2 \mathit{SE}_{\bar{r}} \right)$$
Exponentiate to get confidence interval for $e^{\mu_r}$
Since $e^x$ is a strictly increasing function, a 95 percent confidence interval for $e^{\mu_r}$ would be:
$$\left( e^{\bar{r} - 2 \mathit{SE}_{\bar{r}}} , e^{\bar{r} + 2 \mathit{SE}_{\bar{r}}} \right)$$
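The steps above can be sketched in Python. This is a minimal sketch rather than a definitive implementation: the function name `geometric_mean_ci`, the multiplier argument `z`, and the month-end values are all made up for illustration.

```python
import math
import statistics


def geometric_mean_ci(values, z=2.0):
    """Return (gm, lower, upper) for the gross growth rate V_t / V_{t-1}.

    `values` are consecutive month-end portfolio values; `z` is the
    multiplier on the standard error (2 is the rough 95 percent value).
    """
    # Log growth rates r_t = log(V_t / V_{t-1})
    r = [math.log(curr / prev) for prev, curr in zip(values, values[1:])]
    T = len(r)
    r_bar = statistics.mean(r)       # sample mean of the logs
    s_r = statistics.stdev(r)        # sample SD, T - 1 denominator
    se = s_r / math.sqrt(T)          # standard error of r_bar
    # Exponentiate the interval endpoints to get back to growth factors
    return math.exp(r_bar), math.exp(r_bar - z * se), math.exp(r_bar + z * se)


# Hypothetical month-end values, not real portfolio data
values = [100.0, 102.0, 101.0, 104.5, 106.0, 105.2, 108.9]
gm, lower, upper = geometric_mean_ci(values)
print(f"geometric mean growth factor: {gm:.4f}")
print(f"approx. 95% CI: ({lower:.4f}, {upper:.4f})")
```

Subtract 1 from each number to read them as percentage growth rates. The interval is not symmetric around the geometric mean, which is expected after exponentiating. With 34 observations you could also swap the 2 for the matching t quantile (roughly 2.03), which widens the interval slightly.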
And we're done. Why are we done?
Observe $\bar{r} = \frac{1}{T} \sum_t r_t$ is the log of the geometric mean
Hence $e^{\bar{r}}$ is the geometric mean of your sample. To show this, observe that the geometric mean is given by:
$$ \mathit{GM} = \left(R_1R_2\ldots R_T\right)^\frac{1}{T}$$
Hence if we take the log of both sides:
\begin{align*} \log \mathit{GM} &= \frac{1}{T} \sum_{t=1}^T \log R_t \\ &= \bar{r} \end{align*}
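A quick numerical check of this identity, as a sketch with made-up gross growth rates (the list `R` is hypothetical):

```python
import math

# Hypothetical gross growth rates R_t = V_t / V_{t-1}
R = [1.02, 0.99, 1.035, 1.014, 0.992]
T = len(R)

# Geometric mean computed directly as the T-th root of the product
gm_direct = math.prod(R) ** (1 / T)

# Geometric mean computed as exp of the arithmetic mean of the logs
r_bar = sum(math.log(x) for x in R) / T
gm_via_logs = math.exp(r_bar)

print(f"direct: {gm_direct:.10f}, via logs: {gm_via_logs:.10f}")
```

The two numbers agree up to floating-point rounding.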
Some examples to build intuition:
For $x \approx 1$, we have $\log(x) \approx x - 1$ and for $y \approx 0$, we have $\exp(y) \approx y + 1$. Further away though, those tricks break down.
If all your log growth rates $r_t$ are near zero (or equivalently $\frac{V_t}{V_{t-1}}$ is near 1), then you'll find that the geometric mean and the arithmetic mean will be quite close.
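To see both points numerically, here is a small sketch. The two series `calm` and `wild` and the helper `am_gm_gap` are invented for illustration; they are not from the question's data.

```python
import math
import statistics

# log(x) versus x - 1: close near 1, far apart further away
for x in (1.01, 1.05, 1.5, 3.0):
    print(f"x = {x}: log(x) = {math.log(x):.4f}, x - 1 = {x - 1:.4f}")


def am_gm_gap(R):
    """Arithmetic mean minus geometric mean of gross growth rates."""
    return statistics.mean(R) - statistics.geometric_mean(R)


# Hypothetical gross growth rates: one series near 1, one far from 1
calm = [1.01, 0.99, 1.02, 1.00, 0.98]
wild = [1.50, 0.60, 1.40, 0.70, 1.30]

for name, R in (("calm", calm), ("wild", wild)):
    print(f"{name}: AM = {statistics.mean(R):.4f}, "
          f"GM = {statistics.geometric_mean(R):.4f}, gap = {am_gm_gap(R):.4f}")
```

For the series near 1 the gap between the two means is tiny, while for the volatile series the arithmetic mean noticeably overstates the compounded growth.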
Another answer that might be useful:
As this answer discusses, log differences are basically percent changes.
Comment: it's useful in finance to get comfortable thinking in logs. It's similar to thinking in terms of percent changes but mathematically cleaner.