Sample moments typically converge slowly to the true moments, which is why you are seeing such discrepancies between the two methods. For instance, run the following code several times:
library(moments)  # for kurtosis()
# Simulated data
dat <- rt(2000, df = 8)
# Sample excess kurtosis
kurtosis(dat) - 3
# Theoretical excess kurtosis of a Student-t with df = 8: 6/(df - 4)
6/(8 - 4)
# Negative log-likelihood for the df parameter
LL <- function(par) {
  if (par > 0) return(-sum(dt(dat, df = par, log = TRUE)))
  else return(Inf)
}
parameter <- optim(8, fn = LL, method = "Brent", lower = 6, upper = 11)$par
# Excess kurtosis implied by the fitted df (the "MLE kurtosis")
6/(parameter - 4)
In many samples the two estimators (sample kurtosis and the MLE kurtosis) differ noticeably; you simply got one of those samples.
Moreover (and maybe more importantly), the sample kurtosis converges to the true kurtosis, while the MLE kurtosis converges to the kurtosis of the Student-t distribution that best fits the true distribution according to the likelihood criterion.
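To see both effects, here is a minimal simulation sketch that repeats the comparison over many samples (it assumes kurtosis() from the moments package, as above):
library(moments)
set.seed(1)
res <- replicate(200, {
  dat <- rt(2000, df = 8)
  LL <- function(par) -sum(dt(dat, df = par, log = TRUE))
  df.hat <- optim(8, fn = LL, method = "Brent", lower = 6, upper = 11)$par
  c(sample = kurtosis(dat) - 3, mle = 6/(df.hat - 4))
})
rowMeans(res)      # both rows should be near the true value 6/(8 - 4) = 1.5
apply(res, 1, sd)  # the sample kurtosis is typically far more variable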
I agree with @whuber that the fit of your proposal distribution is pretty bad. You are unnecessarily restricting the distribution (a Student-t would provide a much better fit at almost the same computational cost). Check:
library(MASS)
# ML fit of a Student-t (location m, scale s, df) to your residuals
fitdistr(standresidsapewma, "t")
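I don't have your standresidsapewma, so here is the same call sketched on simulated stand-in data (x and its distribution are hypothetical):
library(MASS)
set.seed(1)
x <- 1.2*rt(1000, df = 5)  # hypothetical stand-in for standresidsapewma
fit <- fitdistr(x, "t")    # ML fit of location m, scale s, and df
fit$estimate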
I agree with @NickCox: I think the mistake is in the first line of your post, where you define the "bimodality coefficient" (BC). I Googled and found Pfister et al. (which references SAS/STAT from 1990). That paper describes problems with BC quite similar to the ones you found and recommends Hartigan's dip test instead of BC (or in addition to it); the dip test is available in R through the diptest package. In addition, the kurtosis in the formula is supposed to be the excess kurtosis, and you appear not to have adjusted for that (although I am not certain of this).
The SAS documentation also mentions problems with BC, in particular:
Very heavy-tailed distributions have small values of BC regardless of the number of modes.
The long tail of your second distribution is probably lowering the value of BC.
In short, the problem is in the formula, not in your code. There is, as far as I know, no perfect measure of the number of modes.
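To see both points in action, here is a hedged sketch (bc() is my own helper implementing the formula with excess kurtosis; it assumes the moments and diptest packages, and 5/9 is the benchmark value commonly used with BC):
library(moments)   # skewness(), kurtosis()
library(diptest)   # dip.test()
bc <- function(x) {
  n <- length(x)
  s <- skewness(x)
  k <- kurtosis(x) - 3   # excess kurtosis, as the formula requires
  (s^2 + 1)/(k + 3*(n - 1)^2/((n - 2)*(n - 3)))
}
set.seed(1)
x <- c(rt(500, df = 3) - 3, rt(500, df = 3) + 3)  # clearly bimodal, but heavy-tailed
bc(x)        # typically well below 5/9: the heavy tails mask the two modes
dip.test(x)  # the dip test typically rejects unimodality here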
Best Answer
The (excess) kurtosis $\kappa$ of a random variable $X$ is the fourth derivative of its cumulant generating function (cgf) evaluated at zero, divided by the square of the variance (which itself is the second derivative of the cgf, evaluated at zero). By definition, the cgf is
$$\psi_X(t) = \log\mathbb{E}\left(e^{itX}\right).$$
The values of $e^{itX}$ are $1$ when $X=0$ and $e^{it}$ when $X=1$; these happen with probabilities $1-p$ and $p,$ respectively. Thus, by the definition of expectation $\mathbb E,$
$$\mathbb{E}\left(e^{itX}\right) = (1-p) + pe^{it} = (1-p)\left(1 + e^u\right)$$
where $$u = it + \log\left(\frac{p}{1-p}\right).$$
Let's work out the derivatives of $g(u)=\log(1-p) + \log(1 + e^u) = \psi_X(t)$, because, up to powers of $i$ (which will not matter in the end), these will be the same as its derivatives with respect to $t$:
$$\frac{d}{du}g(u) = \frac{e^u}{1+e^u} = f(u).\tag{1}$$
$$\frac{d^2}{du^2} g(u) = \frac{d}{du}f(u) = \frac{e^u}{(1+e^u)^2} = f(u)(1-f(u)).\tag{2}$$
This makes finding successive derivatives easy: apply the product rule and keep replacing $f^\prime(u)$ by $f(u)(1-f(u)).$ Let's abbreviate $f(u)$ simply as $f$:
$$\frac{d^3}{du^3} g(u) = \frac{d^2}{du^2} f = f(1-f)(1-2f);$$
$$\frac{d^4}{du^4} g(u) =\frac{d^3}{du^3} f = f(1-f)(1-6f(1-f)).\tag{3}$$
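If you want to double-check the algebra numerically, here is a quick sketch using R's symbolic derivative D() at an arbitrary test point (the values of p and u are arbitrary choices of mine):
g4 <- D(D(D(D(quote(log(1 - p) + log(1 + exp(u))), "u"), "u"), "u"), "u")
p <- 0.3; u <- 0.7            # arbitrary test values
f <- exp(u)/(1 + exp(u))
eval(g4)                      # fourth derivative of g, computed symbolically
f*(1 - f)*(1 - 6*f*(1 - f))   # closed form from (3); the two numbers agree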
When $t=0$, $e^u = p/(1-p)$, implying (by $(1)$) $f(u) = p.$ Consequently, $(2)$ asserts
$$\frac{d^2}{du^2} g\mid_{t=0} = f(1-f) = p(1-p)$$
(recovering the familiar expression for the variance of a Bernoulli$(p)$ variable) and $(3)$ asserts
$$\frac{d^4}{du^4} g\mid_{t=0} = \cdots = p(1-p)(1 - 6p(1-p)).$$
Dividing the latter by the square of the former gives
$$\kappa = \frac{p(1-p)(1 - 6p(1-p))}{(p(1-p))^2} = \frac{1}{p(1-p)} - 6.$$
This is equivalent to the expression given by Wikipedia at https://en.wikipedia.org/wiki/Bernoulli_distribution.
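As a final sanity check, the formula can be compared with a direct computation from the raw moments of a Bernoulli variable (a small R sketch; p = 0.3 is arbitrary):
p <- 0.3
m4 <- (0 - p)^4*(1 - p) + (1 - p)^4*p  # fourth central moment E[(X - p)^4]
v <- p*(1 - p)                         # variance
m4/v^2 - 3                             # excess kurtosis from raw moments
1/(p*(1 - p)) - 6                      # the formula derived above; identical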
Reference
Alan Stuart & Keith Ord, Kendall's Advanced Theory of Statistics, Fifth Edition, Volume 1. 1986.