The difficulty is that skewness and kurtosis are dependent; their effects can't be completely separated. If you want to examine the effect of a highly skewed distribution, you must necessarily also work with a distribution of high kurtosis.
In particular, kurtosis* $\geq$ skewness$^2+1$.
* (ordinary scaled fourth moment kurtosis, not excess kurtosis)
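To see this bound in action, here is a small Python sketch (an illustration added here, not part of the original answer) that checks kurtosis $\geq$ skewness$^2+1$ across the Gamma family, using the closed-form values skewness $= 2/\sqrt{k}$ and (ordinary) kurtosis $= 3 + 6/k$:

```python
import math

def gamma_skew_kurt(k):
    """Closed-form skewness and ordinary (non-excess) kurtosis of Gamma(k)."""
    skew = 2.0 / math.sqrt(k)
    kurt = 3.0 + 6.0 / k
    return skew, kurt

# The bound kurtosis >= skewness^2 + 1 holds for every shape parameter k;
# here the slack is 2 + 2/k, so small k (high skewness) forces huge kurtosis.
for k in [0.1, 0.5, 1.0, 2.0, 10.0, 100.0]:
    skew, kurt = gamma_skew_kurt(k)
    assert kurt >= skew ** 2 + 1, (k, skew, kurt)
    print(f"k={k:6.1f}  skewness={skew:6.3f}  kurtosis={kurt:7.2f}  bound={skew**2 + 1:7.2f}")
```

For instance, Gamma(0.16) has skewness 5, and its kurtosis is $3 + 6/0.16 = 40.5$, comfortably above the minimum of 26 the bound demands.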
Khan and Rayner (mentioned in the earlier answer) work with a family that allows some exploration of the impact of skewness and kurtosis, but they cannot avoid this issue; their attempt to separate the two severely limits the extent to which the effect of skewness can be explored.
If one holds the kurtosis ($\beta_2$) constant, one cannot make the skewness more than $\sqrt{\beta_2-1}$. If one wishes to consider unimodal distributions, the skewness is even more restricted.
For example, if you want to see the effect of high skewness, say skewness > 5, you cannot get a distribution with kurtosis less than 26 (since $5^2+1=26$).
So if you want to investigate the impact of high skewness, you are unable to avoid investigating the impact of high kurtosis. Consequently, if you do try to separate them, you effectively prevent yourself from assessing the effect of increasing skewness to high levels.
That said, at least for the distribution family they considered, and within the limits that the relationship between them poses, the investigation by Khan and Rayner does seem to suggest that kurtosis is the main problem.
However, even if the conclusion is completely general, if you happen to have a distribution with (say) skewness 5, it's likely to be little comfort to be told "but it's not the skewness that's the problem!". Once your skewness exceeds $\sqrt{2}$, the kurtosis can no longer be as low as that of the normal, and beyond that point the minimum possible kurtosis grows rapidly with increasing skewness.
I agree with @NickCox: I think the mistake is in the first line of your post, where you define the "bimodality coefficient". I Googled and found Pfister et al. (which references SAS/STAT from 1990). That paper describes problems with BC quite similar to the ones you found and recommends Hartigan's dip test instead of BC (or in addition to it). The dip test is available in R through the diptest package. In addition, the kurtosis in the formula is supposed to be excess kurtosis, and you appear not to have adjusted for that (although I am not certain of this).
The SAS documentation also mentions problems with BC, in particular
Very heavy-tailed distributions have small values of the bimodality coefficient regardless of the number of modes.
The long tail of your second distribution is probably lowering the value of BC.
In short, the problem is in the formula, not in your code. There is, as far as I know, no perfect measure of the number of modes.
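To make the formula's behaviour concrete, here is a Python sketch (the function name and sample sizes are my own, not from the original answer) that computes the sample bimodality coefficient $b = (g_1^2+1)\,/\,\bigl(g_2 + 3(n-1)^2/((n-2)(n-3))\bigr)$, with $g_1$ the sample skewness and $g_2$ the sample *excess* kurtosis, and compares a clearly bimodal sample against a unimodal but heavy-tailed one:

```python
import math
import random

def bimodality_coefficient(x):
    """Sample BC = (g1^2 + 1) / (g2 + 3(n-1)^2 / ((n-2)(n-3))),
    where g1 is sample skewness and g2 is sample EXCESS kurtosis."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3.0
    return (g1 ** 2 + 1) / (g2 + 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

rng = random.Random(1)
n = 2000

# Clearly bimodal: equal mixture of N(-3, 1) and N(3, 1).
bimodal = [rng.gauss(rng.choice([-3.0, 3.0]), 1.0) for _ in range(n)]

# Unimodal but heavy-tailed: Student t with 5 df, via normal / sqrt(chi2/5).
def t5():
    z = rng.gauss(0, 1)
    chi2 = sum(rng.gauss(0, 1) ** 2 for _ in range(5))
    return z / math.sqrt(chi2 / 5)

heavy = [t5() for _ in range(n)]

# BC > 5/9 is the usual bimodality benchmark; heavy tails (high kurtosis
# in the denominator) drag BC down even though there is only one mode.
print(bimodality_coefficient(bimodal))  # well above 5/9
print(bimodality_coefficient(heavy))    # well below 5/9, despite one mode
```

This reproduces the SAS caveat numerically: the heavy-tailed sample scores far below the bimodal one purely because of its kurtosis, not its number of modes.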
Best Answer
Pearson plot
Let:
$$\beta _1=\frac{\mu _3^2}{\mu _2^3} \quad \text{and} \quad \beta _2=\frac{\mu _4}{\mu _2^2}$$
where $\sqrt{\beta_1}$ is often used as a measure of skewness, and $\beta_2$ is often used as a measure of kurtosis.
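As a quick illustration of these definitions (a Python sketch added here; the helper name is my own), $\beta_1$ and $\beta_2$ can be computed for the Gamma family from its raw moments $E[X^n] = \Gamma(k+n)/\Gamma(k)$, confirming that the Exponential distribution (Gamma with $k=1$) sits at the point $(4, 9)$ and that the whole family lies on the line $\beta_2 = 3 + \tfrac{3}{2}\beta_1$:

```python
import math

def gamma_betas(k):
    """beta1 = mu3^2 / mu2^3 and beta2 = mu4 / mu2^2 for Gamma(k, 1),
    computed from raw moments E[X^n] = Gamma(k + n) / Gamma(k)."""
    raw = [math.gamma(k + n) / math.gamma(k) for n in range(5)]
    m1 = raw[1]
    mu2 = raw[2] - m1 ** 2
    mu3 = raw[3] - 3 * m1 * raw[2] + 2 * m1 ** 3
    mu4 = raw[4] - 4 * m1 * raw[3] + 6 * m1 ** 2 * raw[2] - 3 * m1 ** 4
    return mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2

# The Exponential distribution (k = 1) is the single point (4, 9).
b1, b2 = gamma_betas(1.0)
print(b1, b2)

# The whole Gamma family traces the Type III line beta2 = 3 + 1.5 * beta1.
for k in [0.5, 1.0, 2.0, 5.0, 20.0]:
    b1, b2 = gamma_betas(k)
    assert abs(b2 - (3 + 1.5 * b1)) < 1e-8
```

This is exactly the green Type III line described below, with the Exponential as one point on it.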
The Pearson plot diagram characterises distributions that are members of the Pearson family in terms of skewness ($x$-axis) and kurtosis ($y$-axis):
- The point at (0, 3) denotes the Normal distribution.
- A Gamma distribution defines the green Type III line. The Exponential distribution is just a point on the Type III line.
- An Inverse Gamma distribution defines the blue Type V line.
- A Power Function [Beta(a, 1)] defines the Type I(J) line. Type I nests the Beta distribution.
- Type VII (symmetrical: when $\beta_1 = 0$) nests Student's t distribution, etc.
Adding other distributions to the Pearson diagram
The OP asks how to add other distributions (that may not even be members of the Pearson family) onto the Pearson plot diagram. This is something done as an exercise in Chapter 5 of our Springer text: Rose and Smith, Mathematical Statistics with Mathematica. A free copy of the chapter can be downloaded here:
http://www.mathstatica.com/book/bookcontents.html
To illustrate, suppose we want to add a skew-Normal distribution onto the Pearson plot diagram. Let $X \sim \text{skewNormal}(\lambda)$ with pdf $f(x) = 2\,\phi(x)\,\Phi(\lambda x)$, where $\phi$ and $\Phi$ denote the standard Normal pdf and cdf.
The mean is $\mu = \delta \sqrt{2/\pi}$, where $\delta = \lambda/\sqrt{1+\lambda^2}$, while the second, third and fourth central moments are:
$$\mu_2 = 1 - \frac{2\delta^2}{\pi}, \qquad \mu_3 = \frac{4-\pi}{2}\left(\delta\sqrt{2/\pi}\right)^{3}, \qquad \mu_4 = 3\mu_2^2 + 2(\pi-3)\left(\delta\sqrt{2/\pi}\right)^{4}$$
Then, $\beta_1 = \mu_3^2/\mu_2^3$ and $\beta_2 = \mu_4/\mu_2^2$ follow directly.
Since $\beta_1$ and $\beta_2$ are both determined by the single parameter $\lambda$, they are related. Eliminating $\lambda$ analytically looks tricky, so we shall use numerical methods instead. For instance, we can plot $\beta_1$ and $\beta_2$ parametrically, as a function of $\lambda$, as $\lambda$ increases from 0 to, say, 300:
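The mathStatica/Mathematica output is not reproduced above, so as a cross-check, here is a Python sketch (the helper name `skew_normal_betas` is my own) that evaluates the same parametric curve from the standard closed-form skew-normal moments:

```python
import math

def skew_normal_betas(lam):
    """(beta1, beta2) for a skew-Normal(lambda) distribution, from the
    standard closed-form moments with delta = lambda / sqrt(1 + lambda^2)."""
    d = lam / math.sqrt(1 + lam ** 2)
    b = math.sqrt(2 / math.pi) * d                     # the mean
    mu2 = 1 - b ** 2                                   # second central moment
    mu3 = (4 - math.pi) / 2 * b ** 3                   # third central moment
    mu4 = 3 * mu2 ** 2 + 2 * (math.pi - 3) * b ** 4    # fourth central moment
    return mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2

# lambda = 0 recovers the Normal point (0, 3); large lambda approaches the
# half-Normal limit, numerically about (0.990566, 3.86918).
print(skew_normal_betas(0.0))
print(skew_normal_betas(300.0))
```

Evaluating this over a grid of $\lambda$ values traces the same black line that the parametric plot produces.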
Where will this line be located on a Pearson diagram? To see the answer exactly, we need to superimpose plot P1 onto a standard Pearson diagram. Since a Pearson plot has its vertical axis inverted, we first need to invert the vertical axis of plot P1. This can be done by converting all points {x, y} in P1 into {x, 9-y}, and then showing P1 and the Pearson plot together. Doing so yields:
In summary, the black line in the diagram depicts the possible values of ($\beta_1, \beta_2$) that a $\text{skew-Normal}(\lambda)$ distribution can exhibit. We start out at (0,3) i.e. the Normal distribution when $\lambda=0$, and then move along the black line as $\lambda$ increases towards infinity.
The same method can conceptually be adopted for any distribution: in some cases, the distribution will be captured by a single point; in others, such as the case here, as a line; and in others, something more general.
Notes
`Expect` and `PearsonPlot` are functions from the mathStatica add-on to Mathematica; `Erf` denotes the error function, and `ParametricPlot` is a built-in Mathematica function.

In the limit, as $\lambda$ increases towards infinity, at the upper extremum, $\beta_1$ and $\beta_2$ take the following values, here expressed numerically: {0.990566, 3.86918}