I agree with @NickCox: I think the mistake is in the first line of your post, where you define the "bimodality coefficient". I Googled and found Pfister et al. (which references SAS/STAT from 1990). That paper points out problems with BC quite similar to the ones you found and recommends Hartigan's dip test instead of BC (or in addition to it). The dip test is available in R through the diptest package. In addition, the kurtosis in the formula is supposed to be excess kurtosis, and you appear not to have adjusted for that (although I am not certain of this).
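For reference, the formula in Pfister et al. (and the SAS documentation) is BC = (g1^2 + 1) / (g2 + 3(n-1)^2 / ((n-2)(n-3))), where g1 is the sample skewness and g2 is the sample *excess* kurtosis. A minimal Python sketch using the plain moment estimators (SAS additionally applies small-sample corrections to g1 and g2 themselves, which I omit here):

```python
def bimodality_coefficient(x):
    """Sample bimodality coefficient, per the SAS/Pfister et al. formula.

    Note that g2 below is the *excess* kurtosis (0 for a normal
    distribution); plugging in raw kurtosis instead inflates the
    denominator and deflates BC.
    """
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    g1 = m3 / m2 ** 1.5       # sample skewness
    g2 = m4 / m2 ** 2 - 3     # sample *excess* kurtosis
    return (g1 ** 2 + 1) / (g2 + 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
```

Pfister et al. use BC > 5/9 ≈ 0.555 (the value for a uniform distribution) as the benchmark suggesting bimodality.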
The SAS documentation also mentions problems with BC, in particular:

Very heavy-tailed distributions have small values of BC regardless of the number of modes.
The long tail of your second distribution is probably lowering the value of BC.
In short, the problem is in the formula, not in your code. There is, as far as I know, no perfect measure of the number of modes.
The original post misses a couple of major points: (1) No "data" can ever be normally distributed; data are necessarily discrete. The valid question is, "Is the process that produced the data a normally distributed process?" But (2) the answer to that question is always "no", regardless of what any statistical test or other data-based assessment tells you. Normally distributed processes produce data with infinite continuity, perfect symmetry, and precisely specified probabilities within standard-deviation ranges (e.g., 68-95-99.7), none of which are ever exactly true for the processes that give rise to the data we can measure with whatever measurement devices we humans can use.
So you can never consider data to be normally distributed, and you can never consider the process that produced the data to be a precisely normally distributed process. But, as Glen_b indicated, it might not matter too much, depending on what it is that you are trying to do with the data.
Skewness and kurtosis statistics can help you assess certain kinds of deviations from normality of your data-generating process. They are highly variable statistics, though. The standard errors given above are not useful, because they are only valid under normality, which means they are only useful as a test of normality, an essentially useless exercise. It would be better to use the bootstrap to find the standard errors, although large samples would be needed to get them accurately.
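A minimal sketch of that bootstrap approach in Python (the `bootstrap_se` helper is my own illustration; the number of resamples and the seed are arbitrary choices):

```python
import random

def excess_kurtosis(x):
    """Sample excess kurtosis (0 for normal data, asymptotically)."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    return m4 / m2 ** 2 - 3

def bootstrap_se(x, stat, n_boot=2000, seed=0):
    """Nonparametric bootstrap standard error of stat(x)."""
    rng = random.Random(seed)
    reps = [stat(rng.choices(x, k=len(x))) for _ in range(n_boot)]
    mean_rep = sum(reps) / n_boot
    return (sum((r - mean_rep) ** 2 for r in reps) / (n_boot - 1)) ** 0.5
```

The same helper works for skewness by passing a different `stat` function.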
Also, kurtosis is very easy to interpret, contrary to the above post. It is the average (or expected value) of the Z values, each taken to the fourth power. Large |Z| values are outliers and contribute heavily to kurtosis. Small |Z| values, where the "peak" of the distribution is, give Z^4 values that are tiny and contribute essentially nothing to kurtosis. I proved in my article https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4321753/ that kurtosis is very well approximated by the average of the Z^4 *I(|Z|>1) values. Hence kurtosis measures the propensity of the data-generating process to produce outliers.
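This interpretation is easy to check numerically. A small Python sketch (my own illustration, not the article's code) computes the average Z^4 and, separately, the part contributed by the |Z| > 1 values alone:

```python
def kurtosis_decomposition(x):
    """Return (kurtosis, tail part), where kurtosis is the average Z^4
    and the tail part keeps only terms with |Z| > 1.

    Observations near the peak (|Z| < 1) have Z^4 < 1 and contribute
    almost nothing; outliers dominate the average.
    """
    n = len(x)
    m = sum(x) / n
    sd = (sum((v - m) ** 2 for v in x) / n) ** 0.5
    z = [(v - m) / sd for v in x]
    kurt = sum(zi ** 4 for zi in z) / n
    tail = sum(zi ** 4 for zi in z if abs(zi) > 1) / n
    return kurt, tail
```

On any sample with a pronounced outlier, the tail part accounts for nearly all of the kurtosis, which is the point of the approximation above.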
Best Answer
Xi'an's answer proved (or at least hinted at a proof) that there are different distributions with the same mean, variance, skewness and kurtosis. I just want to show an example of three visually distinct discrete distributions with the same moments (mean = skewness = 0, variance = 1 and kurtosis = 2):
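Such distributions are easy to construct by hand. As a quick check, here is a Python sketch with three symmetric discrete distributions of my own construction (not necessarily the three plotted here), verifying that they share all four moments:

```python
def moments(support, probs):
    """Mean, variance, skewness and (non-excess) kurtosis
    of a discrete distribution given by support points and probabilities."""
    mean = sum(x * p for x, p in zip(support, probs))
    var = sum((x - mean) ** 2 * p for x, p in zip(support, probs))
    skew = sum((x - mean) ** 3 * p for x, p in zip(support, probs)) / var ** 1.5
    kurt = sum((x - mean) ** 4 * p for x, p in zip(support, probs)) / var ** 2
    return mean, var, skew, kurt

# Three symmetric discrete distributions, all with mean 0, variance 1,
# skewness 0 and kurtosis 2 (symmetry forces mean = skewness = 0):
dists = [
    ([-2 ** 0.5, 0, 2 ** 0.5], [1 / 4, 1 / 2, 1 / 4]),
    ([-2, -2 / 3 ** 0.5, 0, 2 / 3 ** 0.5, 2],
     [1 / 32, 9 / 32, 3 / 8, 9 / 32, 1 / 32]),
    ([-2, -(12 / 7) ** 0.5, 0, (12 / 7) ** 0.5, 2],
     [1 / 64, 49 / 192, 11 / 24, 49 / 192, 1 / 64]),
]
```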
The code to generate them is: