Since the statistic is bimodal, taking the average of the values for all categories of a product is meaningless.
I don't think this is necessarily true. For instance, breast cancer risk is highly stratified into high vs low risk based on genetic markers. When you don't know what your genetic code is, it still makes sense to report the average.
Cutting the variable into bins brings the problem of arbitrarily chosen cutoffs. This will cause some bias when the modes are estimated as components of a mixture of normal distributions. An alternative approach is the EM algorithm, where you can simultaneously estimate the "high" versus "low" group assignment in the mixture distribution and calculate CIs for the mean and its standard error for each group. The details of doing so in R are in this document.
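As a rough illustration of the EM idea (in Python rather than R, and not taken from the linked document), here is a minimal sketch that fits a two-component normal mixture. The function name `em_two_normals`, the simulated sample, and the starting values are all assumptions made up for the example. It stops at point estimates and does not compute the confidence intervals.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and SD sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def em_two_normals(xs, n_iter=200):
    """Fit a two-component normal mixture by EM (illustrative sketch only)."""
    n = len(xs)
    overall_mean = sum(xs) / n
    overall_sd = math.sqrt(sum((x - overall_mean) ** 2 for x in xs) / n)
    # Crude starting values (an assumption): one component at each extreme.
    mu = [min(xs), max(xs)]
    sigma = [overall_sd, overall_sd]
    weight = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: probability that each point belongs to each group.
        resp = []
        for x in xs:
            p = [weight[k] * normal_pdf(x, mu[k], sigma[k]) for k in range(2)]
            total = p[0] + p[1]
            resp.append([p[0] / total, p[1] / total])
        # M-step: update each group's weight, mean and SD using those probabilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weight[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sigma[k] = max(math.sqrt(var), 1e-6)  # floor guards against a collapsing SD
    return weight, mu, sigma


# Simulated "low" and "high" groups (hypothetical values, not real data).
random.seed(0)
sample = [random.gauss(2, 1) for _ in range(200)] + [random.gauss(9, 1) for _ in range(200)]

weight, mu, sigma = em_two_normals(sample)
for k, label in enumerate(["low", "high"]):
    print(f"{label}: weight={weight[k]:.2f}, mean={mu[k]:.2f}, sd={sigma[k]:.2f}")
```

No cutoff is chosen anywhere: each observation gets a soft probability of belonging to each group, and the group means are weighted by those probabilities.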
Kurtosis is really pretty simple ... and useful. It is simply a measure of outliers, or tails. It has nothing to do with the peak whatsoever - that definition must be abandoned.
Here is a data set:
0, 3, 4, 1, 2, 3, 0, 2, 1, 3, 2, 0, 2, 2, 3, 2, 5, 2, 3, 999
Notice that '999' is an outlier.
Here are the $z^4$ values from the data set:
0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00,0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 360.98
Notice that only the outlier gives a $z^4$ that is noticeably different from 0.
The average of these $z^4$ values is the kurtosis of the empirical distribution (subtract 3 if you like, it doesn't matter for the point I am making): 18.05
It should be obvious from this calculation that the data near the "peak" (the non-outlier data) contribute almost nothing to the kurtosis statistic.
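If you want to check the arithmetic yourself, here is a small Python sketch of the calculation. It assumes the $z$ values use the population standard deviation (dividing by $n$), which is what reproduces the figures quoted above; the variable names are just for illustration.

```python
from statistics import fmean, pstdev

data = [0, 3, 4, 1, 2, 3, 0, 2, 1, 3, 2, 0, 2, 2, 3, 2, 5, 2, 3, 999]

mean = fmean(data)
sd = pstdev(data)  # population SD (divide by n); assumed, as it matches the numbers above

# Fourth power of each standardized value.
z4 = [((x - mean) / sd) ** 4 for x in data]

# Kurtosis of the empirical distribution is the average of the z^4 values.
kurtosis = fmean(z4)

# Fraction of the z^4 total that comes from the single largest value.
outlier_share = max(z4) / sum(z4)

print(", ".join(f"{v:.2f}" for v in z4))
print(f"kurtosis = {kurtosis:.2f}")
print(f"share of the z^4 total due to the outlier = {outlier_share:.2%}")
```

The last line makes the point numerically: nearly the whole sum comes from the single value 999.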
Kurtosis is useful as a measure of outliers. Outliers are important to elementary students and therefore kurtosis should be taught. But kurtosis has virtually nothing to do with the peak, whether it is pointy, flat, bimodal or infinite. You can have all of the above with small kurtosis and all of the above with large kurtosis. So it should NEVER be presented as having anything to do with the peak, because that will be teaching incorrect information. It also makes the material needlessly confusing, and seemingly less useful.
Summary:
- kurtosis is useful as a measure of tails (outliers).
- kurtosis has nothing to do with the peak.
- kurtosis is practically useful and should be taught, but only as a measure of outliers. Do not mention peak when teaching kurtosis.
This article explains clearly why the "Peakedness" definition is now officially dead.
Westfall, P.H. (2014). "Kurtosis as Peakedness, 1905 – 2014. R.I.P." The American Statistician, 68(3), 191–195.
There is no good single summary statistic for the type of distribution you have plotted, or, really, for any multimodal distribution.
That is, you can calculate anything you'd like: Mean, median, mode, interquartile range .... whatever. But none of these are good representations of data that has multiple modes.
Even for data that is perfectly normally distributed, you need two numbers: mean and standard deviation. But let's assume you want a single measure of central tendency or location. For the normal, that's the mean. For highly skewed distributions, the usual choice is the median (although sometimes the mean is best, or even the mode). For distributions with a few extreme outliers, you might consider the trimmed or Winsorized mean.
But for multimodal distributions, none of these really work. The math is fine, but the intuition fails.
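To make that concrete, here is a small Python sketch on a made-up bimodal sample (the data and the helper names `trimmed_mean` and `winsorized_mean` are assumptions for illustration only). It computes the usual location summaries and counts how many observations actually lie near each one.

```python
from statistics import fmean, median

# Hypothetical bimodal sample: one cluster around 3, another around 17.
low = [1, 2, 2, 3, 3, 3, 4, 4, 5]
high = [15, 16, 16, 17, 17, 17, 18, 18, 19]
data = sorted(low + high)


def trimmed_mean(sorted_xs, k):
    """Mean after dropping the k smallest and k largest values."""
    return fmean(sorted_xs[k:len(sorted_xs) - k])


def winsorized_mean(sorted_xs, k):
    """Mean after replacing the k smallest/largest values with the nearest kept value."""
    kept = sorted_xs[k:len(sorted_xs) - k]
    return fmean([kept[0]] * k + kept + [kept[-1]] * k)


summaries = {
    "mean": fmean(data),
    "median": median(data),
    "trimmed mean (k=2)": trimmed_mean(data, 2),
    "Winsorized mean (k=2)": winsorized_mean(data, 2),
}

for name, value in summaries.items():
    # Count the observations lying within 2 units of the summary value.
    nearby = sum(1 for x in data if abs(x - value) <= 2)
    print(f"{name}: {value:.1f} (observations within 2 units: {nearby})")
```

Every summary comes out at 10.0, and not a single observation lies within 2 units of it. Each number is computed correctly, yet it describes a region where there is no data.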