Solved – Scott’s and Freedman–Diaconis rules of the thumb for selecting bin width – disatvantages

binninghistogram

Scott's and Freedman–Diaconis rules of the thumb are based on the following formula:

$binWidth = constant \times \frac{1}{\sqrt[3]{n}}$

In the article here it is stated that:

While these appear to be useful estimates for unimodal densities
similar to a Gaussian distribution, they are known to be
suboptimal for multimodal densities.

My question is, will Scott and Freedman–Diaconis rules of the thumb estimate the correct number of bins on distributions with more than one peak?

What are the disadvantages of these methods, and how to overcome them?

Best Answer

Comment continued. Here is a mixture of three normal samples (each of size 50) with means sufficiently far apart, relative to their standard deviations, to show separate modes. The default binning in R provides a histogram that does find the modes. The default KDE in R (with the default bandwidth) roughly matches the three modes (at 12, 18, and 25).

set.seed(930)
x = cbind(rnorm(50,12,2), rnorm(50,18,2), rnorm(50,25,2))
hist(x, prob=T, col="skyblue2"); rug(x)
lines(density(x), col="red", lwd=2)

Related Solutions

Solved – Rules of Thumb to choose an initial number of class intervals and refine that choice (potentially automatically)

The help of the R command hist http://stat.ethz.ch/R-manual/R-patched/library/grDevices/html/nclass.html has some references to algorithms for computing the number of the bins:

Sturges, H. A. (1926) The choice of a class interval. Journal of the American Statistical Association 21, 65–66.

Scott, D. W. (1979) On optimal and data-based histograms. Biometrika 66, 605–610.

Freedman, D. and Diaconis, P. (1981) On the histogram as a density estimator: L_2 theory. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 57, 453–476.

Solved – Rationale for the use of Regressogram (Bin-Smooth)

So we're saying "Y is related to X", but we don't know the form. We want to estimate $E(Y|X=x)$. But at any observation, we have noise, and we need to be able to estimate it between observations.

One (fairly naive) way we can approximate it is to assume that while the average changes, that it doesn't change too rapidly (i.e. that it's 'slowly varying' in a particular sense). Consequently, we could slice the domain up into sections (bins), and calculate an estimate of $E(Y)$ - here just the sample mean of the $y$'s - for all the $x$-values in each section (bin).

That is to say, the averages of the $y$'s in a narrow strip of $x$'s will typically be closer to those $x$'s than averaging more widely (because of variation in the mean of y over the range of x - that is, lower bias), but much smoother than just taking say the nearest x-value (because you're typically averaging a bunch of x's - all the ones in the bin).

The graph then is just a horizontal line drawn within the bin, at the mean of the observations in that bin. Those horizontal bars will 'follow' the data simply because they're local means - within each bin they're the average of the observations in the bin. Obviously as the relationship moves up and down the local bin-means will also.

This is fairly simple and can cope with complicated relationships, if they don't change too rapidly.

But it's not smooth! There are discontinuities at the bin boundaries.

There's a tradeoff between bias (in the sense that wider bins mean that we move further from accuracy in approximating the mean at a particular $x$ as we get near the ends of the bin) and variance (in the sense that narrower bins mean we have less data, so the noise dominates more).

A slightly more sophisticated version of this kind of general idea would be Nadaraya-Watson kernel smoothing, in the same way that kernel density estimation is related to the histogram.

http://en.wikipedia.org/wiki/Kernel_regression

and then we can work from there up to say local linear or local polynomial smoothing - fitting local lines or curves rather than local means.

Edit: Here's an outline of the basic steps in the example data you pointed to.

Here's the first few observations in the motorcycle data (which is in mcycle in the MASS package in R):

> head(mcycle,10)
   times accel
1    2.4   0.0
2    2.6  -1.3
3    3.2  -2.7
4    3.6   0.0
5    4.0  -2.7
6    6.2  -2.7
7    6.6  -2.7
8    6.8  -1.3
9    7.8  -2.7
10   8.2  -2.7

Härdle (who I said hello to just a few days ago when he visited) does a regressogram of this data in "Applied Nonparametric Regression". He says he uses a bin width of 4, and that his bin origin is 0. This is plainly false since he gets a lot more than 5 bins between 0 and 20. But lets take his suggestion of a bin width of 4 and on origin of 0. I'm going to run my bins to exclude the left limit and include the right one (simply because that's the way the R cut function does it; otherwise I'd be inclined to run the other way).

So that means our first bin runs from time 0 to time 4 ($(0,4]$). What are the accelerations in that?

bins of width 4

The first 5 times (circled in red) are in the first bin. The next 4 times (in blue) are in the second. We average the accelerations in those time periods.

averages for bins of width 4

So $\hat{m}(x)$ is -1.34 for $x$ in $(0,4]$ and then it's -2.35 for $x$ in $(4,8]$ and then it's -2.844 ...

In R:

with(mcycle,print(tapply(accel,cut(times,seq(0,60,4)),mean),3))
   (0,4]    (4,8]   (8,12]  (12,16]  (16,20]  (20,24]  (24,28]  (28,32]  (32,36] 
  -1.340   -2.350   -2.844  -22.365  -78.176 -119.167  -45.159   27.990   25.040 
 (36,40]  (40,44]  (44,48]  (48,52]  (52,56]  (56,60] 
   3.643    4.862   -4.020   -0.867   -2.350   10.700

So if we plot those values across those ranges:

regressogram of mcycle data

(Larger, clearer version)

Incidentally, Härdle's actual binwidth appears to be 2.1 - if I use that binwidth with bin origin 0, I reproduce his bin smooth:

My 2.1 binwidth regressogram

Best Answer

Related Solutions

Solved – Rules of Thumb to choose an initial number of class intervals and refine that choice (potentially automatically)

Solved – Rationale for the use of Regressogram (Bin-Smooth)

Related Question