Solved – Optimal bin width for two dimensional histogram

histogram, optimization

There are lots of rules for selecting an optimal bin width in a 1D histogram (see for example)

I'm looking for a rule that adapts the selection of optimal equal bin widths to two-dimensional histograms.

Is there such a rule? Perhaps one of the well-known rules for 1D histograms can be easily adapted; if so, could you give some minimal details on how to do so?

Best Answer

My advice would generally be that it's even more critical than in 1-D to smooth where possible, i.e., to do something like kernel density estimation (or some other such method, like log-spline estimation), which tends to be substantially more efficient than using histograms. As whuber points out, it's quite possible to be fooled by the appearance of a histogram, especially with few bins and small to moderate sample sizes.
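To illustrate the smoothing alternative, here's a minimal 2-D kernel density estimate using SciPy's `gaussian_kde` (an assumed dependency; the specific data and grid are made up for illustration). Note that `gaussian_kde`'s default bandwidth is itself a normal-reference ("Scott") rule, not a tuned choice.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Simulated bivariate data: 500 correlated normal points (illustrative only).
xy = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=500).T

# gaussian_kde expects shape (d, n); its default bandwidth is Scott's rule.
kde = gaussian_kde(xy)

# Evaluate the density estimate on a 50x50 grid (e.g. for a contour plot).
grid = np.mgrid[-3:3:50j, -3:3:50j]
density = kde(grid.reshape(2, -1)).reshape(50, 50)
```

Unlike a histogram, the resulting surface has no bin-edge artifacts, so its appearance is less sensitive to arbitrary discretization choices.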

If you're trying to optimize mean integrated squared error (MISE), say, there are rules that apply in higher dimensions (the number of bins depends on the number of observations, the variance, the dimension, and the "shape"), for both kernel density estimation and histograms.

[Indeed, many of the issues for one are also issues for the other, so some of the information in this Wikipedia article will be relevant.]

This dependence on shape seems to imply that to choose optimally, you already need to know what you're plotting. However, if you're prepared to make some reasonable assumptions, you can use those (for example, some people might assume "approximately Gaussian"), or alternatively, you can use some form of "plug-in" estimator of the appropriate functional.

Wand, 1997$^{[1]}$ covers the 1-D case. If you're able to get that article, take a look, as much of what's there is also relevant to the higher-dimensional situation (insofar as the kinds of analysis that are done). (It exists in working-paper form on the internet if you don't have access to the journal.)

Analysis in higher dimensions is somewhat more complicated (in pretty much the same way it proceeds from 1-D to $r$ dimensions for kernel density estimation), but the dimension enters the power of $n$.

Sec 3.4 Eqn 3.61 (p83) of Scott, 1992$^{[2]}$ gives the asymptotically optimal binwidth:

$h_k^* = R(f_k)^{-1/2}\,\left(6\prod_{i=1}^d R(f_i)^{1/2}\right)^{1/(2+d)} n^{-1/(2+d)}$

where $R(f)=\int_{\mathbb{R}^d} f(x)^2\,dx$ is a roughness term (not the only one possible), and I believe $f_i$ is the partial derivative of $f$ with respect to the $i^\text{th}$ coordinate of $x$.

So for 2D that suggests binwidths that shrink as $n^{-1/4}$.

In the case of independent normal variables, the approximate rule is $h_k^*\approx 3.5\,\sigma_k\, n^{-1/(2+d)}$, where $h_k$ is the binwidth in dimension $k$, the $*$ indicates the asymptotically optimal value, and $\sigma_k$ is the population standard deviation in dimension $k$.
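As a sketch of how this rule might be applied in practice (assuming roughly Gaussian, independent coordinates, and estimating each $\sigma_k$ from the sample), one can compute the binwidths and convert them to bin counts for a 2-D histogram; the function name and the simulated data are illustrative:

```python
import numpy as np

def scott_binwidths(data):
    """Normal-reference binwidths h_k ~ 3.5 * sigma_k * n**(-1/(2+d)).

    data: array of shape (n, d). A sketch assuming independent, roughly
    Gaussian coordinates; sigma_k is estimated by the sample std dev.
    """
    n, d = data.shape
    sigma = data.std(axis=0, ddof=1)
    return 3.5 * sigma * n ** (-1.0 / (2 + d))

rng = np.random.default_rng(1)
data = rng.standard_normal((1000, 2))   # illustrative sample, n = 1000, d = 2
h = scott_binwidths(data)

# Convert binwidths to per-axis bin counts for np.histogram2d.
ranges = data.max(axis=0) - data.min(axis=0)
bins = np.ceil(ranges / h).astype(int)
counts, xedges, yedges = np.histogram2d(data[:, 0], data[:, 1], bins=bins)
```

With $d=2$ the exponent is $-1/4$, so quadrupling $n$ only halves... rather, shrinks the binwidth by a factor of $4^{-1/4}\approx 0.71$.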

For bivariate normal with correlation $\rho$, the binwidth is

$h_i^* = 3.504 \sigma_i(1-\rho^2)^{3/8}n^{-1/4}$
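A sketch of this correlation-adjusted rule, estimating $\sigma_i$ and $\rho$ from the sample (function name and data are illustrative, not from the source):

```python
import numpy as np

def bivariate_normal_binwidths(data):
    """h_i* = 3.504 * sigma_i * (1 - rho**2)**(3/8) * n**(-1/4).

    Sketch of the bivariate-normal reference rule; sigma_i and rho are
    estimated from the sample (data has shape (n, 2)).
    """
    n = data.shape[0]
    sigma = data.std(axis=0, ddof=1)
    rho = np.corrcoef(data.T)[0, 1]
    return 3.504 * sigma * (1.0 - rho ** 2) ** (3.0 / 8.0) * n ** -0.25

rng = np.random.default_rng(2)
data = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], size=2000)
h = bivariate_normal_binwidths(data)

# The (1 - rho^2)^(3/8) factor shrinks the bins relative to rho = 0:
h_uncorrelated = 3.504 * data.std(axis=0, ddof=1) * 2000 ** -0.25
```

The stronger the correlation, the more the density concentrates along a line, so narrower bins are needed to resolve it.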

When the distribution is skewed, heavy-tailed, or multimodal, generally much smaller binwidths result; consequently, the normal-based results would often be at best upper bounds on the binwidth.

Of course, it's entirely possible you're not interested in mean integrated squared error, but in some other criterion.

[1]: Wand, M.P. (1997),
"Data-based choice of histogram bin width",
The American Statistician, 51, 59-64.

[2]: Scott, D.W. (1992),
Multivariate Density Estimation: Theory, Practice, and Visualization,
John Wiley & Sons, Inc., Hoboken, NJ, USA.
