Solved – the difference between buckets and bins

terminology

When computing a histogram we do data binning: we group a number of more or less continuous values into a smaller number of "bins". In bucket sort, we set up buckets and assign each element of a collection to a bucket according to its value. Although the purposes differ, the methods seem similar to me.
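To make the similarity concrete, here is a minimal sketch in which a histogram and a bucket sort share one index computation. The function names (`bucket_of`, `histogram`, `bucket_sort`) are made up for illustration, and the sketch assumes equal-width intervals and values that lie within `[low, high]`.

```python
def bucket_of(value, low, high, n):
    """Index of the equal-width interval of [low, high] that holds value."""
    idx = int((value - low) / (high - low) * n)
    return min(idx, n - 1)  # a value equal to high goes in the last interval


def histogram(values, low, high, n):
    """Binning: keep only a count per interval."""
    counts = [0] * n
    for v in values:
        counts[bucket_of(v, low, high, n)] += 1
    return counts


def bucket_sort(values, low, high, n):
    """Bucketing: keep the values per interval, sort each, concatenate."""
    buckets = [[] for _ in range(n)]
    for v in values:
        buckets[bucket_of(v, low, high, n)].append(v)
    return [v for bucket in buckets for v in sorted(bucket)]


data = [0.42, 0.07, 0.93, 0.55, 0.31, 0.78, 0.12]
counts = histogram(data, 0.0, 1.0, 4)
ordered = bucket_sort(data, 0.0, 1.0, 4)
```

The only difference in this sketch is what each interval stores: a count for the histogram, the values themselves for the sort.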

In statistics, what is the difference between bin/binning, and bucket/bucketing? For example, if I have a set of $N$ observations sampling $y = f(x_1, x_2, x_3)$, and I want to estimate how some statistic varies over my domain, one way is to put my $N$ observations in bins/buckets according to $(x_1, x_2, x_3, y)$ and calculate my statistic within each bin/bucket. An example could be the mean and standard deviation of temperature as a function of latitude, longitude, and altitude.
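As a rough sketch of the procedure described above, the following groups observations by the bin of $(x_1, x_2, x_3)$ and computes the mean and standard deviation of $y$ within each bin. Binning on the three coordinates only, the fixed bin widths, the helper names (`bin_index`, `binned_stats`), and the synthetic temperature data are all assumptions made for illustration.

```python
import math
import random
from collections import defaultdict
from statistics import mean, pstdev


def bin_index(value, low, width):
    """Integer index of the fixed-width bin that contains value."""
    return math.floor((value - low) / width)


def binned_stats(observations, lows, widths):
    """Group (x1, x2, x3, y) tuples by the bin of (x1, x2, x3).

    Returns {bin_key: (count, mean_y, population_std_y)}.
    """
    groups = defaultdict(list)
    for x1, x2, x3, y in observations:
        key = tuple(
            bin_index(x, low, width)
            for x, low, width in zip((x1, x2, x3), lows, widths)
        )
        groups[key].append(y)
    return {key: (len(ys), mean(ys), pstdev(ys)) for key, ys in groups.items()}


# Synthetic observations: (latitude, longitude, altitude in m, temperature).
random.seed(0)
observations = [
    (
        random.uniform(-90, 90),
        random.uniform(-180, 180),
        random.uniform(0, 10000),
        random.gauss(15, 5),
    )
    for _ in range(1000)
]

# 30-degree latitude bins, 60-degree longitude bins, 2500 m altitude bins.
stats = binned_stats(observations, lows=(-90, -180, 0), widths=(30, 60, 2500))
```

Each key of `stats` is a triple of bin indices, so the result can be read as a coarse three-dimensional grid of per-cell summaries.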

In this example, is there a standard term for the entities I divide my data into? Are they more correctly called bins, or buckets? Or does it really not matter?

Best Answer

A very good question, and one I have had myself: as I have moved between disciplines I have heard these called buckets, groups, groupings, categories, categorical variables, discrete variables, and bins. In general, use the language that the end users of your analysis are most comfortable with; in a sense, speak their language (or force them to use yours! ha). There is no wrong answer here, although countless statisticians would say that you shouldn't group your variables into bins/buckets without a very good reason (or ever!), because you are spending degrees of freedom, making arbitrary cutoffs to create your buckets/bins, and losing information that your once valuable continuous variables provided.