Have a look at natural breaks optimization.
https://en.wikipedia.org/wiki/Jenks_natural_breaks_optimization
The term "clustering" is mostly used for multidimensional data.
Stay away from k-means. It's popular, but usually not appropriate for 1-d data, which can be handled much more cleverly because it is obviously ordered. K-means algorithms (Lloyd, MacQueen) do not exploit this ordering, and will test various nonsensical (non-contiguous) combinations.
But here is another quite easy method: do a kernel density estimation, locate the local minima, and use these to split your data.
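For concreteness, here is a minimal sketch of that idea in Python with SciPy's `gaussian_kde`. The bandwidth factor of 0.15 is an ad-hoc choice made so the toy data used later in this thread splits cleanly; on real data you would tune it or rely on an automatic rule such as Scott's.

```python
import numpy as np
from scipy.stats import gaussian_kde

data = np.array([14, 15, 16, 23, 24, 25, 56, 57, 58], dtype=float)

# A deliberately narrow bandwidth factor for this toy data; on real data
# tune it or use the default automatic rule.
kde = gaussian_kde(data, bw_method=0.15)
grid = np.linspace(data.min(), data.max(), 1000)
density = kde(grid)

# Local minima of the density: grid points lower than both neighbours.
interior = density[1:-1]
is_min = (interior < density[:-2]) & (interior < density[2:])
split_points = grid[1:-1][is_min]

# Split the data at the antimodes.
edges = np.r_[-np.inf, split_points, np.inf]
groups = [data[(data > lo) & (data <= hi)]
          for lo, hi in zip(edges[:-1], edges[1:])]
print([g.tolist() for g in groups])
```

With these settings the density has two antimodes, one in each gap, so the data fall into the three visually obvious groups.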
Cluster analysis with a single variable makes perfect sense whenever there is
some dimension along which values can be arranged. This could be a
measurement scale, time or space.
Given ordered data on some measurement scale, there might be
interest in looking for relative breaks within a frequency distribution
(antimodes, in one terminology).
A note of caution: breaks defining bins that are, or that might seem, arbitrary are widely shunned in several areas of statistical science. There is a widespread and marked preference for binning with equal intervals, and very often for avoiding binning altogether when possible. This is partly a matter of taste and partly one of convention: practices have shifted as it has become easier to store datasets in their entirety.
A time series could be divided into spells,
epochs, periods, whatever, ideally with relatively small differences within
subseries and relatively large differences between subseries. The same
problem arises for space whenever a single spatial dimension (horizontal or vertical) is to be subdivided. In geological and other sciences, this is often studied under the heading of zonation.
Note that any formal clustering should always be accompanied by appropriate
plotting of the data (for example, using a dot or quantile or line plot),
which indeed may make clear either that breaks are obvious (so that formal
clustering is merely decorative) or that convincing breaks do not exist (so
that formal clustering may be pointless).
Consider a toy example of values ordered by magnitude:
14 15 16 23 24 25 56 57 58
where it is evident that a three-group clustering
14 15 16 | 23 24 25 | 56 57 58
is sensible. Whether the ordering is on the values themselves, or in time or
in space, the data can always be laid out in one dimension, which gives
special structure to the problem. Thus, although more general clustering
methods can be used, that special structure ideally should be exploited. $k$
groups devised for $n$ values are defined by placing $k - 1$ markers (in the
example above, $k - 1 = 2$); there are $n - 1$ possible places to put them.
There are thus $\binom{n - 1}{k - 1}$ possible clusterings. However, if $k$ is free
to vary, then the total number of possible clusterings is $2^{n - 1}$, as each
value can be in the same group as each neighbour, or not. For even modest $n$,
that is a large number.
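These counts are easy to verify for the toy example ($n = 9$, $k = 3$), for instance with Python's `math.comb`:

```python
from math import comb

n, k = 9, 3
print(comb(n - 1, k - 1))   # k-1 markers in n-1 gaps: 28 three-group clusterings
print(2 ** (n - 1))         # all clusterings when k is free: 256
```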
The problem can be made precise (Fisher 1958; Hartigan 1975) by placing
markers to minimise, for a given number of groups, the
$$\text{sum over groups of variability around group centres}.$$
A sum of squared deviations from group means will spring to mind as the most
obvious possibility. Sum of absolute
deviations from group medians, and other measures, might well be entertained.
Hartigan (1975) showed how a dynamic programming approach makes such
computation straightforward and presented Fortran code. A Stata
implementation (Cox 2007) is group1d, which can be installed from SSC.
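For readers without Stata, the same dynamic programme is short to sketch in Python. The function below is my own illustration of the Fisher/Hartigan idea (it is not Cox's group1d code), minimising the total within-group sum of squared deviations from group means in $O(kn^2)$ time via prefix sums.

```python
import numpy as np

def group1d(x, k):
    """Optimal contiguous partition of sorted 1-d data into k groups,
    minimising the total within-group sum of squares."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    s1 = np.r_[0.0, np.cumsum(x)]        # prefix sums of x
    s2 = np.r_[0.0, np.cumsum(x * x)]    # prefix sums of x**2

    def cost(i, j):                      # sum of squares about the mean of x[i:j]
        s, q, m = s1[j] - s1[i], s2[j] - s2[i], j - i
        return q - s * s / m

    INF = float("inf")
    # D[g][j]: best cost of splitting the first j points into g groups
    D = [[INF] * (n + 1) for _ in range(k + 1)]
    B = [[0] * (n + 1) for _ in range(k + 1)]    # backpointers
    D[0][0] = 0.0
    for g in range(1, k + 1):
        for j in range(g, n + 1):
            for i in range(g - 1, j):
                c = D[g - 1][i] + cost(i, j)
                if c < D[g][j]:
                    D[g][j], B[g][j] = c, i
    # Recover the groups from the backpointers.
    cuts, j = [], n
    for g in range(k, 0, -1):
        cuts.append((B[g][j], j))
        j = B[g][j]
    return [x[i:j].tolist() for i, j in reversed(cuts)]

print(group1d([14, 15, 16, 23, 24, 25, 56, 57, 58], 3))
```

On the toy data above this recovers the evident three-group clustering.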
Cox, N.J. 2007. GROUP1D: Stata module for grouping or clustering in one dimension. http://ideas.repec.org/c/boc/bocode/s456844.html
Fisher, W.D. 1958. On grouping for maximum homogeneity. Journal of the American Statistical Association 53: 789-798.
Hartigan, J.A. 1975. Clustering Algorithms. New York: John Wiley. Ch. 6.
Postscript: This approach seems to match the first part of the specific question. I have pitched it generally because I think the formulation is of some general interest (and because it was easy for me to recycle part of the documentation of Cox 2007). But if the specific goal is to compare an income distribution with a reference uniform distribution, I don't see that binning has any part to play at all. That is a standard problem in economics for which Lorenz curves and inequality measures are the starting points. In essence, you can compare quantile to quantile, or percent point to percent point.
Best Answer
Please see this question and answer to a very similar question:
https://stackoverflow.com/a/39338999/1060350
There, the requirement was that nearby pairs of points be formed and put into separate groups (avoid the term "cluster" when they aren't regular clusters). The same approach should work even better for your requirement.
A similar, surprisingly simple greedy heuristic can be used for your aim of clusters with mean close to 0.
This is only a heuristic - it will not find the optimum solution - but the problem is probably NP-hard or exponential, so you can't afford to search for the optimum.
The greedy approach above should find a decent solution, because it identifies the hardest points first, then chooses the best counterbalance for each.
If you need better results, you can modify step 3 to not choose at random, but instead assign points to clusters so as to minimize how much the cluster center deviates from 0. The same idea could be used in step 2: instead of minimizing $||x+y||_2$, you could minimize $||C+x+y||_2$, where $C$ is the sum of all the objects already in the cluster. These tweaks make the code slightly more complicated, but add little computational overhead.
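Since the numbered steps themselves are not spelled out here, the sketch below is my reconstruction from the references to steps 2 and 3: (1) sort points by distance from 0, hardest first; (2) pair each with the unused point minimising $|x+y|$; (3) assign each pair to the cluster whose running sum $C$ keeps $|C+x+y|$ smallest (the improved variant of step 3). The step numbering and details are assumptions, not the original author's code.

```python
def zero_mean_clusters(points, n_clusters):
    # Step 1 (assumed): hardest points (largest |x|) first.
    pts = sorted(points, key=abs, reverse=True)
    sums = [0.0] * n_clusters                 # running sum C of each cluster
    clusters = [[] for _ in range(n_clusters)]
    while pts:
        x = pts.pop(0)
        if pts:
            # Step 2: best counterbalance y, minimising |x + y|.
            y = min(pts, key=lambda p: abs(x + p))
            pts.remove(y)
            pair = (x, y)
        else:
            pair = (x,)                        # odd point left over
        # Step 3 (improved variant): cluster whose sum stays closest to 0.
        c = min(range(n_clusters), key=lambda i: abs(sums[i] + sum(pair)))
        clusters[c].extend(pair)
        sums[c] += sum(pair)
    return clusters

out = zero_mean_clusters([5, -4, 3, -3, 2, -2, 1, -1], 2)
print(out)
```

Each cluster's sum, and hence its mean, stays close to 0 because every pair is roughly self-cancelling before it is placed.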