If you can accept some inaccuracy, this problem can be solved easily by *binning* counts. That is, pick some largeish number $M$ (say $M = 1000$), then initialize some integer bins $B_{i,j}$ for $i = 1\ldots M$ and $j = 1\ldots N$, where $N$ is the vector size, as zero. Then when you see the $k$th observation of a percentual vector, increment $B_{i,j}$ if the $j$th element of this vector is between $(i-1)/M$ and $i/M$, looping over the $N$ elements of the vector. (I am assuming your input vectors are non-negative, so that when you compute your 'percentuals', the vectors are in the range $[0,1]$. )

At any point in time, you can estimate the mean vector from the bins, and the mean absolute deviation. After observing $K$ such vectors, the $j$th element of the mean is estimated by
$$\bar{X}_j = \frac{1}{K} \sum_i \frac{i - 1/2}{M} B_{i,j},$$ and the $j$th element of the mean absolute deviation is estimated by $$\frac{1}{K} \sum_i | \bar{X_j} - \frac{i - 1/2}{M} | B_{i,j}$$

**edit**: this is a specific case of a more general approach where you are building an empirical density estimate. This could be done with polynomials, splines, etc, but the binning approach is the easiest to describe and implement.

Only single-linkage is optimal. Complete-linkage fails, so your intuition was wrong there. ;-)

As a simple example, consider the one-dimensional data set 1,2,3,4,5,6.

The (unique) optimum solution for complete linkage with two clusters is (1,2,3) and (4,5,6) (complete linkage height 2). The (unique) optimum solution with three clusters is (1,2), (3,4), (5,6), at linkage height 1.
There exists no agglomerative hierarchy that contains both, obviously, because these partitions do not *nest*. Hence, no hierarchical clustering contains all optimal solutions for complete linkage. I would even expect complete linkage to exhibit among the worst gap between the optimum solution and the agglomerative solution at high levels, based on that example... An this example likely is a counter example for most other schemes (except single linkage, where this data set degenerates).

You may be interested in studying this article:

Gunnar E. Carlsson, Facundo MÃ©moli:
Characterization, **Stability and Convergence of Hierarchical Clustering Methods.** Journal of Machine Learning Research 11: 1425-1470 (2010)

and the references contained therein, in particular also:

Kleinberg, Jon M. "**An impossibility theorem for clustering**." Advances in neural information processing systems. 2003.

These seem to indicate that given some mild stability requirements (which can be interpreted as uncertainty of our distance measurements), only single-linkage clustering remains. Which, unfortunately, is all but usable in most cases... so IMHO for practical purposes the authors overshoot a bit, neglecting usefulness of the result over pretty theory.

## Best Answer

Could you group the data set into much smaller data sets (say 100 or 1000 or 10,000 data points) If you then calculated the median of each of the groups. If you did this with enough data sets you could plot something like the average of the results of each of the smaller sets and this woul, by running enough smaller data sets converge to an 'average' solution.