Median Estimation Algorithm – Best Algorithm for Huge Read-Once Data Sets

algorithmslarge datamedianonline-algorithms

I'm looking for a good algorithm (meaning minimal computation, minimal storage requirements) to estimate the median of a data set that is too large to store, such that each value can only be read once (unless you explicitly store that value). There are no bounds on the data that can be assumed.

Approximations are fine, as long as the accuracy is known.

Any pointers?

Best Answer

Could you group the data set into much smaller data sets (say 100 or 1000 or 10,000 data points) If you then calculated the median of each of the groups. If you did this with enough data sets you could plot something like the average of the results of each of the smaller sets and this woul, by running enough smaller data sets converge to an 'average' solution.

Related Question