Solved – How to efficiently compute Theil-Sen estimator

regressionrobust

The Theil-Sen estimator is of interest to me, however when I implement it myself I end up with something that scales as O(n^2). According to wikipedia, it can be calculated exactly in O(n log(n)). Can someone point me toward an efficient implementation (python or mathematica would be best, Matlab or R would be tolerable) or otherwise explain in simple terms how the efficient versions work?

Best Answer

According to wikipedia, it can be calculated exactly in O(n log(n)).

Wikipedia points to no less than six papers detailing different deterministic or randomized algorithms with $O(n\log n)$ performance, right in the section where they mention the existence of such algorithms (as well as mentioning an even faster one under particular circumstances).

Deterministic:

Cole, Richard; Salowe, Jeffrey S.; Steiger, W. L.; Szemerédi, Endre (1989), An optimal-time algorithm for slope selection, SIAM Journal on Computing 18 (4): 792–810, doi:10.1137/0218055, MR 1004799.

Katz, Matthew J.; Sharir, Micha (1993), Optimal slope selection via expanders, Information Processing Letters 47 (3): 115–122, doi:10.1016/0020-0190(93)90234-Z, MR 1237287.

Brönnimann, Hervé; Chazelle, Bernard (1998), Optimal slope selection via cuttings, Computational Geometry Theory and Applications 10 (1): 23–29, doi:10.1016/S0925-7721(97)00025-4, MR 1614381.

$\ $

Randomized:

Dillencourt, Michael B.; Mount, David M.; Netanyahu, Nathan S. (1992), A randomized algorithm for slope selection, International Journal of Computational Geometry & Applications 2 (1): 1–27, doi:10.1142/S0218195992000020, MR 1159839.

Matoušek, Jiří (1991), Randomized optimal algorithm for slope selection, Information Processing Letters 39 (4): 183–187, doi:10.1016/0020-0190(91)90177-J, MR 1130747.

Blunck, Henrik; Vahrenhold, Jan (2006), "In-place randomized slope selection", International Symposium on Algorithms and Complexity, Lecture Notes in Computer Science 3998, Berlin: Springer-Verlag, pp. 30–41, doi:10.1007/11758471_6, MR 2263136.

Which did you want?