[Math] Multiple outliers for two variable linear regression

ca.classical-analysis-and-odes, linear algebra, st.statistics

Problem

Visually, the "extreme" outliers in the following graph are somewhat obvious:

https://i.imgur.com/tiSbS.png

Question

Given:

  • T – Set of all temperatures
  • Y – Set of all years
  • ΣT – Sum of temperatures
  • ΣY – Sum of years
  • N – Number of elements
  • T(n) – Temperature of the nth element in the temperature set

How do you determine if T(n) is an outlier?
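Note that ΣT, ΣY and N only fix the means; the least-squares slope also needs ΣY² and ΣYT, since b = (NΣYT − ΣYΣT) / (NΣY² − (ΣY)²). As a baseline, here is a minimal Python sketch, assuming the data are paired lists `years` and `temps`, that fits an ordinary least-squares line and flags points whose residual exceeds a multiple of the residual standard error. The function names and the 2.5 threshold are illustrative assumptions, not part of the question.

```python
import math

def ols_fit(years, temps):
    """Ordinary least-squares line T ≈ a + b*Y; returns (a, b)."""
    n = len(years)
    mean_y = sum(years) / n  # ΣY / N
    mean_t = sum(temps) / n  # ΣT / N
    # Centering on the means avoids cancellation when the years are large numbers.
    syy = sum((y - mean_y) ** 2 for y in years)
    syt = sum((y - mean_y) * (t - mean_t) for y, t in zip(years, temps))
    b = syt / syy
    a = mean_t - b * mean_y
    return a, b

def flag_outliers(years, temps, threshold=2.5):
    """Zero-based indices n whose residual T(n) - (a + b*Y(n)) exceeds threshold * s."""
    a, b = ols_fit(years, temps)
    residuals = [t - (a + b * y) for y, t in zip(years, temps)]
    n = len(years)
    # Residual standard error of the fit (N - 2 degrees of freedom for a line).
    s = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return [i for i, r in enumerate(residuals) if abs(r) > threshold * s]
```

This works for a single gross outlier, but with several large outliers both the fitted line and s are pulled toward them, so genuine outliers can fall under the threshold (masking). That weakness is what a robust fit is meant to address.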

Related Sites

The math on some of these sites is a bit above my understanding:

Many thanks!

Best Answer

I might suggest LTS, the Least Trimmed Squares approach. There is code in Fortran and MATLAB, the latter called fastlts, both produced, I believe, by Rousseeuw's group. The method minimizes the sum of squared residuals over only a proportion of the data points, with the rest (the outliers) ignored. The outliers are found by something like the Minimum Volume Ellipsoid method (roughly, find the ellipsoid of minimum volume containing half of the points).
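A minimal Python sketch of LTS for a single predictor follows. It is not the fastlts code: it assumes the concentration-step ("C-step") idea used in FAST-LTS, where a line through two random points is repeatedly refit by ordinary least squares on the h points with the smallest squared residuals, a step that never increases the trimmed objective. The default h = ⌊(n + 3)/2⌋ (roughly half the points), the number of random starts, and the outlier rule (residual above 2.5 times 1.4826 × the median absolute residual) are illustrative assumptions, and all function names are hypothetical.

```python
import random
import statistics

def fit_line(points):
    """Ordinary least-squares line through (x, y) pairs; returns (intercept, slope)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    if sxx == 0:
        # Degenerate subset (all x equal): fall back to a horizontal line.
        return my, 0.0
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return my - slope * mx, slope

def trimmed_objective(points, a, b, h):
    """Sum of the h smallest squared residuals: the quantity LTS minimizes."""
    return sum(sorted((y - a - b * x) ** 2 for x, y in points)[:h])

def lts_line(xs, ys, h=None, n_starts=200, max_csteps=50, seed=0):
    """Approximate Least Trimmed Squares line via random starts plus C-steps."""
    pts = list(zip(xs, ys))
    n = len(pts)
    if h is None:
        h = (n + 3) // 2  # (n + p + 1) // 2 with p = 2 parameters
    rng = random.Random(seed)
    best = None
    for _ in range(n_starts):
        i, j = rng.sample(range(n), 2)
        if xs[i] == xs[j]:
            continue  # two points with the same x do not determine a line
        a, b = fit_line([pts[i], pts[j]])
        obj = trimmed_objective(pts, a, b, h)
        for _ in range(max_csteps):
            # C-step: refit OLS on the h points closest to the current line.
            subset = sorted(pts, key=lambda p: (p[1] - a - b * p[0]) ** 2)[:h]
            a_new, b_new = fit_line(subset)
            new_obj = trimmed_objective(pts, a_new, b_new, h)
            if new_obj >= obj:
                break  # the objective can only stay level or drop; stop when it stalls
            a, b, obj = a_new, b_new, new_obj
        if best is None or obj < best[0]:
            best = (obj, a, b)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]

def lts_outliers(xs, ys, cutoff=2.5, **kwargs):
    """Flag points whose LTS residual is large relative to a robust scale."""
    a, b = lts_line(xs, ys, **kwargs)
    resid = [y - a - b * x for x, y in zip(xs, ys)]
    # Assumed scale: 1.4826 * median |residual| (a MAD-style estimate).
    scale = 1.4826 * statistics.median(abs(r) for r in resid)
    return [k for k, r in enumerate(resid) if abs(r) > cutoff * scale]
```

For example, `lts_outliers(years, temps)` returns the zero-based indices of suspected outliers. Because the final line is fit to only h points, a few large outliers cannot drag it the way they drag the ordinary least-squares line.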

hth,
