Robust Weighting – Calculating Robust Distance-Weighted Mean in Algorithms

algorithms, bias-variance tradeoff, estimators, robust, weighted mean

Given a data sample $\{x_i\}_{i=1}^n$, instead of hard-rejecting outliers (e.g., by trimming), one can form a weighted average in which observations far out in the tails are only softly penalized:

\begin{align}
\mu = \frac{\sum_{i=1}^n w_ix_i}{\sum_{i=1}^n w_i}
\end{align}

Now the question is the choice of $w_i$.

One method is the distance-weighted mean, where observations that lie far from the other observations are penalized more, according to
\begin{align}
w_i = \frac{1}{\sum_{k=1}^n |x_i-x_k|^p},
\end{align}

where $p=1$ gives the absolute (Euclidean) distance and $p=2$ the squared distance. Obviously, using $p=2$ penalizes non-central data points more.
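As a concrete illustration, here is a minimal NumPy sketch of this estimator (my own, not part of the original question; the function name and sample values are made up for the example):

```python
import numpy as np

def distance_weighted_mean(x, p=1):
    """Weighted mean with w_i = 1 / sum_k |x_i - x_k|^p."""
    x = np.asarray(x, dtype=float)
    # Pairwise distances; the zero self-distance adds nothing to each row sum.
    dist = np.abs(x[:, None] - x[None, :]) ** p
    w = 1.0 / dist.sum(axis=1)
    return np.sum(w * x) / np.sum(w)

sample = [1.0, 1.2, 0.9, 1.1, 10.0]
print(np.mean(sample))                      # plain mean: 2.84, dragged up by the outlier
print(distance_weighted_mean(sample, p=1))  # ~1.6: the tail point 10.0 is downweighted
print(distance_weighted_mean(sample, p=2))  # slightly lower still
```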

Let us now "invent" the implicit distance-weighted mean, where observations that lie far from the mean being estimated are penalized more, according to
\begin{align}
w_i = \frac{1}{|x_i-\mu|^p}.
\end{align}

Since the $\mu$ we seek also appears in the weights, it has to be found by some form of optimization.
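For concreteness, one possible approach is a naive fixed-point iteration that starts from the plain mean and alternates between updating the weights and the mean. This is my own sketch, not from the question; the `eps` guard, tolerance, and iteration cap are arbitrary choices. For $p=1$ this recursion is essentially Weiszfeld's algorithm for the geometric median, which in one dimension is the ordinary median.

```python
import numpy as np

def implicit_weighted_mean(x, p=1, eps=1e-12, max_iter=100):
    """Fixed-point iteration for mu = sum(w_i x_i) / sum(w_i)
    with w_i = 1 / |x_i - mu|^p; eps guards against division by zero."""
    x = np.asarray(x, dtype=float)
    mu = np.mean(x)  # start from the plain mean
    for _ in range(max_iter):
        w = 1.0 / (np.abs(x - mu) ** p + eps)
        mu_new = np.sum(w * x) / np.sum(w)
        if abs(mu_new - mu) < 1e-10:
            break
        mu = mu_new
    return mu

print(implicit_weighted_mean([1.0, 1.2, 0.9, 1.1, 10.0]))  # ~1.1, the sample median
```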

Question: Setting aside convergence and numerical issues with the implicit method, what are the differing properties of these two methods, and when would one prefer one over the other (especially for small sample sizes)? I have not found any information on the implicit method, and I wonder if it even makes sense.

Best Answer

The so-called "implicit distance-weighted mean", not to be confused with inverse distance weighting (IDW) as used in interpolation, is not a feasible way of estimating any form of robust mean. The main reason is the following:

  • It does not have a unique solution. In fact, any one of the original data points will always serve as a solution, regardless of $p$: the reciprocal of a zero distance makes that point's weight explode, so the weighted average collapses onto it (a numerical sketch of this collapse is given after these points).

In addition,

  • In the special case when $p=1$ and the number of data points $n$ is even, any point in $\mathbb{R}$ between the two data points neighbouring the median will also belong to the set of solutions.
  • When $p>1$, there will be one additional solution between each pair of neighbouring data points, i.e. an extra $n-1$ solutions, for a total of $2n-1$ solutions.
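To make the non-uniqueness concrete, here is a small numerical sketch (my own illustration, not from the original answer): iterating the implicit recursion from a point arbitrarily close to any observation collapses onto that observation, confirming that every data point is a solution.

```python
import numpy as np

def fixed_point_map(mu, x, p=2):
    """One step of the implicit recursion: mu -> sum(w*x)/sum(w), w = 1/|x - mu|^p."""
    w = 1.0 / np.abs(x - mu) ** p
    return np.sum(w * x) / np.sum(w)

x = np.array([0.0, 1.0, 2.0, 10.0])
with np.errstate(divide="ignore", invalid="ignore"):
    for start in (0.999, 2.001, 9.99):
        mu = start
        for _ in range(50):
            mu_next = fixed_point_map(mu, x)
            if not np.isfinite(mu_next) or mu_next == mu:
                break  # landed exactly on a data point, or converged
            mu = mu_next
        print(f"start {start} -> {mu}")  # each run ends on the nearest data point
```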