Given a data sample $\{x_i\}_1^n$, instead of hard-omitting outliers (e.g. by trimming), one can form a weighted average that soft-penalizes observations far out in the tails.

\begin{align}
\mu = \frac{\sum_{i=1}^n w_i x_i}{\sum_{i=1}^n w_i}
\end{align}

Now the question is the choice of $w_i$.
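For concreteness, such a weighted average is straightforward to compute, e.g. with NumPy (a minimal sketch; the sample values are made up and the weights `w` are placeholders for whichever scheme is chosen below):

```python
import numpy as np

x = np.array([1.2, 0.8, 1.1, 0.9, 5.0])  # toy sample with one outlier
w = np.ones_like(x)                       # placeholder weights; all ones gives the ordinary mean

mu = np.sum(w * x) / np.sum(w)            # equivalent to np.average(x, weights=w)
print(mu)
```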

One method is the **distance-weighted mean**, where observations that have large distances to other observations get penalized more, according to

\begin{align}
w_i = \frac{1}{\sum_{k=1}^n |x_i-x_k|^p},
\end{align}

where $p=1$ uses the absolute (Euclidean) distance and $p=2$ the squared distance. Obviously, using $p=2$ penalizes non-central data points more heavily.
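As a concrete illustration, here is a minimal NumPy sketch of this distance-weighted mean (the function name and the toy sample are my own, not from any library):

```python
import numpy as np

def distance_weighted_mean(x, p=1):
    """Weighted mean with w_i = 1 / sum_k |x_i - x_k|^p."""
    x = np.asarray(x, dtype=float)
    dist = np.abs(x[:, None] - x[None, :])   # pairwise absolute distances |x_i - x_k|
    w = 1.0 / np.sum(dist ** p, axis=1)
    return np.sum(w * x) / np.sum(w)

x = [1.2, 0.8, 1.1, 0.9, 5.0]
print(distance_weighted_mean(x, p=1))  # outlier at 5.0 is down-weighted
print(distance_weighted_mean(x, p=2))  # down-weighted even more strongly
```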

Let us now "invent" the **implicit distance-weighted mean**, where observations that lie far from the mean being estimated get penalized more, according to

\begin{align}
w_i = \frac{1}{|x_i-\mu|^p}.
\end{align}

Since $\mu$ is the quantity we are seeking, but it also appears in the weights, it has to be found numerically, e.g. by optimization.
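One way to solve for $\mu$ is a simple fixed-point iteration that alternates between recomputing the weights and the weighted mean (a rough sketch of my own, assuming the convergence and division-by-zero issues mentioned below are simply ignored; the function name and the small `eps` guard are my additions):

```python
import numpy as np

def implicit_distance_weighted_mean(x, p=1, n_iter=100, eps=1e-12):
    """Fixed-point iteration for mu = sum(w_i x_i)/sum(w_i) with w_i = 1/|x_i - mu|^p."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()                               # start from the ordinary mean
    for _ in range(n_iter):
        w = 1.0 / (np.abs(x - mu) ** p + eps)   # eps guards against division by zero
        mu = np.sum(w * x) / np.sum(w)
    return mu

x = [1.2, 0.8, 1.1, 0.9, 5.0]
print(implicit_distance_weighted_mean(x, p=1))
```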

**Question:** Ignoring issues with the implicit method like convergence and numerical issues, what are some different properties of these two methods, and when would one prefer one over the other (especially for small sample sizes)? I have not found any info on the implicit method, and I wonder if it even makes sense.

## Best Answer

The so-called "

implicit distance-weighted mean", not to be confused with IDW used for interpolation, is not a feasible way of estimating any form of robust mean. The reason is the followingIn addition,