We have data $X_1, \dots, X_n$ which are i.i.d. copies of $X$, where we denote $\mathbb{E}[X] = \mu$ and $X$ has finite variance.
We define the truncated sample mean:
$\begin{align}
\hat{\mu}^{\tau} := \frac{1}{n} \sum_{i =1}^n \psi_{\tau}(X_i)
\end{align}$
where the truncation operator is defined as:
$\begin{align}
\psi_{\tau}(x) = (|x| \wedge \tau) \; \text{sign}(x), \quad x \in \mathbb{R}, \quad \tau > 0
\end{align}$
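For concreteness, here is a minimal NumPy sketch of the truncation operator $\psi_{\tau}$ and the truncated sample mean $\hat{\mu}^{\tau}$ as defined above (the function names `psi` and `truncated_mean` are my own choices, not from any library):

```python
import numpy as np

def psi(x, tau):
    # Truncation operator: psi_tau(x) = (|x| ∧ tau) * sign(x)
    return np.minimum(np.abs(x), tau) * np.sign(x)

def truncated_mean(x, tau):
    # Truncated sample mean: average of psi_tau(X_i)
    return psi(np.asarray(x, dtype=float), tau).mean()

# Heavy-tailed but finite-variance sample, e.g. Student's t with 3 degrees of freedom.
rng = np.random.default_rng(0)
x = rng.standard_t(df=3, size=10_000)
print(truncated_mean(x, tau=5.0))
```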
The bias for this truncated estimator is then defined as:
$\text{Bias} := \mathbb{E}(\hat{\mu}^{\tau}) - \mu$
And I saw the inequality:
$\begin{align}
|\text{Bias}| = |\mathbb{E}[(X - \text{sign}(X)\tau) \mathbb{I}_{\{|X| > \tau\}}]| \leq \frac{\mathbb{E}[X^2]}{\tau}
\end{align}$
But I am not sure how this was derived.
Best Answer
First note that \begin{align*} & |E[(X - \operatorname{sign}(X)\tau)I(|X| > \tau)]| \\ =& |E[(X - \tau)I(X > \tau) + (X + \tau)I(X < -\tau)]| \\ \leq & E[|(X - \tau)I(X > \tau) + (X + \tau)I(X < -\tau)|]. \end{align*}
Now note the function $f(x) = |(x - \tau)I_{(\tau, \infty)}(x) + (x + \tau)I_{(-\infty, -\tau)}(x)|, x \in \mathbb{R}$ is dominated by the function $g(x) = x^2/\tau, x \in \mathbb{R}$ (draw a picture). Indeed, for $|x| \leq \tau$ we have $f(x) = 0 \leq g(x)$, while for $|x| > \tau$ we have $f(x) = |x| - \tau$ and $g(x) - f(x) = \frac{x^2 - \tau|x| + \tau^2}{\tau} = \frac{(|x| - \tau/2)^2 + 3\tau^2/4}{\tau} \geq 0$. The inequality then follows by taking "$E$" on both sides of the inequality $f(X) \leq g(X)$.
When $\tau = 2$, the graphs of $f(x)$ and $g(x)$ are shown as follows:
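As a sanity check, the domination $f(x) \leq g(x)$ and the resulting bias bound can be verified numerically. The sketch below checks $f \leq g$ on a grid with $\tau = 2$, then estimates the bias by Monte Carlo for $X \sim \text{Exp}(1)$ (so $\mu = 1$ and $\mathbb{E}[X^2] = 2$); the distribution choice is mine, just to have a nonzero bias:

```python
import numpy as np

tau = 2.0

# Pointwise check that f(x) <= g(x) = x^2 / tau on a grid (the "draw a picture" step).
x = np.linspace(-10, 10, 4001)
f = np.where(np.abs(x) > tau, np.abs(x) - tau, 0.0)
g = x**2 / tau
assert np.all(f <= g)

# Monte Carlo check of |Bias| <= E[X^2]/tau for X ~ Exp(1), where mu = 1 and E[X^2] = 2.
rng = np.random.default_rng(1)
xs = rng.exponential(size=1_000_000)
psi = np.minimum(np.abs(xs), tau) * np.sign(xs)  # truncation; xs >= 0 so this is min(xs, tau)
bias = psi.mean() - 1.0
print(abs(bias), 2.0 / tau)  # |Bias| is roughly e^{-2} ~ 0.135, well below the bound 1.0
```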