Expected Value – How to Calculate the Expectation of the KDE Using Little-o

Tags: asymptotics, expected value, kernel-smoothing, nonparametric, self-study

This is possibly a duplicate of this question of mine. Here, however, I ask for clarification of an estimate that is made when calculating the expectation of the kernel density estimator (KDE) using little-o notation. The conditions on the KDE stated here are inspired by an exercise in an undergraduate textbook.

Background

Suppose $x_1, …, x_n$ are independent and identically distributed observations of a random variable $X$ with unknown distribution function $F$ and probability density function $f\in C^m$, for some $m>1$ fixed. Let $k\in C^{m+1}$ be a given fixed function such that
\begin{align}
k&\geq 0, \\
\mathrm{supp} (k)&=[-1,1], \\
\int_{\mathbb{R}} k(u)\mathrm{d}u&=1, \\
\int_{\mathbb{R}} k(u)u^l\mathrm{d}u&=0 \ \text{for all} \ 1\leq l<m \ \text{and}\\
\int_{\mathbb{R}} k(u)u^m\mathrm{d}u&<\infty .
\end{align}
Define the KDE $f_n$ of $f$ by $$f_n(t)=\frac{1}{n}\sum_{i=1}^n \frac{1}{h}k\left(\frac{t-x_i}{h}\right),$$
where $h=h(n)$ is the bandwidth. What is the expectation of $f_n$, i.e. $\mathbb{E}[f_n(t)]$?

By linearity of the expectation, identical distribution of $x_1,…,x_n$, the law of the unconscious statistician and the change of variables $u=(t-x)/h$,
\begin{align}
\mathbb{E}[f_n(t)]&=\frac{1}{n}\sum_{i=1}^n \mathbb{E}\left[\frac{1}{h}k\left(\frac{t-x_i}{h}\right)\right]\\
&=\mathbb{E}\left[\frac{1}{h}k\left(\frac{t-x}{h}\right)\right]\\
&=\int_{\mathbb{R}}\frac{1}{h}k\left(\frac{t-x}{h}\right)f(x)\mathrm{d}x\\
&=\int_{\mathbb{R}}\frac{1}{h}k(u)f(t-hu)h\mathrm{d}u\\
&=\int_{\mathbb{R}}k(u)f(t-hu)\mathrm{d}u. \tag{1}
\end{align}
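As a rough numerical illustration of $(1)$, the sketch below evaluates the integral by a midpoint rule and compares it with a simulated average of $f_n(t)$. Everything concrete in it is an assumption made for the example and not part of the question: the kernel $k(u)=\tfrac{315}{256}(1-u^2)^4$ (one choice that meets the stated conditions with $m=2$), the standard normal density standing in for the unknown $f$, and all function names.

```python
import math
import random


def kernel(u):
    """Assumed kernel (315/256)(1 - u^2)^4: nonnegative, supported on [-1, 1], integrates to 1."""
    return 315.0 / 256.0 * (1.0 - u * u) ** 4 if abs(u) <= 1.0 else 0.0


def normal_pdf(x):
    """Standard normal density, standing in for the unknown f (an assumption)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def kde(t, xs, h, k=kernel):
    """f_n(t) = (1/n) * sum_i (1/h) * k((t - x_i) / h)."""
    return sum(k((t - x) / h) for x in xs) / (len(xs) * h)


def expected_kde(t, h, k=kernel, f=normal_pdf, steps=20000):
    """Midpoint-rule approximation of (1): integral over [-1, 1] of k(u) f(t - h u) du."""
    du = 2.0 / steps
    total = 0.0
    for i in range(steps):
        u = -1.0 + (i + 0.5) * du
        total += k(u) * f(t - h * u)
    return total * du


def monte_carlo_mean(t, h, n=200, reps=500, seed=0):
    """Average of f_n(t) over `reps` simulated N(0, 1) samples of size n."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        acc += kde(t, xs, h)
    return acc / reps
```

Under these assumptions, `monte_carlo_mean(0.0, 0.5)` should agree with `expected_kde(0.0, 0.5)` up to simulation noise, and both should lie slightly below `normal_pdf(0.0)` because the density is concave at $t=0$.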
From $f\in C^m$, it follows that $$f(t-hu)=\sum_{l=0}^m \frac{f^{(l)}(t)}{l!} (-hu)^l+o((hu)^m). \tag{2}$$
Here $o(g(y))$ denotes the set of functions $r$ satisfying $\lim_{y\to y_0} r(y)/g(y)=0$, where $y_0$ is a real number, a complex number or $\pm \infty$; as usual, $r(y)=o(g(y))$ is written for $r(y)\in o(g(y))$. In $(2)$, $y=hu$ and $y_0=0$. From $(1)$, $(2)$ and linearity of integration,
\begin{align}
\mathbb{E}[f_n(t)]&=\int_{\mathbb{R}}k(u)\left(\sum_{l=0}^m \frac{f^{(l)}(t)}{l!} (-hu)^l+o((hu)^m)\right)\mathrm{d}u \\
&=\sum_{l=0}^m\int_{\mathbb{R}}k(u)\frac{f^{(l)}(t)(-hu)^l}{l!}\mathrm{d}u+\int_{\mathbb{R}}k(u)o((hu)^m)\mathrm{d}u. \tag{3}
\end{align}
From the given conditions on $k$, the $l=0$ term reads
$$\int_{\mathbb{R}} k(u)f(t)\mathrm{d}u=f(t)\int_{\mathbb{R}} k(u) \mathrm{d}u=f(t).$$
The $1\leq l<m$ terms are
$$\int_{\mathbb{R}} k(u)\frac{f^{(l)}(t)}{l!} (-hu)^l\mathrm{d}u=\frac{f^{(l)}(t)(-h)^l}{l!}\int_{\mathbb{R}} k(u)u^l\mathrm{d}u=0.$$
Finally, the $l=m$ term is $$ \frac{f^{(m)}(t)(-h)^m}{m!}\int_{\mathbb{R}} k(u)u^m\mathrm{d}u<\infty.$$
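The terms computed above can be checked numerically for $m=2$. The following is a minimal sketch under assumptions that are not in the question: the kernel $k(u)=\tfrac{315}{256}(1-u^2)^4$, the standard normal density in place of $f$ (so that $f''(t)=(t^2-1)f(t)$), and hypothetical function names. It compares the bias $\mathbb{E}[f_n(t)]-f(t)$ with the $l=m$ term and reports what is left over, divided by $h^2$.

```python
import math


def kernel(u):
    """Assumed kernel (315/256)(1 - u^2)^4 on [-1, 1]; its first moment is 0 by symmetry."""
    return 315.0 / 256.0 * (1.0 - u * u) ** 4 if abs(u) <= 1.0 else 0.0


def normal_pdf(x):
    """Standard normal density, standing in for the unknown f (an assumption)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_pdf_second_derivative(x):
    """f''(x) = (x^2 - 1) f(x) for the standard normal density."""
    return (x * x - 1.0) * normal_pdf(x)


def integrate(g, a, b, steps=20000):
    """Composite midpoint rule on [a, b]."""
    dx = (b - a) / steps
    return sum(g(a + (i + 0.5) * dx) for i in range(steps)) * dx


def bias(t, h):
    """E[f_n(t)] - f(t), with the expectation taken from expression (1)."""
    mean = integrate(lambda u: kernel(u) * normal_pdf(t - h * u), -1.0, 1.0)
    return mean - normal_pdf(t)


def leading_term(t, h):
    """The l = m term for m = 2: f''(t) (-h)^2 / 2! times the integral of k(u) u^2."""
    mu_2 = integrate(lambda u: kernel(u) * u * u, -1.0, 1.0)
    return normal_pdf_second_derivative(t) * (-h) ** 2 / math.factorial(2) * mu_2


def remainder_ratio(t, h):
    """(bias - leading term) / h^2, which should shrink as h decreases."""
    return (bias(t, h) - leading_term(t, h)) / h ** 2
```

For this particular choice of $k$ and $f$, evaluating `remainder_ratio(0.0, h)` at decreasing values of `h` gives a rough picture of the remainder being small relative to $h^m$; it illustrates the claim for one example and does not prove it.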
From the definition of $o(g(y))$ given above, $o((hu)^m)$ denotes a function (the remainder) of $h$ and $u$ that for small $hu$, i.e. $hu\to 0$, approaches $0$ faster than $(hu)^m$. The remainder appears under the integral sign of an improper integral, which is the limit of a definite integral. For finite $u$, $hu\to 0$ means $h\to 0$. Thus the remainder is not only in $o((hu)^m)$ but also non-uniformly in $o(h^m)$, that is, for the remainder it holds that $o((hu)^m)=u^mo(h^m)=o(h^m)$.

Question

In the answer to the above linked question it is claimed, with slight modification, that
\begin{equation}
\int_\mathbb{R} k(u) o((hu)^m)\mathrm{d}u = \int_\mathbb{R} k(u) o(h^m)\mathrm{d}u =o(h^m)\int_\mathbb{R} k(u) u^m\mathrm{d}u =o(h^m), \tag{4}
\end{equation}
but if the remainder is non-uniformly in $o(h^m)$, then the last two equalities in $(4)$ may not hold. The following example shows how a similar reasoning may fail.

For each positive $a$ and $x$ near $0$,
\begin{equation}
g(x,a)=\frac{x^2}{a^2+x^2}\in o\!\left(x^{3/2}\right).
\end{equation}
Define
\begin{equation}
f(x)=\int_0^1g(x,a)\,\mathrm{d}a.
\end{equation}
Is $f(x)\in o\!\left(x^{3/2}\right)$? It is tempting to reason as in $(4)$:
\begin{equation}
\int_0^1g(x,a)\,\mathrm{d}a=\int_0^1o\!\left(x^{3/2}\right)\mathrm{d}a=o\!\left(x^{3/2}\right).
\end{equation}
However, $f(x)=x\arctan(1/x)$ for $x>0$, so $\lim_{x\to0^+}f(x)/x=\pi/2$, which means that $f(x)\not\in o\!\left(x\right)\supseteq o\!\left(x^{3/2}\right)$.
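The counterexample is easy to check numerically. The sketch below (function names are hypothetical) integrates $g(x,a)$ over $a\in[0,1]$ with a midpoint rule and compares the result with the closed form $x\arctan(1/x)$, valid for $x>0$.

```python
import math


def g(x, a):
    """g(x, a) = x^2 / (a^2 + x^2)."""
    return x * x / (a * a + x * x)


def f_numeric(x, steps=100000):
    """Midpoint-rule approximation of the integral of g(x, a) over a in [0, 1]."""
    da = 1.0 / steps
    return sum(g(x, (i + 0.5) * da) for i in range(steps)) * da


def f_closed(x):
    """Closed form of the same integral for x > 0: x * atan(1 / x)."""
    return x * math.atan(1.0 / x)


def ratio_to_power(x, p):
    """f(x) / x^p, used to probe whether f is in o(x^p) as x -> 0+."""
    return f_closed(x) / x ** p
```

For a fixed $a>0$ the quotient `g(x, a) / x ** 1.5` becomes small as `x` decreases, whereas `ratio_to_power(x, 1.0)` stays close to $\pi/2$ and `ratio_to_power(x, 1.5)` grows, in line with the limit stated above.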

So, for some $g(h,u)\in o(1)$¹,
$$\int_\mathbb{R} k(u) o((hu)^m)\mathrm{d}u= h^m \int_\mathbb{R} k(u) u^m g(h,u)\mathrm{d}u,$$
but without knowing how $g(h,u)$ behaves away from zero, it seems that no further estimates can be made. Is it possible to calculate the expectation of the KDE using little-o?

Footnotes:

¹ The notation $g(h,u)$ covers $g(hu)$ as a special case. Unlike $g(hu)$, $g(h,u)$ also includes those functions in $o(1)$ in which $h$ and $u$ do not appear only through the product $hu$.

Best Answer

Here is a suggested solution:

The integration occurs over $[-1,1]$ because $\mathrm{supp}(k)=[-1,1]$. Moreover, $f$ is a probability density function with $f\in C^m$, so it is continuous and hence bounded on the compact set $\{t-hu : u\in[-1,1]\}$.
Instead of

From the definition of $o(g(y))$ given above, $o((hu)^m)$ denotes a function (the remainder) of $h$ and $u$ that for small $hu$, i.e. $hu\to 0$, approaches $0$ faster than $(hu)^m$. The remainder appears under the integral sign of an improper integral, which is the limit of a definite integral. For finite $u$, $hu\to 0$ means $h\to 0$. Thus the remainder is not only in $o((hu)^m)$ but also non-uniformly in $o(h^m)$, that is, for the remainder it holds that $o((hu)^m)=u^mo(h^m)=o(h^m)$.

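write

From the definition of $o(g(y))$ given above, $o((hu)^m)$ denotes a function (the remainder) that approaches $0$ faster than $(hu)^m$ as $hu\to 0$. The remainder depends on $h$ and $u$, but since $u\in[-1,1]$, $hu\to 0$ holds uniformly in $u$ as soon as $h\to 0$. Thus $o((hu)^m)=u^mo(h^m)=o(h^m)$, where the remainder divided by $h^m$, regarded as a function of $h$ and $u$, converges uniformly to $0$ on $[-1,1]$.¹ Hence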

\begin{equation} \int_{[-1,1]} k(u) o((hu)^m)\mathrm{d}u =\int_{[-1,1]} k(u) o(h^m) \mathrm{d}u=o(h^m). \end{equation}
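The uniformity behind this identity can be made explicit. The following is a sketch under the assumption that the remainder in $(2)$ is written as $R(hu)=(hu)^m r(hu)$, where $r$ (notation introduced here, not in the question) satisfies $r(y)\to 0$ as $y\to 0$. This is possible because, for fixed $t$, the Taylor remainder depends on $h$ and $u$ only through the product $hu$. Using $k\geq 0$ and $|u|\leq 1$,
\begin{align}
\left|\int_{[-1,1]} k(u) R(hu)\,\mathrm{d}u\right| &\leq |h|^m \int_{[-1,1]} k(u)\,|u|^m\,|r(hu)|\,\mathrm{d}u \\
&\leq |h|^m \sup_{|y|\leq |h|} |r(y)| \int_{[-1,1]} k(u)\,\mathrm{d}u = |h|^m \sup_{|y|\leq |h|} |r(y)|,
\end{align}
and the supremum tends to $0$ as $h\to 0$, so the left-hand side is $o(h^m)$. In the counterexample from the question no such bound is available: $g(x,a)/x^{3/2}$ equals $1/(2x^{3/2})$ at $a=x$, so it is not small uniformly in $a\in(0,1]$.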

Footnotes:

¹ If a sequence of functions $g_n$ converges uniformly to a function $g$ over some compact interval $I$ where $g_n$ and $g$ are integrable, then \begin{equation} \lim_{n\to\infty}\int_I g_n(u)\mathrm{d}u=\int_I g(u)\mathrm{d}u. \end{equation}

Related Solutions

Gaussian Random Variable – Mean Conditional on Sum $E(X^2| k \geq|X+Y|)$

Let's simplify a little. Define

$$(U,V) = \frac{1}{\sqrt{\sigma_X^2+\sigma_Y^2}}\left(X+Y,\ \frac{\sigma_Y}{\sigma_X}X - \frac{\sigma_X}{\sigma_Y}Y\right).$$

You can readily check that $U$ and $V$ are uncorrelated standard Normal variables (whence they are independent). In terms of them,

$$X = \frac{\sigma_X}{\sqrt{\sigma_X^2 + \sigma_Y^2}} \left(\sigma_X U + \sigma_Y V\right) = \alpha U + \beta V$$

defines the coefficients of $X$ in terms of $(U,V).$ The question desires a formula for

$$E\left[X^2 \mid |X+Y|\ge k\right] = E\left[\left(\alpha U + \beta V\right)^2 \mid |U| \ge \lambda\right]$$

with $\lambda = k/\sqrt{\sigma_X^2 + \sigma_Y^2} \ge 0,$ because $|X+Y|\ge k$ is the same event as $|U| \ge k/\sqrt{\sigma_X^2 + \sigma_Y^2}.$

Expanding the square, we find

$$\begin{aligned} E\left[\left(\alpha U + \beta V\right)^2 \mid |U| \ge \lambda\right] &= \alpha^2E\left[U^2 \mid |U| \ge \lambda\right] \\&+ 2\alpha\beta E\left[UV \mid |U| \ge \lambda\right] \\&+ \beta^2 E\left[V^2 \mid |U| \ge \lambda\right]. \end{aligned}$$

The second term is zero because $E[V]=0$ and $V$ is independent of $U$. The third term is $\beta^2$ because the independence of $V$ and $U$ gives

$$E\left[V^2\mid |U|\ge \lambda\right] = E\left[V^2\right] = 1.$$

This leaves us to compute the first conditional expectation. The standard (elementary) formula expresses it as the fraction

$$E\left[U^2 \mid |U|\ge \lambda\right] = \frac{\left(2\pi\right)^{-1/2}\int_{|u|\ge \lambda} u^2 e^{-u^2/2}\,\mathrm{d}u}{\left(2\pi\right)^{-1/2}\int_{|u|\ge \lambda} e^{-u^2/2}\,\mathrm{d}u}$$

The denominator is $\Pr(|U|\ge \lambda) = 2\Phi(-\lambda)$ where $\Phi$ is the standard Normal distribution function. To compute the numerator, substitute $x = u^2/2$ to obtain

$$\frac{1}{\sqrt{2\pi}}\int_{|u|\ge \lambda}u^2 e^{-u^2/2}\,\mathrm{d}u = \frac{2^{3/2}}{\sqrt{2\pi}}\int_{\lambda^2/2}^\infty x^{3/2\,-1}\ e^{-x}\,\mathrm{d}x = \frac{1}{\Gamma(3/2)}\int_{\lambda^2/2}^\infty x^{3/2\,-1}\ e^{-x}\,\mathrm{d}x.$$

This equals $\Pr(Z\ge \lambda^2/2)$ where $Z$ has a Gamma$(3/2)$ distribution. It is an upper regularized incomplete gamma function, written here as $P(3/2, \lambda^2/2).$ Consequently, with $\lambda \ge 0,$

$$E\left[\left(\alpha U + \beta V\right)^2 \mid |U| \ge \lambda\right] =\beta^2 + \frac{\alpha^2 P(3/2, \lambda^2/2)}{2 \Phi(-\lambda)}.$$

To illustrate, this R implementation of the conditional expectation (with a representing $\alpha,$ b representing $\beta,$ and $k$ representing $\lambda$) uses pnorm for $\Phi$ and pgamma for the Gamma distribution:

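# a, b: the coefficients alpha and beta; k: the threshold lambda (k >= 0)
f <- function(a, b, k) {
  # pgamma(..., lower.tail = FALSE) is Pr(Z >= k^2/2); 2 * pnorm(-k) is Pr(|U| >= k)
  b^2 + a^2 * pgamma(k^2/2, 3/2, lower.tail = FALSE) / (2 * pnorm(-k))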
}
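As a cross-check of the closed form, here is a sketch in Python using only the standard library. It is an illustrative translation, not the answer's own code, and the function names are hypothetical. It relies on the identity $\Gamma(3/2,x)/\Gamma(3/2)=\operatorname{erfc}(\sqrt{x})+\tfrac{2}{\sqrt{\pi}}\sqrt{x}\,e^{-x}$ for the upper tail of the Gamma$(3/2)$ distribution and on $2\Phi(-\lambda)=\operatorname{erfc}(\lambda/\sqrt{2})$, and it compares the result with a direct midpoint quadrature of the defining ratio.

```python
import math


def upper_gamma_three_halves(x):
    """Pr(Z >= x) for Z ~ Gamma(3/2): erfc(sqrt(x)) + (2/sqrt(pi)) sqrt(x) exp(-x)."""
    root = math.sqrt(x)
    return math.erfc(root) + 2.0 / math.sqrt(math.pi) * root * math.exp(-x)


def conditional_second_moment(a, b, lam):
    """b^2 + a^2 * P(3/2, lam^2/2) / (2 * Phi(-lam)) for lam >= 0."""
    tail = math.erfc(lam / math.sqrt(2.0))  # equals 2 * Phi(-lam)
    return b * b + a * a * upper_gamma_three_halves(lam * lam / 2.0) / tail


def conditional_u2_numeric(lam, upper=12.0, steps=200000):
    """E[U^2 | |U| >= lam] by midpoint quadrature on [lam, upper].

    By symmetry the ratio over u >= lam equals the ratio over |u| >= lam;
    the tail beyond `upper` is neglected.
    """
    du = (upper - lam) / steps
    num = 0.0
    den = 0.0
    for i in range(steps):
        u = lam + (i + 0.5) * du
        w = math.exp(-0.5 * u * u)
        num += u * u * w
        den += w
    return num / den
```

With $\alpha=1$ and $\beta=0$ the two functions should agree closely for moderate $\lambda$, and at $\lambda=0$ the closed form reduces to $\alpha^2+\beta^2$, the unconditional second moment.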
