About a density property of the Nearest Neighbor algorithm: part 2.

geometric-measure-theory, martingales, measure-theory, probability, probability-theory

Let $(\Omega,\mathcal{F},\mathbb{P})$ be a probability space and $(\mathcal{X},d)$ be a metric space. Suppose that $X,X_1,X_2,X_3,… : \Omega\to\mathcal{X}$ are $\mathbb{P}$-i.i.d. random variables.

Take a closed set $K$ in $(\mathcal{X},d)$ and a point $x\in\partial K$.

Suppose that:
$$\exists \delta_x \in(0,1]:\quad \frac{\mathbb{P}(X\in \partial K\cap B_r (x))}{\mathbb{P}(X\in B_r (x))}\to \delta_x \quad\text{as } r\to 0^+$$
and
$$\frac{\mathbb{P}(X\in K^c\cap B_r (x))}{\mathbb{P}(X\in B_r (x))}\to 0 \quad\text{as } r\to 0^+,$$
where $B_r(x)$ is the open ball of radius $r$ centered at $x$ in $(\mathcal{X},d)$.

Define: $$\forall m\in\mathbb{N}, \pi_m^x: \mathcal{X}^m\to\{1,…,m\}, (x_1,…,x_m)\mapsto \min\left(\operatorname{argmin}_{k\in\{1,…,m\}}\left\{d\left(x,x_k\right)\right\}\right).$$

Define:
$$\forall m\in\mathbb{N}, Z_m^x:\Omega\to\mathcal{X}, \omega\mapsto X_{\pi_m^x\left(X_1(\omega),…,X_m(\omega)\right)}(\omega).$$

Is it true that $\mathbb{P}(Z_m^x\in K^c)\to 0$ as $m\to\infty$?

This is a version, with stronger hypotheses, of this other question, which has a negative answer: About a density property of the Nearest Neighbor algorithm.
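For concreteness, here is a minimal Monte Carlo sketch of the quantity in question (Python; the helper names `sample_mu`, `in_K_complement`, `prob_Z_in_Kc` and the toy choices of $\mu$ and $K^c$ are only for illustration and are not part of the problem). It draws $m$ i.i.d. samples, takes the one nearest to $x$ with ties broken by smallest index as in $\pi_m^x$, and estimates how often it lands in $K^c$. The toy $\mu$ (uniform on the unit disk) and $K^c$ (thin annuli shrinking towards $x=0$) are chosen only so that the second hypothesis holds; no attempt is made to realize the boundary-density hypothesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mu(n):
    """n i.i.d. points from a toy mu: the uniform distribution on the unit disk."""
    r = np.sqrt(rng.random(n))               # |X|^2 ~ Uniform(0,1)
    theta = 2 * np.pi * rng.random(n)
    return np.column_stack((r * np.cos(theta), r * np.sin(theta)))

def in_K_complement(points):
    """Toy K^c: thin annuli (1 - 10^{-k}) 10^{-k} < |z| < 10^{-k}, k = 1..6
    (truncated here; the idealized example uses all k >= 1).  Their relative
    mass in B_r(0) tends to 0 as r -> 0, so the second hypothesis holds at x = 0."""
    rad = np.linalg.norm(points, axis=1)
    hit = np.zeros(len(points), dtype=bool)
    for k in range(1, 7):
        lo, hi = (1 - 10.0 ** -k) * 10.0 ** -k, 10.0 ** -k
        hit |= (lo < rad) & (rad < hi)
    return hit

def prob_Z_in_Kc(m, x=np.zeros(2), trials=1000):
    """Monte Carlo estimate of P(Z_m^x in K^c): nearest sample to x, first
    index on ties (np.argmin does exactly that), checked for membership in K^c."""
    hits = 0
    for _ in range(trials):
        pts = sample_mu(m)
        z = pts[np.argmin(np.linalg.norm(pts - x, axis=1))]   # Z_m^x
        hits += int(in_K_complement(z[None, :])[0])
    return hits / trials

for m in (10, 100, 1000, 10000):
    print(m, prob_Z_in_Kc(m))
```

At a given $m$ the estimate is essentially driven by the relative thickness of the annulus at the typical nearest-neighbour scale $\approx m^{-1/2}$, i.e., by the ratio in the second hypothesis at that scale.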

Best Answer

Here's a critical obstruction to many methods of proof.

Let's say we're in $\mathbb{R}^2$, $x$ is the origin, and $\mu$ is supported on countably many concentric circles around the origin. The radii of these circles are $r_n = 10^{-10^{n}}$, and $\mu$ has total mass $2\pi r_n$ on the $n$-th circle, distributed uniformly (arc-length measure) on it. (You have to normalize $\mu$ to be a probability measure, so just do that.) Let $(K^c)^{(n)}$ be an open arc of proportion $\frac{1}{n}$ on the circle of radius $r_n$, and let $K^c = \cup_{n \ge 1} (K^c)^{(n)}$. Then for all intermediate values of $m$, i.e., those for which it is very probable that at least one point will be on the circle of radius $r_n$ but very improbable that at least one point will be on the circle of radius $r_{n+1}$, the reason that $Z^x_m$ will be in $K$ is not that it is unlikely for some point to land in $K^c$, but rather that the first point chosen on the circle of radius $r_n$ will, with probability tending to $1$ (as $n$ grows), not be in $K^c$.
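Here is a scaled-down simulation sketch of this obstruction (my own helper names; the radii are compressed to $r_n = 10^{-n}$, $n \le 6$, so that floating point can cope, instead of the doubly exponential radii above). It shows the mechanism: all samples on one circle are equidistant from the origin, so $Z^x_m$ is the first sample drawn on the innermost occupied circle, and whether it lands in $K^c$ is governed by the arc proportion $\frac{1}{n}$ on that circle, not by the total mass of $K^c$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Scaled-down obstruction: circles of radius r_n = 10^{-n}, n = 1..6, each
# carrying mass proportional to its circumference 2*pi*r_n, and K^c an open
# arc of angular proportion 1/n on the n-th circle.
radii = 10.0 ** -np.arange(1, 7)
weights = radii / radii.sum()                 # normalized circle masses

def nearest_in_Kc(m):
    """Does Z_m^x (x = origin) land in K^c?  All samples on a circle are
    equidistant from the origin, so Z_m^x is the first sample drawn on the
    innermost occupied circle (ties broken by index, as in pi_m^x)."""
    circle = rng.choice(len(radii), size=m, p=weights)   # circle of each sample
    theta = 2 * np.pi * rng.random(m)                    # angle of each sample
    inner = circle.max()                                  # innermost occupied circle
    first = np.flatnonzero(circle == inner)[0]            # first sample on it
    n = inner + 1                                         # its label n in 1..6
    return theta[first] < 2 * np.pi / n                   # bad arc of proportion 1/n

for m in (5, 50, 500, 5000):
    trials = 2000
    freq = sum(nearest_in_Kc(m) for _ in range(trials)) / trials
    print(m, freq)   # roughly 1/n for the circle n typically reached at this m
```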

My point is that you definitely have to use that $Z^x_m$ cares about the first closest point. It seems that none of your attempts utilize this fact. For example, with respect to edit 5 / the answer below, it will actually be true that $P\left(\bigcup_{j=1}^m \{X_j \in K^c \cap B_r(x)\}\right)$ is large for any of the smart choices of $r$.

If we change definitions and say that "$Z^x_m$ is in $K^c$ if at least one of the closest points to $x$ is in $K^c$", then it will be false that $P(Z^x_m \in K^c) \to 0$ as $m \to \infty$.


Actually, you have done all of the hard work (in edit 5). It seems that you are saying that once you have chosen $s_1 > s_2 > \dots$, it might not be the case that the intervals $(I_j)_j$ overlap enough. This could be true, but why choose $s_1 > s_2 > \dots$ arbitrarily to begin with? You can just consider all $r > 0$ at once. I provide a proof below (which also lets me make sure that your main argument in edit 5 is correct). Further below, I will give some more comments about this problem, to make myself not feel bad about getting the bounty (assuming I didn't make any mistakes).


I will use $\mu$ to denote the distribution of $X$ on $\mathcal{X}$, i.e., $\mu(E) = P(X \in E)$.

Theorem: Let $(\mathcal{X},d)$ be a metric space and $\mu$ be a probability distribution on $\mathcal{X}$. Take $x \in \mathcal{X}$ with $\mu(B_r(x)) > 0$ for each $r > 0$. Suppose that $\lim_{r \downarrow 0} \frac{\mu(K^c\cap B_r(x))}{\mu(B_r(x))} = 0$. Then, $P(Z^x_m \in K^c) \to 0$ as $m \to \infty$, where $Z^x_m$ is the nearest neighbor function relative to $x$ (with $m$ samples).

Proof: We can, of course, assume $\mu(K^c \cap B_r(x)) > 0$ for each $r > 0$. Take $\epsilon \in (0,1)$. Take $r_0 > 0$ so that $\frac{\mu(K^c \cap B_r(x))}{\mu(B_r(x))} < \epsilon$ for each $0 < r < r_0$. The map $(\frac{1}{3},\frac{2}{3})\times(0,r_0) \to (0,\infty)$, $(\alpha,r) \mapsto \frac{1}{\mu(B_r(x))}\left[\frac{\mu(K^c\cap B_r(x))}{\mu(B_r(x))}\right]^{-\alpha}$, is continuous; its image is an interval (the domain is connected) which is unbounded above, since $r$ going to $0$ makes the function blow up, so the image contains an interval of the form $(\eta,\infty)$. Take any $m \in \mathbb{N}$ with $m > \eta$. Take $\alpha \in (\frac{1}{3},\frac{2}{3})$ and $0 < r < r_0$ with $m = \frac{1}{\mu(B_r(x))}\left[\frac{\mu(K^c\cap B_r(x))}{\mu(B_r(x))}\right]^{-\alpha}$. Then, since $\alpha < \frac{2}{3}$ and the ratio is below $\epsilon < 1$, $$P(Z^x_m \in K^c \cap B_r(x)) \le P\left(\bigcup_{j=1}^m \{X_j \in K^c \cap B_r(x)\}\right) \le \sum_{j=1}^m P(X_j \in K^c \cap B_r(x))$$ $$= m\,\mu(K^c \cap B_r(x)) = \left(\frac{\mu(K^c \cap B_r(x))}{\mu(B_r(x))}\right)^{1-\alpha} \le \epsilon^{1-\alpha} \le \epsilon^{1/3}.$$ And, since $\alpha > \frac{1}{3}$, $$P(Z^x_m \in B_r(x)^c) = \mu(B_r(x)^c)^m = (1-\mu(B_r(x)))^m \le \exp(-m\,\mu(B_r(x)))$$ $$= \exp\left(-\left(\frac{\mu(K^c\cap B_r(x))}{\mu(B_r(x))}\right)^{-\alpha}\right) \le \exp(-\epsilon^{-1/3}).$$ Therefore, $$P(Z^x_m \in K^c) \le P(Z^x_m \in K^c \cap B_r(x))+P(Z^x_m \in B_r(x)^c) \le \epsilon^{1/3}+\exp(-\epsilon^{-1/3}),$$ which can be made arbitrarily small by taking $\epsilon$ arbitrarily small. $\square$
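The last step is just the observation that $\epsilon^{1/3}+\exp(-\epsilon^{-1/3})\to 0$ as $\epsilon \downarrow 0$; for what it is worth, a tiny tabulation of that bound (plain arithmetic, nothing specific to the problem):

```python
import math

# The proof bounds P(Z_m^x in K^c) by eps^(1/3) + exp(-eps^(-1/3)) for all
# sufficiently large m; both terms vanish as eps -> 0, the second one faster.
for eps in (1e-1, 1e-2, 1e-4, 1e-6, 1e-9):
    bound = eps ** (1 / 3) + math.exp(-(eps ** (-1 / 3)))
    print(f"eps = {eps:.0e}   bound = {bound:.3e}")
```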


Claim: The theorem above is false if we only insist that $\frac{\mu(K^c \cap B_r(x))}{\mu(B_r(x))}$ goes to $0$ along some sequence of radii decreasing to $0$.

Proof: Take $\mathcal{X} = \{z \in \mathbb{R}^2 : |z| < 1\}$, $d$ the standard Euclidean metric, and $\mu$ the normalized Lebesgue measure (uniform distribution), so that $\mu(B_\rho(0)) = \rho^2$. Let $K^c = \cup_{k \ge 1} \{\frac{1}{2^{10^{2k+1}}} < |z| < \frac{1}{2^{10^{2k}}}\}$. Clearly, for $x$ the origin, $\frac{\mu(K^c \cap B_r(x))}{\mu(B_r(x))}$ goes to $0$ along $(r_k)_k = \left(\frac{1}{2^{10^{2k+1}}}\right)_{k \ge 1}$. Finally, $$P\left(|Z^x_m| \in \left(\tfrac{1}{2^{10^{2k+1}}},\tfrac{1}{2^{10^{2k}}}\right)\right) \ge 1-P\left(|Z^x_m| \le \tfrac{1}{2^{10^{2k+1}}}\right)-P\left(|Z^x_m| \ge \tfrac{1}{2^{10^{2k}}}\right)$$ $$\ge 1-\frac{m}{2^{2\cdot 10^{2k+1}}}-\left(1-\frac{1}{2^{2\cdot 10^{2k}}}\right)^m,$$ so we can take $m = 2^{3\cdot 10^{2k}}$ to see that $P(Z^x_m \in K^c)$ is exponentially close to $1$. $\square$
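Here is a scaled-down Monte Carlo sketch of this counterexample (my own helper names; the true radii $2^{-10^{2k}}$ are far below floating-point range, so the sketch uses the annuli $10^{-(2k+1)} < |z| < 10^{-2k}$ instead, which weakens "ratio $\to 0$ along a subsequence" to "ratio $\approx 10^{-2}$ along a subsequence" but keeps the mechanism). Only $|Z^x_m|$ matters here, and since $|X_i|^2 \sim \mathrm{Uniform}(0,1)$ under the uniform distribution on the unit disk, $|Z^x_m|^2$ is the minimum of $m$ uniforms and can be sampled in closed form, so astronomically large $m$ cause no trouble:

```python
import numpy as np

rng = np.random.default_rng(2)

# Scaled-down K^c: union of annuli 10^{-(2k+1)} < |z| < 10^{-2k}, k = 1..3,
# inside the unit disk with mu the normalized Lebesgue (uniform) measure.
annuli = [(10.0 ** -(2 * k + 1), 10.0 ** -(2 * k)) for k in range(1, 4)]

def radius_of_Z(m, trials):
    """|Z_m^x| for x = origin: |X_i|^2 ~ Uniform(0,1), so the squared radius of
    the nearest of m samples is the minimum of m uniforms, with CDF 1-(1-t)^m,
    sampled in closed form as 1 - U^(1/m)."""
    u_min = 1.0 - rng.random(trials) ** (1.0 / m)
    return np.sqrt(u_min)

def prob_Z_in_Kc(m, trials=20000):
    r = radius_of_Z(m, trials)
    in_kc = np.zeros(trials, dtype=bool)
    for lo, hi in annuli:
        in_kc |= (lo < r) & (r < hi)
    return in_kc.mean()

# |Z_m^x| concentrates around m^{-1/2}, so m ~ 10^{4k+1} places it inside the
# k-th bad annulus; the estimate stays around 0.9 at each such m (with the
# doubly exponential radii of the claim it would be exponentially close to 1).
for m in (10 ** 5, 10 ** 9, 10 ** 13):
    print(m, prob_Z_in_Kc(m))
```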
