Solved – Probability that uniformly random points in a rectangle have Euclidean distance less than a given threshold

distance probability

Assume we have $n$ points in a rectangle $[0,a] \times [0,b]$, and these points are uniformly distributed over this region. (I am not very familiar with statistics, so I don't know whether there is a difference between picking a point uniformly from the area $[0,a] \times [0,b]$ and picking the $x$-coordinate uniformly from $[0,a]$ and the $y$-coordinate uniformly from $[0,b]$ independently.)

Given a distance threshold $d$, I want to know the probability that the Euclidean distance between two points is less than $d$, or more precisely, how many pairs of points will be within distance $d$ of each other.

Maybe the following description would be unambiguous.

Let me specify the problem. We are given $n$ points and a threshold $d$. The $n$ points are uniformly distributed in the rectangle $[0,a] \times [0,b]$. Let the random variable $\xi$ be the number of pairs of points within distance $d$ of each other. Find $E[\xi]$.

Best Answer

We can solve this problem analytically using some geometric intuition and arguments. Unfortunately, the answer is quite long and a bit messy.

Basic setup

First, let's set out some notation. Assume we draw points uniformly at random from the rectangle $[0,a] \times [0,b]$. We assume without loss of generality that $0 < b < a$. Let $(X_1,Y_1)$ be the coordinates of the first point and $(X_2,Y_2)$ be the coordinates of the second point. Then, $X_1$, $X_2$, $Y_1$, and $Y_2$ are mutually independent with $X_i$ distributed uniformly on $[0,a]$ and $Y_i$ distributed uniformly on $[0,b]$.

Consider the Euclidean distance between the two points. This is $$ D = \sqrt{(X_1-X_2)^2 + (Y_1-Y_2)^2} =: \sqrt{ Z_1^2 + Z_2^2} \> , $$ where $Z_1 = |X_1-X_2|$ and $Z_2 = |Y_1-Y_2|$.

Triangular distributions

Since $X_1$ and $X_2$ are independent uniforms, then $X_1 - X_2$ has a triangular distribution, whence $Z_1 = |X_1 - X_2|$ has a distribution with density function $$ f_a(z_1) = \frac{2}{a^2}(a-z_1) ,\quad 0 < z_1 < a \> . $$ The corresponding distribution function is $F_a(z_1) = 1 - (1-z_1/a)^2$ for $0 \leq z_1 \leq a$. Similarly, $Z_2 = |Y_1 - Y_2|$ has density $f_b(z_2)$ and distribution function $F_b(z_2)$.
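As a quick sanity check on the distribution function $F_a$, here is a minimal Python sketch that compares it with an empirical estimate from simulated uniforms. The function names, sample size, and seed are illustrative choices, not part of the original answer.

```python
import random


def F_abs_diff(z, a):
    """Distribution function of Z = |X1 - X2| for independent X1, X2 ~ Uniform(0, a)."""
    if z <= 0:
        return 0.0
    if z >= a:
        return 1.0
    return 1.0 - (1.0 - z / a) ** 2


def empirical_cdf_abs_diff(z, a, n_samples=200_000, seed=0):
    """Fraction of simulated |X1 - X2| values that are <= z."""
    rng = random.Random(seed)
    hits = sum(abs(rng.uniform(0, a) - rng.uniform(0, a)) <= z
               for _ in range(n_samples))
    return hits / n_samples


if __name__ == "__main__":
    a = 5.0  # same value as in the figure
    for z in (0.5, 1.0, 2.5, 4.0):
        print(z, F_abs_diff(z, a), empirical_cdf_abs_diff(z, a))
```

The two columns should agree to about two or three decimal places at this sample size.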

Note that since $Z_1$ is a function only of the two $X_i$ and $Z_2$ is a function only of the two $Y_i$, $Z_1$ and $Z_2$ are independent. So the distance between the points is the Euclidean norm of a vector of two independent random variables (with different distributions).

The left panel of the figure shows the distribution of $X_1 - X_2$ and the right panel shows $Z_1 = |X_1 - X_2|$ where $a = 5$ in this example.

(Figure: triangular densities.)

Some geometric probability

So $Z_1$ and $Z_2$ are independent and are supported on $[0,a]$ and $[0,b]$ respectively. For fixed $d$, the distribution function of the Euclidean distance is $$\renewcommand{\Pr}{\mathbb P}\newcommand{\rd}{\,\mathrm{d}} \Pr(D \leq d) = \iint_{\{z_1^2+z_2^2 \leq d^2\}} f_a(z_1) f_b(z_2) \rd z_1 \rd z_2 \> . $$
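Before doing any calculus, the double integral can be approximated numerically. The following is a rough sketch, assuming a simple midpoint rule on an $m \times m$ grid over $[0,a] \times [0,b]$; the helper names `f_tri` and `cdf_by_grid` and the grid size are my own choices. It is only meant as a cross-check, not as an efficient method.

```python
def f_tri(z, c):
    """Density of |U1 - U2| for independent U1, U2 ~ Uniform(0, c)."""
    return 2.0 * (c - z) / (c * c) if 0.0 <= z <= c else 0.0


def cdf_by_grid(d, a, b, m=400):
    """Midpoint-rule approximation of the integral of f_a(z1) f_b(z2)
    over the part of [0,a] x [0,b] where z1^2 + z2^2 <= d^2."""
    hx, hy = a / m, b / m
    total = 0.0
    for i in range(m):
        z1 = (i + 0.5) * hx
        if z1 > d:
            break  # cell midpoints further right are outside the quarter circle
        fx = f_tri(z1, a)
        for j in range(m):
            z2 = (j + 0.5) * hy
            if z1 * z1 + z2 * z2 > d * d:
                break  # cell midpoints further up are outside the quarter circle
            total += fx * f_tri(z2, b)
    return total * hx * hy


if __name__ == "__main__":
    a, b = 5.0, 4.0  # the rectangle used in the figures
    for d in (1.0, 2.0, 4.5, 6.0, 7.0):
        print(d, cdf_by_grid(d, a, b))
```

The approximation error comes from cells cut by the circular boundary, so it shrinks roughly in proportion to the cell size.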

We can think of this geometrically as having a distribution on the rectangle $[0,a] \times [0,b]$ and considering a quarter circle of radius $d$. We'd like to know the probability that is inside the intersection of these two regions. There are three different possibilities to consider:

Region 1 (orange): $0 \leq d < b$. Here the quarter circle lies completely within the rectangle.

Region 2 (red): $b \leq d \leq a$. Here the quarter circle intersects the rectangle along the top and bottom edges.

Region 3 (blue): $a < d \leq \sqrt{a^2 + b^2}$. The quarter circle intersects the rectangle along the top and right edges.

Here is a figure, where we draw an example radius of each of the three types. The rectangle is defined by $a = 5$, $b = 4$. The grayscale heatmap within the rectangle shows the density $f_a(z_1) f_b(z_2) \rd z_1 \rd z_2$ where dark areas have higher density and lighter areas have smaller density.

Some ugly calculus

To calculate the probabilities, we need to do some calculus. Let's consider each of the regions in turn and we'll see that a common integral will arise. This integral has a closed-form, though it's not very pretty.

Region 1: $0 \leq d < b$.

$$\newcommand{\radius}{\sqrt{d^2 - y^2}} \Pr(D \leq d) = \int_0^d \int_0^{\radius} f_b(y) f_a(x) \rd x \rd y = \int_0^d f_b(y) \int_0^{\radius} f_a(x) \rd x \rd y \>. $$

Now, the inner integral yields $\frac{1}{a^2}\radius (2 a - \radius)$. So, we are left to compute an integral of the form $$ G(c) - G(0) = \int_0^c (b - y) \radius (2a - \radius) \rd y \> , $$ where in this case of interest $c = d$. The antiderivative of the integrand is $$ \begin{align*} G(y) &= \int (b - y) \radius (2a - \radius) \rd y \\ &= \frac{a}{3} \radius ( y (3 b - 2 y) + 2 d^2) \\ &\quad + \,a b d^2 \tan^{-1}\Big(\frac{y}{{\scriptstyle \radius}}\Big) - b d^2 y \\ &\quad + \,\frac{b y^3}{3} + \frac{(d y)^2}{2} - \frac{y^4}{4} \> . \end{align*} $$

From this we get that $\Pr(D \leq d) = \frac{2}{a^2 b^2} (G(d) - G(0))$.
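As a worked check for Region 1, the endpoint values of $G$ can be written out explicitly. At $y = d$ the square root $\sqrt{d^2 - y^2}$ vanishes and the arctangent term tends to $\pi/2$, while at $y = 0$ only the first term of $G$ survives. This gives

$$ \begin{align*} G(d) &= \frac{\pi}{2} a b d^2 - \frac{2}{3} b d^3 + \frac{d^4}{4} \>, \qquad G(0) = \frac{2}{3} a d^3 \>, \\ \Pr(D \leq d) &= \frac{2}{a^2 b^2} \big(G(d) - G(0)\big) = \frac{\pi d^2}{a b} - \frac{4 (a+b) d^3}{3 a^2 b^2} + \frac{d^4}{2 a^2 b^2} \>, \qquad 0 \leq d < b \> . \end{align*} $$

The leading term $\pi d^2 / (ab)$ is the area of a disc of radius $d$ relative to the area of the rectangle, which is what one would expect for small $d$ if edge effects were ignored. The remaining terms correct for points near the boundary.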

Region 2: $b \leq d \leq a$.

$$ \Pr(D \leq d) = \frac{2}{a^2 b^2} (G(b) - G(0)) \>, $$ by the same reasoning as for Region 1, except now we must integrate along the $y$-axis all the way up to $b$ instead of just $d$.

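Region 3: $a < d \leq \sqrt{a^2 + b^2}$.

Here, for $y \leq \sqrt{d^2-a^2}$ the bound $\sqrt{d^2-y^2}$ is at least $a$, so the inner integral over $x$ covers all of $[0,a]$ and equals one. This splits the integral into two pieces: $$ \begin{align*} \Pr(D \leq d) &= \int_0^{\sqrt{d^2-a^2}} f_b(y)\rd y + \int_{\sqrt{d^2-a^2}}^b f_b(y) \int_{0}^{\radius} f_a(x) \rd x \rd y \\ &= F_b(\sqrt{d^2-a^2}) + \frac{2}{a^2 b^2} (G(b) - G(\sqrt{d^2-a^2})) \> . \end{align*} $$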
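The three cases can be collected into one function. Below is a minimal Python sketch of the formulas above, together with a small Monte Carlo comparison. The function names are hypothetical, and using `math.atan2` in place of $\tan^{-1}(y/\sqrt{d^2-y^2})$ is my own choice so that the endpoint $y = d$ needs no special handling.

```python
import math
import random


def G(y, a, b, d):
    """Antiderivative of (b - y) * r * (2a - r), where r = sqrt(d^2 - y^2)."""
    r = math.sqrt(max(d * d - y * y, 0.0))
    # atan2(y, r) equals atan(y / r) for r > 0 and gives pi/2 at y = d (r = 0)
    return (a / 3.0 * r * (y * (3.0 * b - 2.0 * y) + 2.0 * d * d)
            + a * b * d * d * math.atan2(y, r)
            - b * d * d * y
            + b * y ** 3 / 3.0 + (d * y) ** 2 / 2.0 - y ** 4 / 4.0)


def distance_cdf(d, a, b):
    """P(D <= d) for two independent uniform points in [0,a] x [0,b]."""
    if a < b:
        a, b = b, a  # the formulas assume b <= a
    if d <= 0:
        return 0.0
    if d >= math.hypot(a, b):
        return 1.0
    k = 2.0 / (a * a * b * b)
    if d < b:  # Region 1
        return k * (G(d, a, b, d) - G(0.0, a, b, d))
    if d <= a:  # Region 2
        return k * (G(b, a, b, d) - G(0.0, a, b, d))
    y0 = math.sqrt(d * d - a * a)  # Region 3
    F_b = 1.0 - (1.0 - y0 / b) ** 2
    return F_b + k * (G(b, a, b, d) - G(y0, a, b, d))


def simulate_cdf(d, a, b, n_pairs=200_000, seed=0):
    """Monte Carlo estimate of P(D <= d) from independent pairs of points."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_pairs):
        dx = rng.uniform(0, a) - rng.uniform(0, a)
        dy = rng.uniform(0, b) - rng.uniform(0, b)
        hits += dx * dx + dy * dy <= d * d
    return hits / n_pairs


if __name__ == "__main__":
    a, b = 5.0, 4.0
    for d in (2.0, 4.5, 6.0):  # one radius from each region
        print(d, distance_cdf(d, a, b), simulate_cdf(d, a, b))
```

The closed form and the simulated estimate should agree up to Monte Carlo noise, and the function is continuous where the regions meet at $d = b$ and $d = a$.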

Below is a simulation of 20000 points where we plot the empirical distribution as grey points and the theoretical distribution as a line, colored according to the particular region that applies.

(Figure: empirical CDF and theoretical CDF.)

From the same simulation, below we plot the first 100 pairs of points and draw lines between them. Each is colored according to the distance between the pair of points and which region this distance falls into.

(Figure: random sample of points.)

The expected number of pairs of points within distance $d$ is simply $$ \mathbb E[\xi] = {n \choose 2} \Pr(D \leq d) \>, $$ by linearity of expectation.
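To illustrate the linearity argument, here is a small Python sketch that counts close pairs among $n$ simulated points and compares the average count with ${n \choose 2} \Pr(D \leq d)$. The pairs within one sample share points and so are not independent, but the expectation is unaffected. All names and default values are illustrative assumptions, and the probability is supplied by the caller rather than computed here.

```python
import itertools
import math
import random


def expected_close_pairs(n, p):
    """E[xi] = C(n, 2) * p, where p = P(D <= d) for a single pair."""
    return math.comb(n, 2) * p


def count_close_pairs(points, d):
    """Number of unordered pairs of points at Euclidean distance <= d."""
    d2 = d * d
    return sum((x1 - x2) ** 2 + (y1 - y2) ** 2 <= d2
               for (x1, y1), (x2, y2) in itertools.combinations(points, 2))


def mean_close_pairs(n, d, a, b, n_reps=2000, seed=0):
    """Average number of close pairs over n_reps independent samples of n points."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_reps):
        pts = [(rng.uniform(0, a), rng.uniform(0, b)) for _ in range(n)]
        total += count_close_pairs(pts, d)
    return total / n_reps


if __name__ == "__main__":
    n, d, a, b = 20, 2.0, 5.0, 4.0
    # Region 1 value of P(D <= d) for a = 5, b = 4, d = 2
    p = math.pi * d**2 / (a * b) - 4 * (a + b) * d**3 / (3 * a**2 * b**2) + d**4 / (2 * a**2 * b**2)
    print(expected_close_pairs(n, p), mean_close_pairs(n, d, a, b))
```

Counting pairs with `itertools.combinations` takes time quadratic in $n$, which is fine for a check like this but not for large point sets.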

Related Solutions

Probability of Relations on Uniform Distribution Over 2D Space Explained

There are at least two interpretations: one concerns the actual points generated by this process and the other concerns the process itself.

If a realization of the Poisson process is given and pairs of points are to be chosen from that realization, then there is nothing to be done except systematically compare all distances to all other distances (a double loop over the points).

Otherwise, if the procedure is intended to consist of (i) creating a realization of the process and then (ii) selecting a pair of points at random, then the assumptions imply the two points are selected uniformly and independently from the circle. The calculation for this situation can be performed once and for all.

Notice that the squared distances $r_1 = d_1^2$ and $r_2 = d_2^2$ are uniformly distributed, whence the desired probability is

$$p(a,b) = \Pr\left(d_1^2 \lt \frac{d_2^2}{a(1 + b d_2^2)}\right) = \int_0^1 d r_2 \int_0^{\max(0, \min(1, r_2 / (a(1 + b r_2))))} d r_1.$$

The $\max$ and $\min$ can be handled by breaking into cases. Some special values of $a$ and $b$ have to be handled. Because the integration is a square window over a region generically bounded by lines and lobes of a hyperbola (with vertical axis at $1/(ab)$ and horizontal axis at $-1/b$), the result is straightforward but messy; it should involve rational expressions in $a$ and $b$ and some inverse hyperbolic functions (that is, natural logarithms). I had Mathematica write it out:

$$\begin{array}{ll} \frac{b+1}{b} & \left(-1\leq a<0\land \frac{1}{a}-b\leq 1\land b<-1\right)\\ &\lor \left(a<-1\land \frac{1}{a}-b<1\land b<-1\right) \\ -\frac{1}{b (a b-1)} & \frac{1}{a}-b=1\land a<-1 \\ \frac{a^2 b+2 a b+a-2}{2 (a b-1)} & b=0\land a>0\land \frac{1}{a}-b>1 \\ \frac{b-\log (b+1)}{a b^2} & a>0\land \frac{1}{a}-b\leq 1\land b>-1 \\ \frac{a b^2+a b-a b \log (b+1)-b+\log (b+1)}{a b^2 (a b-1)} & a>0\land \frac{1}{a}-b\leq 1\land b\leq -1 \\ \frac{\log (1-a b)}{a b^2} & a>0\land \frac{1}{a}-b>1\land b\leq -1 \\ \frac{a b^2+a b+\log (1-a b)}{a b^2} & \left(-1<b<0\land a>0\land \frac{1}{a}-b>1\right) \\ & \lor \left(b>0\land a>0\land \frac{1}{a}-b>1\right) \\ \frac{b-\log ((-b-1) (a b-1))}{a b^2} & a<0\land \frac{1}{a}-b>1 \end{array}$$

Numeric integration and simulation over the ranges $-2 \le a \le 2$ and $-5 \le b \le 5$ confirm these results.
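A check of this kind could look like the following Python sketch, which tests only the fourth case in the table ($a > 0$, $\frac{1}{a} - b \leq 1$, $b > -1$) against a simulation with $r_1$ and $r_2$ drawn uniformly from $[0,1]$. Note that $a$ and $b$ here are the parameters of this related problem, not the sides of a rectangle. The function names are hypothetical, and the closed form is undefined at $b = 0$, so that value is avoided.

```python
import math
import random


def p_closed_form(a, b):
    """Fourth listed case: assumes a > 0, 1/a - b <= 1, b > -1 and b != 0."""
    return (b - math.log(b + 1.0)) / (a * b * b)


def p_simulated(a, b, n_samples=200_000, seed=0):
    """Monte Carlo estimate of P(r1 < r2 / (a (1 + b r2))) for uniform r1, r2."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        r1, r2 = rng.random(), rng.random()  # squared distances, Uniform(0, 1)
        hits += r1 < r2 / (a * (1.0 + b * r2))
    return hits / n_samples


if __name__ == "__main__":
    for a, b in ((1.5, 2.0), (1.0, 0.5), (2.0, -0.4)):
        print(a, b, p_closed_form(a, b), p_simulated(a, b))
```

Each parameter pair in the demo satisfies the conditions of the fourth case, so the two columns should agree up to simulation noise.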

Edit

The modified question asks to replace $d_i^2$ by $d_i^\alpha$ and assumes $a$ and $b$ are both positive. Upon making the substitution $r_i = d_i^\alpha$, the region of integration remains the same and the integrand becomes $(2/\alpha)^2(r_1 r_2)^{2/\alpha-1}$ instead of $1$. Writing $\theta = \alpha/2$, we obtain

$$\frac{1}{2} a^{-1/\theta } \, _2F_1\left(\frac{1}{\theta },\frac{2}{\theta };\frac{\theta +2}{\theta };-b\right)$$

when $(a>0\land a<1\land a b+a\geq 1)$ or $a\geq 1$ and otherwise the result is

$$-a^{\frac{1}{\theta }} \left(\frac{1}{1-a b}\right)^{\frac{1}{\theta }}+\frac{1}{2} a^{\frac{1}{\theta }} (1-a b)^{-2/\theta } \, _2F_1\left(\frac{1}{\theta },\frac{2}{\theta };\frac{\theta +2}{\theta };1+\frac{1}{a b-1}\right)+1.$$

Here, $_2F_1$ is the hypergeometric function. The original case of $\alpha=2$ corresponds to $\theta=1$ and then these formulae reduce to the fourth and seventh of the eight previous cases. I have checked this result with a simulation, letting $\theta$ range from $1$ through $3$ and covering substantial ranges of $a$ and $b$.

How to Find Expected Distance Between Two Uniformly Distributed Points?

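## problem
x <- runif(1000000, 0, 30)
y <- runif(1000000, 0, 40)
Uniform <- as.data.frame(cbind(x, y))  # one row per random point
n <- nrow(Uniform)
catch <- rep(NA, n)
# distance from each point to the previous one (point 2 vs 1, ..., point n vs n-1)
for (i in 2:n) {
  catch[i] <- sqrt((x[i] - x[i-1])^2 + (y[i] - y[i-1])^2)
}
mean(catch, na.rm = TRUE)  # catch[1] is NA by construction
## 18.35855 (output reported for the original run; it varies between runs)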

If I understand correctly what you're looking for, maybe this helps. You're trying to figure out the distance between two random points whose X values are generated from unif(0,30) and whose Y values are generated from unif(0,40). I just created a million RVs from each of those two distributions and then bound the x and the y together to create a point for each pair. Then I calculated the distance between points 2 and 1, and so on up to the distance between points 1,000,000 and 999,999. The average distance was 18.35855. Let me know if this isn't what you were looking for.
