[Math] How many random walk steps until the path self-intersects

discrete geometrypr.probabilityrandom walks

Take a random walk in the plane from the origin,
each step of unit length in a uniformly random direction.

Q. How many steps on average until the path self-intersects?

My simulations suggest ~$8.95$ steps.

^{Red: origin. Top: the $8$-th step self-intersects.

Bottom: the $11$-th step self-intersects.

(Not to same scale.)}

I suspect this is known in the SAW literature (SAW=Self-Avoiding Walk),
but I am not finding this explicit number.

Added. Here is a histogram of the number of steps to self-intersection.

^{$10000$ random trials.}

Best Answer

Here are some computational insights.

I ran $N = 10^k$ simulations for $3 \leq k \leq 12$. For $N = 10^9$ I recorded some explicit instances and recorded which pairs intersected, as opposed to just the lengths. For all other runs, I simply summed the results.

First and foremost here's a summary of the runs:

$$ \begin{array}{ |l|l|l|l| } \hline N & \text{mean} & \text{std/sqrt(}N\text{)} & \text{total number of segments} \\ \hline 10^3 & 8.732 & 0.16106 & 8732 \\ 10^4 & 8.781 & 0.0528827 & 87810 \\ 10^5 & 8.86218 & 0.0166654 & 886218 \\ 10^6 & 8.89447 & 0.00524551 & 8894468 \\ 10^7 & 8.88463 & 0.00165934 & 88846265 \\ 10^8 & 8.88595 & 0.000524796 & 888595095 \\ 10^9 & 8.88615 & 0.000165965 & 8886153545 \\ 10^{10} & 8.88631 & - & 88863113226 \\ 10^{11} & 8.88614 & - & 888613831649 \\ 10^{12} & 8.88614 & - & 8886139852418 \\ \hline \end{array} $$

We can apply the central limit theorem to this (sub)sequence of means. Even though $\text{std/sqrt(}N\text{)}$ was not calculated for $N = 10^{12}$, based on prior terms ~$0.00000524$ seems like a sufficient estimate.

From here I think we can be fairly confident that the first few digits of the mean are $\bf{8.8861}$. Here are the first few standard deviations ($68-95-99.7$ rule):

$$ m \pm 1\sigma \rightarrow [8.886135, 8.886145] \\ m \pm 2\sigma \rightarrow [8.886129, 8.886150] \\ m \pm 3\sigma \rightarrow [8.886124, 8.886156] $$

In fact it takes $8$ standard deviations to lose the digit $1$:

$$ m \pm 8\sigma \rightarrow [8.886098, 8.886181] $$

Finally, based on an inverse symbolic search, we can find that ${\bf\frac{1795}{202}} = 8.886138613...$ falls within $0.25\sigma$ of $m = 8.886139852418$ and so maybe this has potential to be the closed form.

For $N = 10^9$, here is the histogram of the number of steps to self-intersection, whose shape is quite similar to the one posted by OP:

If we plot look at the log plot, we can see the tail follows a decaying exponential quite closely (in red):

From here we might think we could use a gamma distribution or something similar to estimate the data, but I was only able to find a tight approximation with one distribution: the inverse Gaussian distribution $\operatorname{IG}(8.886, 26)$. Here's its PDF overlaid on the histogram:

I also recorded which segments intersected each other for $N = 10^9$. The data shows segment $n$ most often intersects with segment $n-2$, followed by $n-3$, etc which I think makes sense. Borrowing terminology from MattF's answer, the quick intersections are the most probable.

In the following diagram an $(x, y)$ pair indicates how often segment $x$ intersected with segment $y$. Each diagonal approximately decays exponentially, all with the same rate.

For each of the 4 cores I ran the simulations on, I recorded a single sequence of each length. The most interesting ones of course are the long ones.

Here's an instance of length 96. It's riddled with near misses and is ultimately squashed by a quick intersection.

Here the suspicious looking part is indeed intersection free -- it's about $0.01 \,\text{rad}$:

Lastly I looked at the histogram of angles in my saved simulations:

I think this bias (introduced by stopping once an intersection occurs) says that the best way to stay intersection free is to not curl back up towards yourself.

I have compiled all of my data into a Mathematica notebook which can be downloaded here.

Best Answer

Related Solutions

[Math] Hitting times for an N-dimensional random walk on a lattice with (strictly positive) random integer steps

Related Question