What Percent of the Time will a Given Random Walk will Behave a Certain Way

probabilityrandom walk

Suppose I have a 1 Dimensional Random Walk (on the integer line) with the following properties (initial conditions):

  • At time=0 it starts at position=0
  • At each time point, there is a 0.5 probability of moving +1 or -1

This Random Walk keeps going until one of the following things happens:

  • The Random Walk reaches position=10 (successful termination condition)

  • 100,000 Steps are taken and position=10 is still not reached (unsuccessful termination condition)

I wrote the following R simulation to represent random simulations of this Random Walk with these conditions:

library(ggplot2)

num_simulations <- 1000
max_steps <- 100000
target <- 10
steps <- numeric(num_simulations)
trajectories <- list()
terminated <- 0


for (i in 1:num_simulations) {
    position <- 0
    print(i)
    trajectory <- c(0)
    for (step in 1:max_steps) {
        position <- position + sample(c(-1, 1), 1)
        trajectory <- c(trajectory, position)
        if (position == target) {
            steps[i] <- step
            break
        }
        if (step == max_steps) {
            terminated <- terminated + 1
        }
    }
    trajectories[[i]] <- trajectory
}


termination_percentage <- (terminated / num_simulations) * 100


df1 <- data.frame(simulation_index = 1:num_simulations, log_steps = log(steps))
df2 <- data.frame(simulation_index = rep(1:num_simulations, sapply(trajectories, length)),
                  step = unlist(lapply(trajectories, seq_along)) - 1,
                  position = unlist(trajectories))


p1 <- ggplot(df1, aes(x = simulation_index, y = log_steps)) +
    geom_point() +
    geom_line() +
    labs(x = "Simulation Index", y = "Log(Number of Steps)",
         title = paste("Random Walk Simulations (Terminated:", termination_percentage, "%)")) +
    theme_bw()

p2 <- ggplot(df2, aes(x = step, y = position, group = simulation_index)) +
    geom_line(alpha = 0.1) +
    labs(x = "Step", y = "Position",
         title = paste("Trajectories of Random Walk Simulations (Terminated:", termination_percentage, "%)")) +
    theme_bw()

As we can see, roughly 2% of all simulations (with these initial conditions) met the unsuccessful termination condition.

enter image description here

My Question: Given a set of initial conditions and a set of termination conditions, is it possible to derive an expression for the expected number (i.e. expected value) of simulations expected to meet the unsuccessful termination condition?

Thanks!

Best Answer

EDIT: I've added the closed form expression of the probability

The probablity of unsuccessful walk after $n$ steps is given as follows

$$ P(n) = \sum_{k = 0}^{n} 2^{-n}p(k,n)$$ where $$p(k,n)=\begin{cases} \displaystyle\binom{n}{k} & k < x_c-x_0\\ \displaystyle\binom{n}{k} - \binom{n}{k+x_0-x_c}&x_c - x_0\leq k < \dfrac{n + x_c -x_0}{2}\\ 0 &k \geq \dfrac{n + x_c - x_0}{2} \end{cases}$$

or compactly $$ P(n) = \sum_{k = 0}^{\lfloor\frac{1}{2}(n+x_c-x_0)\rfloor} 2^{-n}\binom{n}{k} - \sum_{k = x_c-x_0}^{\lfloor\frac{1}{2}(n+x_c-x_0)\rfloor} 2^{-n}\binom{n}{k+x_0-x_c}$$

The function $p(k,n)$ represents the number of paths that reach position $x = x_0-n+2k$ without reaching the termination target $x_c$. Hence $P(n)$ is the sum of $p(k,n)$ for all $x$, with the weight $2^{-n}$ for each path. In order to derive the formula of $p(n)$ we need to prove the following relation $$\text{TNOW to reach } x\text{ from } x_0 \text{ without reaching }x_c\\= \\ \text{TNOW to reach } x\text{ from } x_0 - \text{TNOW to reach }x \text{ from } 2x_c - x_0$$ TNOW stands for "the number of ways" or the number of random walks. Let $X(i)$ be the position at step $i$ of a random walk that crosses or touches the boundary $x_c$. Let's define the reflection $R$ that maps the random walk $X$ to another random walk $Y_X$ by the following formula

$$Y_X(i) = \begin{cases} 2x_c - X(i) &i < L\\ X(i) &i \geq L \end{cases}$$

where $L$ is the last step that at which the randow walk reached the position $x_c$. The figure below illustrates how the function is defined

enter image description here

This function maps random walk that starts from point $x_0$ and ends at $x$ to a random walk that starts at point $2x_c-x_0$ and also ends at $x$, given they reach the boundary at least once, and vice versa. Furthermore, this reflection is an involution i.e. $R(R(X)) = X$, hence it is bijection between these set of random walks. And therefore their numbers are equal. Since the number of RW that reach $x$ and don't reach $x_c$ = the number of RW that reach $x$ - the number of RW that reach x and reach $x_c$ at some step(s), this gives us the relation aforementioned.

This reflection is the technique that is used to prove the reflection principle

The last thing we need to do is calculate the number of RW that starts from $x_0$ and $2x_c - x_0$, which are given by binomial distribution. This gives the formula of $p(k,n)$ Q.E.D.

Here's the old answer with approximation

The amount of time for a random process to reach a certain point for the first time is known as the first hitting time. For a Brownian motion, the survival probability of particle remains at the posittion $x < x_c$ at time $t$ is given by $$ \DeclareMathOperator{\erf}{erf} S(t) =\erf\left(\frac{x_c-x_0}{2\sqrt{Dt}}\right)$$ where $D$ is the diffusion constant. The expected number of trials until the first hitting time is larger than $t$ is $\dfrac{1}{S(t)}$. A 1D random walk with time step $\delta$ and step size $\epsilon$ is approximately a 1D Brownian motion with $D=\dfrac{\epsilon^2}{2\delta}$, thus $$\DeclareMathOperator{\erf}{erf} S_{\mathrm{RW}}(t)\approx \erf\left(\frac{x_c-x_0}{2\sqrt{\frac{\epsilon^2t}{2\delta}}}\right)$$ Given $\delta = 1, \epsilon = 1, x_c = 100, x_0 = 0$ we have $S(100000) \approx \mathrm{erf}(0.3863) \approx 0.2481\%$ and the expected number of trials is approximately $4.03$

The data given in the program, however, correspond to $x_c = 10$ (targer <- 10) and so $S(100000) \approx \mathrm{erf}(0.03863) \approx 2.52\%$ . $10$ is quite low and the lower $x_c$ leads to higher error in the approximation.

Edit: I've also used Python and in R to test it out (not very efficient though)

import random
t = []
target = 100
trials = 1000
for i in range(trials):
    x = 0
    for j in range(100000):
        x += -1 + random.randint(0,1)*2
        if x == target:
            t.append(j)
            break   
    print(f"Trial {i+1} finished after {j+1} steps")

print(1 - len(t)/trials)

R:

num_simulations <- 1000
max_steps <- 100000
target <- 100
steps <- numeric(num_simulations)
trajectories <- list()
terminated <- 0


for (i in 1:num_simulations) {
  position <- 0
  print(i)
  for (step in 1:max_steps) {
    position <- position + sample(c(-1, 1), 1)
    if (position == target) {
      steps[i] <- step
      break
    }
  }
  if (step == max_steps) {
    terminated <- terminated + 1
  }
}


sprintf("The percentage of unsuccessful attempts is %f %%", (terminated / num_simulations) * 100)

I got $22.6\%$ and $24.4\%$ for $x_c = 100$ and $2.5\%$ and $3.2\%$ for $x_c = 10$. which are quite close to the estimation

Related Question