Edit: I've filled in a few more details.
The Hoeffding bound from expressing $Y-X$ as the sum of $n$ differences between Bernoulli random variables $B_q(i)-B_p(i)$ is
$$Prob(Y-X \ge 0) = Prob(Y-X + n(p-q) \ge n(p-q)) \le \exp\bigg(-\frac{2n^2 (p-q)^2}{4n}\bigg)$$
$$Prob(Y-X \ge 0) \le \exp\bigg(-\frac{(p-q)^2}{2}n\bigg)$$
I see three reasons you might be unhappy with this.
- The Hoeffding bound just isn't sharp. It's based on a Markov bound, and that is generally far from sharp.
- The Hoeffding bound is even worse than usual on this type of random variable.
- The amount by which the Hoeffding bound is not sharp is worse when $p$ and $q$ are close to $0$ or $1$ than when they are close to $\frac12$. The bound depends on $p-q$ but not how extreme $p$ is.
I think you might address some of these by going back to the proof of Hoeffding's estimate, or the Bernstein inequalities, to get another estimate which fits this family of variables better.
For example, if $p=0.6$, $q=0.5$, or $p=0.9$, $q=0.8$, and you want to know when the probability is at most $10^{-6}$, the Hoeffding inequality tells you this is achieved with $n\ge 2764$.
For comparison, the actual minimal values of $n$ required are $1123$ and $569$, respectively, by brute force summation.
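For anyone who wants to reproduce these numbers, here is a sketch in Python (the function names are mine): it computes the Hoeffding threshold from the bound above, and the exact $Prob(Y-X\ge 0)$ by direct summation, working in log space so the binomial coefficients don't overflow.

```python
import math

def hoeffding_n(p, q, eps=1e-6):
    # Smallest n with exp(-(p-q)^2 n / 2) <= eps, per the bound above.
    return math.ceil(2 * math.log(1 / eps) / (p - q) ** 2)

def binom_pmf(n, p):
    # Binomial pmf computed via lgamma in log space to avoid overflow for large n.
    return [math.exp(math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                     + k * math.log(p) + (n - k) * math.log(1 - p))
            for k in range(n + 1)]

def prob_Y_ge_X(n, p, q):
    # Exact P(Y - X >= 0) for independent X ~ Bin(n, p), Y ~ Bin(n, q).
    px, py = binom_pmf(n, p), binom_pmf(n, q)
    cdf_x, s = [], 0.0
    for v in px:
        s += v
        cdf_x.append(s)
    return sum(py[k] * cdf_x[k] for k in range(n + 1))
```

`hoeffding_n(0.6, 0.5)` and `hoeffding_n(0.9, 0.8)` both give $2764$, and `prob_Y_ge_X` confirms that the exact probability first drops below $10^{-6}$ at $n=1123$ and $n=569$ respectively.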
One version of the Berry-Esseen theorem is that the Gaussian approximation to a cumulative distribution function is off by at most
$$0.71 \frac {\rho}{\sigma^3 \sqrt n}$$
where $\rho/\sigma^3$ is an easily computed function of the distribution which is not far from $1$ for the distributions of interest. This only drops as $n^{-1/2}$, which is unacceptably slow for the purpose of getting a sharp estimate on the tail. At $n=2764$, the error estimate from Berry-Esseen would be about $0.02$. While you get effective estimates for the rate of convergence, those estimates are not sharp near the tails, so the Berry-Esseen theorem gives you far worse estimates than the Hoeffding inequality.
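To make that concrete, here is a small sketch (my own code, not from the answer) evaluating $0.71\,\rho/(\sigma^3\sqrt n)$ for a single term $B_q(i)-B_p(i)$, which takes the value $+1$ with probability $q(1-p)$, $-1$ with probability $p(1-q)$, and $0$ otherwise:

```python
import math

def berry_esseen_error(p, q, n):
    # Single step D = B_q - B_p: +1 w.p. q(1-p), -1 w.p. p(1-q), 0 otherwise.
    plus, minus = q * (1 - p), p * (1 - q)
    mu = q - p                           # E[D]
    var = plus + minus - mu ** 2         # sigma^2
    rho = (plus * abs(1 - mu) ** 3       # E|D - mu|^3
           + minus * abs(-1 - mu) ** 3
           + (1 - plus - minus) * abs(mu) ** 3)
    return 0.71 * rho / (var ** 1.5 * math.sqrt(n))
```

At $p=0.6$, $q=0.5$, $n=2764$ this gives roughly $0.019$, the "about $0.02$" quoted above.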
Instead of trying to fix Hoeffding's bound, another alternative is to express $Y-X$ as a sum of a (binomial) random number of $\pm1$s by looking at the nonzero terms of $\sum (B_q(i)-B_p(i))$. You only need a crude lower bound on the number of nonzero terms, and then you can use a sharper estimate on the tail of a binomial distribution.
The probability that $B_q(i)-B_p(i) \ne 0$ is $p(1-q) + q(1-p) = t$. For simplicity, let's assume for the moment that there are $nt$ nonzero terms and that this is odd. The conditional probability that
$B_q(i)-B_p(i) = +1$ is $w=\frac{q(1-p)}{p(1-q)+q(1-p)}$.
The Chernoff bound on the probability that the sum is positive is $\exp(-2(w-\frac 12)^2tn)$.
$$ \exp\bigg(-2\Big(\frac{q(1-p)}{p(1-q)+q(1-p)} - \frac 12\Big)^2 \big(p(1-q) + q(1-p)\big)\, n\bigg)$$
This is not rigorous, since the number of nonzero terms is random rather than exactly $nt$, but we only need to adjust $n$ by a factor of $1+o(1)$, and we can compute the adjustment with another Chernoff bound.
For $p=0.6, q=0.5$, we get $n \ge 1382$. For $p=0.9, q=0.8$, we get $n \ge 719$.
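As a sanity check on those two values of $n$: solving $\exp(-2(w-\frac12)^2 t n) \le \epsilon$ for $n$ takes one line (a sketch; the function name is mine).

```python
import math

def chernoff_n(p, q, eps=1e-6):
    # Smallest n with exp(-2 (w - 1/2)^2 t n) <= eps, where
    # t = p(1-q) + q(1-p) is the probability a term is nonzero and
    # w = q(1-p) / t is the conditional probability of a +1.
    t = p * (1 - q) + q * (1 - p)
    w = q * (1 - p) / t
    return math.ceil(math.log(1 / eps) / (2 * (w - 0.5) ** 2 * t))
```

`chernoff_n(0.6, 0.5)` gives $1382$ and `chernoff_n(0.9, 0.8)` gives $719$.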
The Chernoff bound isn't particularly sharp. Comparison with a geometric series with ratio $\frac{w}{1-w}$ gives that the probability that there are more $+1$s than $-1$s is at most
$${nt \choose nt/2} w^{nt/2} (1-w)^{nt/2} \frac {1-w}{1-2w}$$
This gives us nonrigorous bounds of $nt \gt 564.4, n \ge 1129$ for $p=0.6,q=0.5$ and
$nt\gt 145.97, n\ge 562$ for $p=0.9,q=0.8$. Again, these need to be adjusted by a factor of $1+o(1)$ to get a rigorous estimate (determine $n$ so that there are at least $565$ or $146$ nonzero terms with high probability, respectively), so it's not a contradiction that the actual first acceptable $n$ was $569$, greater than the estimate of $562$.
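Here is one way to reproduce those thresholds (my own sketch; it extends the binomial coefficient to real $nt$ via the gamma function, which is in the same nonrigorous spirit as treating $nt$ as exact):

```python
import math

def log_tail_bound(N, w):
    # log of C(N, N/2) * w^(N/2) * (1-w)^(N/2) * (1-w)/(1-2w), with the
    # binomial coefficient extended to real N via lgamma.  Requires w < 1/2.
    return (math.lgamma(N + 1) - 2 * math.lgamma(N / 2 + 1)
            + (N / 2) * math.log(w * (1 - w))
            + math.log((1 - w) / (1 - 2 * w)))

def min_nt(w, eps=1e-6):
    # Bisect for the N at which the (decreasing) bound drops to eps.
    lo, hi = 1.0, 1e6
    while hi - lo > 1e-3:
        mid = (lo + hi) / 2
        if log_tail_bound(mid, w) > math.log(eps):
            lo = mid
        else:
            hi = mid
    return hi
```

With $w=0.4$ (the $p=0.6$, $q=0.5$ case) this gives $nt \approx 564.4$ and hence $n \ge 1129$; with $w=4/13$ (the $p=0.9$, $q=0.8$ case) it gives $nt \approx 146$ and $n \ge 562$.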
I haven't gone through all of the details, but this shows that the technique I described gets you much closer to the correct values of $n$ than the Hoeffding bound.
Sorry, disregard what is below. The LIL gives $\max_{i \le N} |x_i| \approx \sqrt{2 N \log \log N}$ for infinitely many $N$, but for any particular $N$,
$\max_{i \le N} |x_i|$ should be of the order $\sqrt N$.
If you only care about bounds up to a constant factor, then I think you're after the law of the iterated logarithm (LIL). As Cardinal indicated, it's enough to consider the 1-dimensional problem (if you don't care about losing a factor of 2). Moreover,
$$
\max_i|x_i| \le \max_{i,j} |x_i - x_j| \le 2\max_i |x_i|
$$
and so you may as well consider $\max |x_i|$ instead. By the LIL, $\max_{i \le N} |x_i| \sim \sqrt{2 N \log \log N}$ almost surely.
The same argument works if the steps are distributed on the unit circle, since the LIL doesn't require Gaussian variables.
If you want to try to get the sharp constant, there are also multi-dimensional versions of the LIL available. You can search for them on Google; I don't really know that area...
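If it helps to see the two scales, here is a quick simulation sketch (a simple $\pm1$ walk; the code and seed are mine) comparing $\max_{i\le N}|x_i|$ at a fixed $N$ with $\sqrt N$ and the LIL envelope $\sqrt{2N\log\log N}$:

```python
import math
import random

def walk_max(N, seed=0):
    # Running maximum of |x_i| for a simple +/-1 random walk of N steps.
    rng = random.Random(seed)
    pos, best = 0, 0
    for _ in range(N):
        pos += rng.choice((-1, 1))
        best = max(best, abs(pos))
    return best

N = 100_000
# Typical values of walk_max(N) sit at the sqrt(N) scale (~316 here),
# below the LIL envelope sqrt(2 N log log N) (~670 here).
print(walk_max(N), math.sqrt(N), math.sqrt(2 * N * math.log(math.log(N))))
```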
Best Answer
To answer your questions about median and mode, one can take Alexandre's answer a little further and compute the exact distribution function for the overtake-times.
Note that the overtake-time doesn't depend on $v_1,v_2$ directly, but only on their difference. Call the difference $v$. Now $v$ is the difference of two uniformly distributed random variables on $[0,1]$, so it is supported on $[-1,1]$ with probability density function $1-|v|$. Moreover, since $\theta$ is uniformly distributed we can without loss of generality identify the cases $(v,\theta)$ and $(-v,1-\theta)$ and reduce everything to the following set-up: $v$ is supported on $[0,1]$ with probability density function $2(1-v)$, $\theta$ is uniformly distributed on $[0,1]$, and the overtake-time is $t = \theta/v$.
Now we can compute the cumulative distribution function for the overtake-time. Indeed, we have $P(t<T) = P(\theta/v<T) = P(\theta < Tv)$, which we can get by the following integral: $$ P(t<T) = \int_0^1 2(1-v) P(\theta < Tv \mid v) \,dv. $$ The conditional probability depends only on $v$ and is given by $P(\theta < Tv \mid v) = \min(Tv,1)$. Thus for $T\leq 1$, we have $P(\theta < Tv \mid v)=Tv$ for all $v\in[0,1]$, so integrating gives $P(t<T) = T/3$, while for $T\geq 1$, we integrate and find $$ P(t<T) = \int_0^{1/T} 2(1-v)Tv\,dv + \int_{1/T}^1 2(1-v)\,dv = 1-\frac 1T + \frac 1{3T^2}. $$ So in the end the cumulative distribution function for the overtake-time is $$ P(t<T) = \begin{cases} \frac T3 & T\leq 1, \\ 1 - \frac 1T + \frac 1{3T^2} & T \geq 1. \end{cases} $$ The term $1/T$ in the last expression gives the infinite mean: differentiating the CDF produces a term $1/T^2$, and multiplying by $T$ and integrating to get the mean leaves you integrating $1/T$ from $1$ to $\infty$, which diverges.
As for the median, it looks as though any proximity to $\pi/2$ is just a red herring, because solving for $P(t<T) = 1/2$ yields $T=1 + \frac 1{\sqrt{3}} \approx 1.57735\dots$.
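As a sanity check on the CDF and median (a Monte Carlo sketch; code and seed are my own), one can sample the set-up directly: draw two uniform speeds and a uniform gap $\theta$, and use the symmetry $(v,\theta)\sim(-v,1-\theta)$ to take $t=\theta/|v|$.

```python
import math
import random

def sample_overtake_time(rng):
    # v = difference of two uniform speeds; by the symmetry
    # (v, theta) ~ (-v, 1 - theta) we may take t = theta / |v|
    # with theta uniform on [0, 1].
    v = 0.0
    while v == 0.0:
        v = rng.random() - rng.random()
    return rng.random() / abs(v)

rng = random.Random(42)
N = 200_000
ts = sorted(sample_overtake_time(rng) for _ in range(N))
median = ts[N // 2]
frac_below_1 = sum(t < 1.0 for t in ts) / N
# Compare with P(t < 1) = 1/3 and the median 1 + 1/sqrt(3) = 1.57735...
print(median, frac_below_1)
```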