‘deducing’ a bound using the first order taylor series. How to make it more precise

asymptoticsbirthdaycalculustaylor expansion

So, I just saw a ‘proof’ that the generalized birthday problem has a median of C*sqrt(n). Though the probability in question is interesting, this question is more about calculus and maybe asymptotics than probability. I have a proof outline that I want to make more precise

The context:
Take

$$p_r = (n)_r/n^r$$

(this is meant to be the odds that, in a world with n days, where n >= 365, we get r people at random and none shares a birthday. $(n)_r = n!/(n-r)!$)

We can write p_r as $(1-1/n)\cdot(1-2/n)\cdot…(1-(r-1)/n)$

Now, I want to find the smallest r such that p_r > ½ (this is a median time for a birthday ‘collision’)

The ‘proof’ (I am trying to expand an argument from Feller’s An introduction to probability theory chap 2, section 2, around formula 7.4)

say
$$ ½ = p_r $$
take -log from both sides

-log(½) = -log(p_r)
= -1(log(1-1/n)+log(1-2/n)+…log(1-(r-1)/n)

Using the taylor series for log(1-x)

f(0) + f’(0) * x/1! + error

log(1-x) = log(1) + f’(0) * x + error
f’(x) = 1/(1-x)*(-1) = (x-1)^(-1) => f’(0) = -1

log(1-x) = - x + error

So we get

-log(½) =  -log(p_r)
= -1(log(1-1/n)+log(1-2/n)+...log(1-(r-1)/n)
= -1(-1/n-2/n…-(r-1)/n)
= 1/n(r*(r-1)/2)
~= r^2/2n

Therefore

log(2) ~= r^2/2n
sqrt(2*n*log(2)) ~= r

The weird step for me is the ignoring of the taylor series error. I could bound it using the second derivative of log(1-x), but this would be tedious (I did try).

My question:
We did take the first term of the taylor series, did some calculations with it and got a result, with an unbounded error. What is an easy way to bound this error and know how reliable is the result?

In this question, I am more interested in techniques (if there are any) to easily bind the taylor series error than in other techniques to bypass the taylor series and solve the problem.

Question focus: The main goal of the question is to find a way to actually use the taylor expansion. Is there a precise and formal way to do so that is also easier than doing all the inequations with the second order error term?

Best Answer

The first step is to not use Taylor series but to use a powerful technique called smoothing. This technique is heuristic; the more creative you are, the stronger results you may be able to obtain using it. For bounding the relevant probability for the birthday problem, the following smoothing suffices: $ \def\lfrac#1#2{{\large\frac{#1}{#2}}} $

$a·b ≥ (a-t)·(b+t)$ for every $a,b,t∈ℝ^+$ such that $a ≤ b$.

Using this alone, we can pair up the terms in the product (first with last and so on) and obtain both lower bound and upper bound for them:

$1·(1-\lfrac{r}{n}) ≤ (1-\lfrac{k}{n})·(1-\lfrac{r-k}{n}) ≤ (1-\lfrac{r}{2n})^2$

It is now easy to obtain explicit bounds on the original product:

$(1-\lfrac{r}{n})^{(r-1)/2} ≤ p_r ≤ (1-\lfrac{r}{2n})^{r-1}$.

You can now employ explicit inequalities for $\exp,\ln$:

$1+x ≤ \exp(x)$ for every $x∈ℝ$.
$x-x^2 ≤ \ln(1+x)$ for every $x∈ℝ_{≥-1/2}$.

To get for $r ≤ \lfrac12 n$:

$\exp(\lfrac{r-1}{2}{·}(-\lfrac{r}{n}-\lfrac{r^2}{n^2})) ≤ p_r ≤ \exp(-\lfrac{r}{2n})^{r-1}$.

You can hence bound $p_r$ within a factor of $\exp(-\lfrac{(r-1)·r^2}{2n^2}) ≈ 1$ for $r^3 ≪ n^2$. For large $n$, we have $√(2n)^3 ≪ n^2$ and so the error factor tends to $1$ for the $r$ that makes $p_r$ closest to $\lfrac12$.