Let $X_T$ be the number of events of a unit-rate ($\lambda = 1$) Poisson process in an interval of length $T$. It is known that at least one event has been observed in the interval; I want to find the probability that there are more events in the interval.
My intuition is that $\Pr(X_T > 1 \mid X_T > 0) = \Pr(X_T > 0)$.
The rationale behind this is that
-
if the observed event was at time $t$ from the beginning of the interval, then it is enough to calculate the probability that no event occurred in either of the open intervals $(0, t)$ and $(t, T)$: $\Pr(X_T = 1 \mid X_T > 0) = \Pr(X_t = 0) \Pr(X_{T-t} = 0) = e^{-t} e^{t - T} = e^{-T} = \Pr(X_T = 0)$,
-
$\Pr(X_T > 1 \mid X_T > 0) = 1 - \Pr(X_T = 1 \mid X_T > 0) \\
\Pr(X_T > 0) = 1 - \Pr(X_T = 0) .$
However
\begin{align}
& \Pr(X_T > 1 \mid X_T > 0) = \frac{\Pr(X_T > 1, X_T > 0)}{\Pr(X_T > 0)} = \frac{\Pr(X_T > 1)}{\Pr(X_T > 0)} \\[10pt]
= {} & \frac{1 - \Pr(X_T \in \{1, 0\})}{1 - \Pr(X_T = 0)} = \frac{1 - Te^{-T} - e^{-T}}{1 - e^{-T}} ,
\end{align}
which neither I nor WolframAlpha can show to be equal to $\Pr(X_T > 0) = 1 - e^{-T}$.
Since both results cannot be true, where is my mistake?
I can see that $X_T > 1$ and $X_T > 0$ are heavily dependent. Does it matter? My intuition is that $X_T > 0$ just narrows the sample space…
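As a sanity check (my addition, not part of the original question; the choice $T = 1$, the seed, and the sample size are arbitrary), a quick Monte Carlo simulation supports the direct computation rather than the intuition:

```python
import math
import random

def simulate(T=1.0, n=200_000, seed=1):
    """Estimate Pr(X_T > 1 | X_T > 0) by simulating unit-rate Poisson counts."""
    random.seed(seed)
    at_least_one = more_than_one = 0
    for _ in range(n):
        # Count arrivals in (0, T): cumulative sums of Exp(1) inter-arrival gaps.
        count, t = 0, random.expovariate(1.0)
        while t < T:
            count += 1
            t += random.expovariate(1.0)
        if count > 0:
            at_least_one += 1
            if count > 1:
                more_than_one += 1
    return more_than_one / at_least_one

T = 1.0
estimate = simulate(T)
direct = (1 - T * math.exp(-T) - math.exp(-T)) / (1 - math.exp(-T))
intuition = 1 - math.exp(-T)
print(estimate, direct, intuition)
```

For $T = 1$ the estimate lands near the direct value $\approx 0.418$, visibly far from $1 - e^{-1} \approx 0.632$.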
[EDIT #1]
I have found one more way to support… both results.
If $t$ is the time of the first event in the interval (measured from the beginning of the interval), its probability density is $\operatorname{pdf}(t) = \frac{e^{-t}}{1 - e^{-T}}.$ Then $$\Pr(X_T = 1 \mid X_T > 0) = \int_0^T \operatorname{pdf}(t) \Pr(X_{T-t} = 0) \, dt = \int_0^T \frac{e^{-t}e^{t - T}}{1 - e^{-T}} \, dt = T\frac{e^{-T}}{1 - e^{-T}} .$$
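As a cross-check (mine, not part of the original derivation; $T$ and the step count are arbitrary), note that the integrand $\operatorname{pdf}(t)\Pr(X_{T-t} = 0)$ is constant in $t$, so even a crude midpoint rule reproduces the closed form:

```python
import math

# Numerically integrate pdf(t) * Pr(X_{T-t} = 0) over (0, T) with the
# midpoint rule, and compare with the closed form T e^{-T} / (1 - e^{-T}).
T, steps = 1.0, 100_000
dt = T / steps
integral = sum(
    (math.exp(-(i + 0.5) * dt) / (1 - math.exp(-T)))  # pdf(t) of the first event
    * math.exp(-(T - (i + 0.5) * dt))                  # Pr(X_{T-t} = 0)
    * dt
    for i in range(steps)
)
closed_form = T * math.exp(-T) / (1 - math.exp(-T))
print(integral, closed_form)
```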
However, if I repeat similar steps for a uniformly distributed ($\operatorname{pdf}(t) = \frac 1 T$) random event in the interval, also taking into account events before $t$, I still get
$$\Pr(X_T = 1 \mid X_T > 0) = \int_0^T \operatorname{pdf}(t) \Pr(X_t = 0) \Pr(X_{T-t} = 0) \, dt = \int_0^T \frac{e^{-T}} T \, dt = e^{-T} .$$
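A simulation (my addition; seed and sample size arbitrary) illustrates the tension here: the position of a uniformly chosen event really is uniform on $(0, T)$, yet the probability that this chosen event is the only one matches $T\frac{e^{-T}}{1 - e^{-T}}$, not the $e^{-T}$ that the integral above suggests:

```python
import math
import random

random.seed(4)
T, n = 1.0, 200_000
positions, only = [], 0
for _ in range(n):
    # Arrival times of a unit-rate Poisson process in (0, T).
    arrivals, t = [], random.expovariate(1.0)
    while t < T:
        arrivals.append(t)
        t += random.expovariate(1.0)
    if not arrivals:
        continue                                  # condition on X_T > 0
    positions.append(random.choice(arrivals))     # a uniformly chosen event
    if len(arrivals) == 1:
        only += 1

estimate = only / len(positions)
mean_pos = sum(positions) / len(positions)        # should be close to T / 2
print(mean_pos, estimate, math.exp(-T), T * math.exp(-T) / (1 - math.exp(-T)))
```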
[EDIT #2]
Follow-up to the comment of @combo (about the loss of conditioning in the first approach).
I do not understand why the conditioning is lost.
Imagine a situation where we create an interval of length $T$ with at least one event of a unit-rate Poisson process in it. Let $Y$ be a random event of a unit-rate Poisson process and let $t$ be a random variable uniformly distributed in $(0, T)$.
Then $(Y - t, Y - t + T)$ is an interval of length $T$ containing at least one event, at distance $t$ (uniformly distributed) from the beginning of the interval. By independence of the events, the probability that there are no more events in the interval is $\Pr(X_t = 0)\Pr(X_{T-t} = 0)$, isn't it? And it is given that there was at least one event in the interval.
Why is the situation different when I have an interval of length $T$ which contains at least one event? The time of a randomly chosen event ($t$, measured from the beginning of the interval) is uniformly distributed, so I see no difference.
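One can probe this construction directly (my own experiment, not from the original discussion; $L$, `reps`, and the seed are arbitrary): take a long unit-rate realization, treat each event in turn as the given occurrence $Y$, place it uniformly inside a length-$T$ window, and count the other events in that window. The fraction of windows with no other events comes out near $e^{-T}$, while an interval conditioned on $X_T > 0$ contains exactly one event with the larger probability $T\frac{e^{-T}}{1 - e^{-T}}$, so the two setups are genuinely different (a window chosen through an event is biased toward realizations with more events):

```python
import bisect
import math
import random

random.seed(3)
T, L, reps = 1.0, 1000.0, 100
windows = empty = 0
for _ in range(reps):
    # One unit-rate Poisson realization on (0, L), with L >> T.
    events, t = [], random.expovariate(1.0)
    while t < L:
        events.append(t)
        t += random.expovariate(1.0)
    for y in events:
        if not (T < y < L - T):
            continue                      # skip events too close to the boundary
        u = random.uniform(0.0, T)        # position of y inside its window
        lo, hi = y - u, y - u + T
        # Number of OTHER events in (lo, hi); `events` is sorted by construction.
        others = bisect.bisect_left(events, hi) - bisect.bisect_right(events, lo) - 1
        windows += 1
        if others == 0:
            empty += 1

palm = empty / windows
conditioned = T * math.exp(-T) / (1 - math.exp(-T))
print(palm, math.exp(-T), conditioned)
```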
Best Answer
I finally figured it out!
Following @combo's advice, I am going to use the term "occurrence".
Surprisingly, the first part of the rationale for my intuition was almost correct.
If the observed occurrence was at time $t$ from the beginning of the interval, then it is enough to calculate the probability that no occurrence happened in either of the open intervals $(0, t)$ and $(t, T)$: $\Pr(X_T = 1 \mid t) = \Pr(X_t = 0) \Pr(X_{T-t} = 0) = e^{-t} e^{t - T} = e^{-T} = \Pr(X_T = 0)$.
The difference is the replacement of $\Pr(X_T = 1 \mid X_T > 0)$ by $\Pr(X_T = 1 \mid t)$, which may be viewed as $\Pr(X_T = 1 \mid X_{dt} = 1)$, where $dt$ is the length of the infinitesimal interval $[t - \frac{dt}{2}, t + \frac{dt}{2}]$. Since $\Pr(X_T = 1, t) = e^{-T} \, dt$, we may view $\operatorname{pdf}(t) = \Pr(X_T = 1 \mid t)$ as the density at the point $t$ of the only occurrence within the interval $(0, T)$.
Since the value of $t$ is unknown, we integrate: $\int_0^T \operatorname{pdf}(t) \, dt = Te^{-T} = \Pr(X_T = 1)$. So far so good - we know the unconditional probability of having exactly one occurrence in the interval. However, by conditioning on $X_T > 0$, the $e^{-T}$ fraction of the sample space has been discarded - thus it is necessary to normalize every remaining probability/density by a factor of $\frac{1}{1 - e^{-T}}$.
Thus, the intuition was a fallacy of mistaking a probability for a density.
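A small numeric check of this resolution (my addition; the values of $T$ are arbitrary): the integrated density gives the unconditional $\Pr(X_T = 1) = Te^{-T}$, and dividing by $\Pr(X_T > 0) = 1 - e^{-T}$ recovers a conditional probability that is consistent, term by term, with the direct computation from the question:

```python
import math

# For a few interval lengths T, verify that Pr(X_T = 1 | X_T > 0) and
# Pr(X_T > 1 | X_T > 0) are complements within the conditioned sample space.
for T in (0.5, 1.0, 3.0):
    p1 = T * math.exp(-T)                 # unconditional Pr(X_T = 1)
    cond = p1 / (1 - math.exp(-T))        # Pr(X_T = 1 | X_T > 0)
    more = (1 - p1 - math.exp(-T)) / (1 - math.exp(-T))  # Pr(X_T > 1 | X_T > 0)
    assert abs(cond + more - 1) < 1e-12
    print(T, p1, cond, more)
```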