Why doesn’t the gamma distribution have the memoryless property?

gamma distribution, probability distributions, statistics

The gamma distribution essentially tells us the probability of $k$ events happening in a given amount of time, $t$.

It seems to me that there are certain examples of the gamma distribution where it behaves memoryless. For example, the probability of 2 customers entering a store in 3 hours GIVEN that no customer has entered in the first hour is equivalent to the probability of 2 customers entering a store in 2 hours.

Is my example properly demonstrating "memoryless-ness"? What would be an example of it having memory using my store scenario?

Best Answer

A continuous random variable $T$ has the "memoryless" property if, for all $t, x \ge 0$, $$\Pr[T > t+x \mid T > x] = \Pr[T > t].$$ So for instance, if $T$ is a service time, then given that one has waited for more than $x$ units, the amount of additional time to wait does not depend on how much time one has already waited.

The gamma distribution for a positive integer shape parameter $n$ is also known as the Erlang distribution, and models the total amount of time needed to wait to observe $n$ events, where the interarrival times between consecutive events are independent and identically distributed exponential random variables. The corresponding stochastic process is what we call a (homogeneous) Poisson point process.

That said, when $n = 1$, the service time is obviously memoryless, since it is exponentially distributed.
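As a quick numerical check of the $n = 1$ case, here is a minimal sketch that compares the two sides of the memoryless identity for an exponential waiting time. The function names and the rate of $0.7$ are illustrative choices, not anything taken from the question.

```python
import math

def exp_survival(t, rate=1.0):
    """Pr[T > t] for an exponential waiting time with the given rate."""
    return math.exp(-rate * t)

def conditional_survival(t, x, rate=1.0):
    """Pr[T > t + x | T > x], computed as Pr[T > t + x] / Pr[T > x]."""
    return exp_survival(t + x, rate) / exp_survival(x, rate)

# Hypothetical rate and waiting times, chosen only for illustration.
rate = 0.7
t = 3.0
for x in (0.5, 2.0, 10.0):
    lhs = conditional_survival(t, x, rate)
    rhs = exp_survival(t, rate)
    print(f"x = {x:5.1f}: conditional = {lhs:.6f}, unconditional = {rhs:.6f}")
```

Whatever value of $x$ is used, the two columns agree up to floating-point rounding, which is exactly what the definition requires.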

When $n > 1$, however, the service time is not memoryless. To understand why, consider an example where, say, $n = 100$. Then $T$ is the total time it takes to observe $100$ events. If the event rate is low, you would expect to wait quite a long while; but if $x$ is sufficiently large (e.g., larger than the expectation of $T$), then chances are you have already seen many of the $100$ necessary events, and you do not have to wait much longer to see the remaining events; clearly, this is not the same as starting over.

Another way to think of it is that you're in a (very) long line at the grocery store. There are $99$ people ahead of you. Each person, you included, takes a random, exponentially distributed amount of time to check out; suppose the mean is $1$ minute, so the rate is $\lambda = 1$ per minute. The total time $T$ until you are finished is gamma (Erlang) with shape $n = 100$ and rate $\lambda = 1$. If you have already waited $x = 120$ minutes, the probability that you have to wait at least another $t = 10$ minutes is $$\Pr[T > 130 \mid T > 120] \approx 0.0987092,$$ but $$\Pr[T > 10] \approx 1.$$ That's because in the conditional case, you've already waited $120$ minutes and have likely seen nearly all of the people in front of you get checked out; whereas $\Pr[T > 10]$ is essentially $1$ because $T \le 10$ would require all $99$ people in front of you, and then you, to check out in under $10$ minutes.
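The two probabilities in the grocery-line example can be reproduced with a short sketch. It relies on the standard identity that an Erlang waiting time with shape $n$ exceeds $t$ exactly when a Poisson process of the same rate has produced fewer than $n$ events by time $t$. The function name `erlang_survival` is my own, and the terms are summed in log space only to avoid overflow.

```python
import math

def erlang_survival(t, n, rate=1.0):
    """Pr[T > t] for T ~ Erlang(shape n, rate).

    Equals Pr[N(t) <= n - 1], where N(t) is Poisson with mean rate * t.
    """
    if t <= 0:
        return 1.0
    mean = rate * t
    # Each Poisson term is exp(-mean) * mean**k / k!, evaluated in log space.
    terms = (
        math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))
        for k in range(n)
    )
    return math.fsum(terms)

n = 100      # 99 people ahead plus yourself
x = 120.0    # minutes already waited
t = 10.0     # additional minutes

conditional = erlang_survival(x + t, n) / erlang_survival(x, n)
unconditional = erlang_survival(t, n)

print(f"Pr[T > 130 | T > 120] = {conditional:.7f}")
print(f"Pr[T > 10]            = {unconditional:.7f}")
```

The conditional probability comes out near the $0.0987$ quoted above, while the unconditional one is indistinguishable from $1$ in double precision.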


The issue with your example is in this statement:

The probability of 2 customers entering a store in 3 hours, given that no customer has entered in the first hour is equivalent to the probability of 2 customers entering a store in 2 hours.

The given condition, that no customer has entered in the first hour, is not the same as saying that the waiting time for the second customer is over 1 hour. In particular, the event that no customers have arrived in the first hour is a proper subset of the event that this waiting time is over 1 hour, because if exactly one customer has arrived in the first hour, you still haven't met the stopping condition.

Mathematically, your statement is $$\Pr[T_2 \le 3 \mid X(1) = 0] = \Pr[T_2 \le 2],$$ where $T_n$ represents the total waiting time to observe $n$ customers arriving, and $X(t)$ represents the number of customers arriving up to time $t$. And while your statement is correct, it's not how memorylessness is defined. For $T_2$ to be memoryless, it needs to satisfy $$\Pr[T_2 \le 3 \mid T_2 > 1] = \Pr[T_2 \le 2].$$ That is to say,

Given that a second customer has not arrived within the first hour, the probability that the second customer will arrive within two more hours (i.e. by hour 3) is equal to the probability that two customers arrive within 2 hours.

And this is obviously false for the reason I described above: because exactly one customer could have arrived within the first hour, meaning that only one more customer is needed within the next two hours to meet the stopping condition; whereas the right-hand side probability means you have to wait for two more customers to arrive within two hours.
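To put numbers on the difference between the two conditions, here is a minimal sketch. The question gives no arrival rate, so `RATE = 1.0` customer per hour is an assumption made purely for illustration, and the helper names are hypothetical.

```python
import math

RATE = 1.0  # customers per hour; assumed, the question does not specify a rate

def poisson_pmf(k, mean):
    """Pr[N = k] for a Poisson count N with the given mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def t2_survival(t, rate=RATE):
    """Pr[T_2 > t]: at most one customer has arrived by time t."""
    mean = rate * t
    return poisson_pmf(0, mean) + poisson_pmf(1, mean)

# Pr[T_2 <= 2]: two customers within 2 hours, starting from scratch.
fresh_start = 1.0 - t2_survival(2.0)

# Pr[T_2 <= 3 | X(1) = 0]: by independent increments, the joint event is
# "no arrivals in (0, 1]" and "at least two arrivals in (1, 3]".
p_no_arrivals = poisson_pmf(0, RATE * 1.0)
given_no_arrivals = p_no_arrivals * (1.0 - t2_survival(2.0)) / p_no_arrivals

# Pr[T_2 <= 3 | T_2 > 1] = 1 - Pr[T_2 > 3] / Pr[T_2 > 1].
given_t2_exceeds_1 = 1.0 - t2_survival(3.0) / t2_survival(1.0)

print(f"Pr[T_2 <= 2]            = {fresh_start:.4f}")
print(f"Pr[T_2 <= 3 | X(1) = 0] = {given_no_arrivals:.4f}")
print(f"Pr[T_2 <= 3 | T_2 > 1]  = {given_t2_exceeds_1:.4f}")
```

With this rate, the first two values coincide at $1 - 3e^{-2}$, while the third is $1 - 2e^{-2}$, which is strictly larger because the condition $T_2 > 1$ also allows the case where one customer has already arrived.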