Probability of listening to a song in a shuffled playlist at time $t$

markov-processprobability

Suppose we have a playlist of $N$ songs, with durations $\{t_1, t_2, \dots, t_N\}$. The list is set to shuffle indefinitely. This means that, after each song is finished playing, the next song is chosen uniformly at random, so each song, including the one that just played, has an equal probability of playing, regardless of their duration. When the playlist is started, it chooses a song uniformly at random.

Let $P_j(t)$ be the probability that the playlist would currently be playing song $j$ after time $t$ has passed since beginning the playlist.

Intuitively, I'm sure that in the limit of large $t$, $P_j(t)$ must converge towards $t_j/\sum_i t_i$. But is that convergence asymptotic, or is there a hard cutoff time after which that is the distribution? For example, is this the distribution for all $t > \max(\{t_i\})$? If so, it's not obvious to me why this might be the case.

Also I know that, for all $t < \min(\{t_i\})$ the probability distribution must be uniform.

What is $P_j(t)$ in its general form? (Not just the above limits). Is there an analytic form?


My attempt:

Without loss of generality, order the songs in ascending duration, so $t_1 \le t_2 \le \dots \le t_N$. But with some loss of generality, let's assume that no two songs have equal duration (a reasonable assumption for continuous time), so $t_1 < \dots < t_N$.

As we know, for $t<t_1$, the probability distribution is uniform.

For $t_1 < t < t_2$, the only possible song that could have completed is the first song. Assuming $t < 2 t_1$, $P_1(t) = 1/N^2$. This is because the only way song $1$ can be playing is if song $1$ was rolled twice in a row. For $j \ne 1$, $P_j(t) = \frac{N+1}{N^2}.$ That is, either they were rolled initially and are still playing, or song $1$ was rolled, and then song $j$ was rolled afterwards.

Now, this doesn't cover the general case for $t_1 < t < t_2$, only for $t < 2 t_1.$ However, $t_2$ can be any multiple of $t_1$ in principle. Suppose, for example, that $t_2 = 6.5 t_1$, and $t = 5.5 t_1$. Then the only way song $1$ can be playing at time $t$ is if it was rolled six times in a row. Thus $P_1(t) = 1/N^6.$ And $P_j(t) = \frac{N^5 + N^4 + N^3 + N^2 + N + 1}{N^6}$ for $j \ne 1$. This can go on and on, for $t_2$ being any arbitrary multiple of $t_1$. So it seems like that must be specified:

For $(k-1) t_1 < t < k t_1 \le t_2$,
$$P_1(t) = 1/N^k$$
and
$$P_j(t) = \frac{\sum_{i=0}^{k-1} N^i}{N^k}; \hspace{1mm} j \ne 1.$$

This approach shows why it's problematic to go anywhere beyond $t = t_2$. For any interval $t_j < t < t_{j+1}$, there will be a different discrete expression every time $t$ passes any sum of earlier times. For example, if we're evaluating the probability for $t_8 < t < t_9$, and the sum $t_2 + 2 t_4 + t_7$ falls within this range, then there will be a different expression before and after that time, because there's a chance that the playlist just finished that sequence. Thus, I believe that a closed-form expression for $P_j(t)$ would take the form of an intractable number of disjoint expressions for different ranges, and for arbitrary $\{t_1, \dots, t_N\}$ it might be impossible to even show an expression, because this involves an arbitrary number of disjoint expressions.

If that's true, though, then an answer may still be given for my question to the large-$t$ limit.

Best Answer

One way to obtain the probability would be the following approach:

We want song $j$ to play after time $t$. For this to happen, all songs but $j$ need to have been played whole (i.e. an integer number of times).

Therefore, let $$ S_j(t) = \{(a_1,..,a_N)\mid a_1,..,a_{j-1},a_{j+1},..,a_N\in \mathbb{Z}, a_j\in\mathbb{R}\quad\land\quad \sum_{i=1}^N a_i \cdot t_i = t\} $$ Let $M\in S_j(t)$. Then each distinct permutation of $M$ creates a possible playlist. Since the probability of a song playing is uniform, the probability of the playlist is simply $N^{-\text{"Number of songs in playlist"}}$.

To get $P_j(t)$, we now sum over all possible playlists: $$ P_j(t) = \sum_{(a_1,..,a_N)\in S_j(t)} \binom{a_1+..+a_{j-1}+\lfloor a_j\rfloor+ a_{j+1}+..+a_N}{a_1,..,a_{j-1},\lfloor a_j\rfloor, a_{j+1},..,a_N}{N^{-(a_1+..+a_{j-1}+\lfloor a_j\rfloor+ a_{j+1}+..+a_N+1)}} $$

Another way of obtaining $P_j(t)$ would be as recurrence:

(For this part, let a song with duration $z$ occupy the interval $[c,c+z), c\ge 0$)

If $t\ge t_j$, then $j$ can not be the only song in the playlist - some song has to be played before it (though that one may be $j$ as well).
If $t<t_j$, then it's possible that $j$ is the only song in the playlist, but not necessary - another song $k$ with $t_k\le t$ may still be in the playlist.

These two cases give us the recurrence relation: $$\begin{cases} P_j(t) = \frac1N\sum_{i=1}^N \delta_{t_i\le t}P_j(t-t_i),\qquad &t\ge t_j \\ P_j(t) = \frac1N+\frac1N\sum_{i=1}^N \delta_{t_i\le t}P_j(t-t_i), &t< t_j \end{cases}$$

$\delta_{a<b}$ here stands for the Dirac-Delta, i.e. $$\begin{cases}\delta_{a\le b} = 1, &a\le b\\ \delta_{a\le b} = 0, &\text{ else}\end{cases}$$

The Dirac-Delta isn't necessary; It's purpose here is mainly to make notation more compact.

To remove them, we'd have to add an edge case for each possible value-combination of the Dirac-Deltas (for this, enumerate the songs so that $t_1\le ... \le t_N$)

The advantage of the recurrence relation is that, given a specific playlist, and assuming the songs have a discrete time (e.g. given in whole seconds), we can for each playlist calculate a "closed form" - that is, if we allow the closed form to contain roots (that can not be expressed as radicals).
Otherwise, we can still give an approximating function.


Python Script

Note: If you have less than 4GB free RAM, remove the first two lines.

Related Question