Black and white shirts puzzle

probabilitypuzzle

I’m not familiar enough with probability to be able to even begin to approach this myself, but this puzzle has been plaguing me.

Assume I have $m$ black shirts and $n$ white shirts, where $m > n$. Black shirts have durability $d \ge 0$, and white shirts have durability $u \ge 0$. (Feel free to pick better variable names)

Every day, I pick out a random shirt to wear. Once I run out of either color of shirt, I wash all my dirty shirts of both colors and start over. Clean shirts do not get washed. Whenever a shirt gets washed, its durability goes down by one. Immediately after washing, if the durability of a shirt goes below 0, it must be thrown out.

What is the probability on day $x$ that I throw out the last of one color of shirt, having at least one of the other color still available? Or more casually, after how many days is it likely (for some likelihood) that I have thrown out the last of one color of shirt?

What is the probability that I throw out all my white shirts before all my black shirts?

Best Answer

For specific numbers of shirts and durabilities, the problem can be modeled as a finite-state absorbing Markov chain, but the number of states quickly gets out of hand. Say that a shirt has initial durability of $d$. Its durability can be anything from $0$ to $d$ and it can be clean or dirty, giving $2(d+1)$ states. If we have $s$ shirts to begin with, this gives $2^s(d+1)^s$ states. This is an overstatement since we only care how many clean white shirts of durability $a$ there are, not which whites shirts of durability $a$ are clean, but it gives a rough idea. So if we initially have $6$ white shirts of durability $9$ and $4$ black shirts of durability $4$, this gives $20^6\cdot10^4=6.4\times10^{11}$ states.

Say that we've wildly overestimated the number of states and that there are really only a million. We would still need a one million by one million matrix to solve the problem with a Markov chain. This is not practicable.

I've thought about trying to simplify it by just saying that we get a total of $m(d+1)$ white-shirt days and $n(u+1)$ black shirt days, but that doesn't really capture all the intricacies of the problem, because if we happen to pick the same white shirt repeatedly, we wear it out, and the probability of picking a white shirt in the future goes down.

Unless you have only a very few cheap shirts that are don't stand up to washing, I don't think it is practical to try to solve the problem exactly. As far as a general solution in terms of $m,n,d,u$ goes, I'd be really surprised to see one.

I think the best way to do this is by simulation. You might try different values of the variables to see if you notice any patterns.

I wrote a python script to do the simulation. I changed the formulation slightly; instead of "durability" I have "uses", the number of times the shirt can be worn. This is one more than your definition of durability. The uses are shown as blackWear and whiteWear in the script.

from random import choice
from collections import namedtuple
from math import sqrt

Shirt = namedtuple('Shirt', ['color', 'uses'])
colors = ('white', 'black')
whites = 6
blacks = 4
whiteWear = 4
blackWear = 6
trials = 10000
exhaust = {'white':0, 'black':0}

for _ in range(trials):
    cleanShirts = whites*[Shirt('white', whiteWear)]
    cleanShirts.extend( blacks*[Shirt('black', blackWear)])
    shirts  = {'white':whites, 'black': blacks}
    clean  = {'white':whites, 'black': blacks}
    dirtyShirts = []
    while shirts['white'] and shirts['black']:
        while clean['white'] and clean['black']:
            wear = choice(cleanShirts)
            cleanShirts.remove(wear)
            clean[wear.color] -= 1
            dirtyShirts.append(wear)
        for shirt in dirtyShirts:
            washed = Shirt(shirt.color, shirt.uses-1)
            if washed.uses > 0:
                cleanShirts.append(washed)
                clean[washed.color] += 1
            else:
                shirts[washed.color] -= 1
        dirtyShirts = []
    for color in colors:
        exhaust[color] += shirts[color] == 0 

print('trials:', trials)
for color in colors:
    print(color, 'exhausted', exhaust[color])
w = exhaust['white']/trials
variance = w -w*w
sigma = sqrt(variance/trials)
delta = 1.96*sigma
print('95%% confidence: (%.6f, %.6f)' %(w-delta, w+delta))

This produced the output

trials: 10000
white exhausted 8401
black exhausted 1599
95% confidence: (0.832916, 0.847284)

The last line means that we can have $95\%$ confidence that the probability of exhausting the white shirts first lies between the two values shown.

I found this very surprising. You have a total of $24$ wearings of each color possible, and you have more white shirts than black shirts to start, but you run out of white shirts $84\%$ of the time. I've repeated the experiment a few times, with the same result, so I'm confident in its correctness.

It would be interesting to see how the probability changes with different numbers of shirts and uses. You can just change the values for whites, blacks, whiteWear and blackWear at the top of the script.

EDIT

I finally got around to running some more tests. First, I modified the python script, to make it easier to run multiple tests, and to consolidate the output. There are no substantive changes, but I'll post the new script anyway, in case you want to run some more tests of your own.

from random import choice
from collections import namedtuple
from math import sqrt

Shirt = namedtuple('Shirt', ['color', 'uses'])
Result = namedtuple('Result', ['trials', 'whites', 'blacks', 'whiteWear', 
                                                'blackWear', 'prob', 'delta'])
colors = ('white', 'black')

def tests(trials, whites, blacks, whiteWear, blackWear):
    exhaust = 0
    for _ in range(trials):
        exhaust += test(whites, blacks, whiteWear, blackWear)
    w = exhaust/trials  # sample probability of running out of white shirts
    variance = w -w*w
    sigma = sqrt(variance/trials)
    delta = 1.96*sigma
    return Result(trials, whites, blacks, whiteWear, blackWear, w, delta)

def test(whites, blacks, whiteWear, blackWear):
    cleanShirts = whites*[Shirt('white', whiteWear)]
    cleanShirts.extend( blacks*[Shirt('black', blackWear)])
    shirts  = {'white':whites, 'black': blacks}
    clean  = {'white':whites, 'black': blacks}
    dirtyShirts = []
    while shirts['white'] and shirts['black']:
        while clean['white'] and clean['black']:
            wear = choice(cleanShirts)
            cleanShirts.remove(wear)
            clean[wear.color] -= 1
            dirtyShirts.append(wear)
        for shirt in dirtyShirts:
            washed = Shirt(shirt.color, shirt.uses-1)
            if washed.uses > 0:
                cleanShirts.append(washed)
                clean[washed.color] += 1
            else:
                shirts[washed.color] -= 1
        dirtyShirts = []
    return shirts['white'] == 0 

def report(results):
    for result in results:
        print('%6d %6d %6d %5d %5d %11.5f %.6f'%result)
    print()

trials = 10000
whiteWear = 4
blackWear = 6
results = [ ]
print('Trials Whites Blacks W_Use B_Use Probability    Delta\n')

for n in range (1,5):
    whites = 3*n
    blacks = 2*n
    results.append(tests(trials, whites, blacks, whiteWear, blackWear))
report(results)

results = []
for n in range(2,6):
    whites = 2*n
    blacks = n
    results.append(tests(trials, whites, blacks, whiteWear, blackWear))
report(results)

results = []
for n in range (2,7):
    whites = n
    blacks = 2
    results.append(tests(trials, whites, blacks, whiteWear, blackWear))
report(results)

In all the tests that I ran, I used $10,000$ trials, and I kept the durability at $6$ wearings for a black shirt, and $4$ for a white. Here are the results:

Trials Whites Blacks W_Use B_Use Probability    Delta

 10000      3      2     4     6     0.71320 0.008864
 10000      6      4     4     6     0.83970 0.007191
 10000      9      6     4     6     0.89370 0.006041
 10000     12      8     4     6     0.92770 0.005076

 10000      4      2     4     6     0.63210 0.009452
 10000      6      3     4     6     0.71430 0.008854
 10000      8      4     4     6     0.77090 0.008237
 10000     10      5     4     6     0.81880 0.007550

 10000      2      2     4     6     0.82720 0.007410
 10000      3      2     4     6     0.72210 0.008780
 10000      4      2     4     6     0.63030 0.009461
 10000      5      2     4     6     0.55230 0.009746
 10000      6      2     4     6     0.49690 0.009800

The "Probability" is the sample probability of running out of white shirts first. "Delta" gives the "sampling error", so that in the first case, where we have three white shirts and two blacks shirts, the probability of running out of white shirts first is $0.71320\pm 0.008864$ at the $95\%$ confidence level.

In the first set of trials, I tested what happens if we keep the ratio of white shirts to black shirts the same, but increase the number of shirts. We see that the dominance of quality over quantity becomes more marked, as the number of shirts increases. (I recognize that durability is not the only aspect of quality; this is just verbal shorthand.)

In the next set of trials, I increased the ratio of white shirts to black from $3:2$ to $2:1$. As I expected, the general pattern was the same, though it was more likely that we'd run out of black shirts first.

In the last set of trials, I kept the number of black shirts constant at $2$, but increased the number of white shirts. We had to get to $6$ white shirts until we were about equally likely to run out of black shirts first. It really is too close to call, especially when you take $\delta$ into account. (That The last case is so close to fifty-fifty is purely fortuitous. I just decided to run five cases for no particular reason.)