[Math] Give an algorithm that computes a fair driving schedule for all people in a carpool over $d$ days

algorithmsgraph theorynetwork-flow

Some people agree to carpool, but they want to make sure that any
carpool arrangement is fair and doesn't overload any single person
with too much driving. Some scheme is required because none of them
goes to work every day, and so the subset of them in the car varies
from day to day.

Let the people be labeled in a set $S = \{p_1,…,p_k\}$. We say that
the total driving obligation of $p_j$ over a set of days is the
expected number of times that $p_j$ would have driven, had a driver
been chosen uniformly at random from among the people going to work
each day. That is, suppose the carpool plan lasts for $d$ days, and on
the $i^{th}$ day a subset $S_i \subseteq S$ of the people go to work.
Then the total driving obligation $\delta_j$ for $p_j$ can be written
as $\delta_j = \sum_{i:p_j\in S_i} \frac{1}{|S_i|}$. Ideally, we'd
like to require that $p_j$ drives at most $\delta_j$ times, but
$\delta_j$ may not be an integer.

A driving schedule is a choice of a driver for each day$-$a sequence
$p_{i_1}, p_{i_2},…,p_{i_d}$ with $p_{i_t}\in S_t-$and that a fair
driving schedule
is one in which $p_j$ is chosen as a driver on at
most $\lceil \delta_j \rceil$ days.

  1. Prove that for any sequence of sets $S_1,…,S_d$, there exists a fair driving schedule.

  2. Give an algorithm to compute a fair driving schedule that is polynomial in $k$ and $d$.

I have a few questions here. My first question has to do with $\delta_j$. Namely, what does the sum truly represent? I don't really understand the notation $i:p_j\in S_i$. Are we summing over all $i$ values, all values in $S_i$, or what?

I'm also confused about number 1. A sequence of sets $S_1,…,S_d$ represents a set of people $S_1$ to $S_d$ through $d$ days. We want to show that there is a fair driving schedule for this set, so we need to show that $p_i$ drives on at most $\lceil \delta_i \rceil$ days for all people $p_i$. However, we're not even told how many people there are in each set, so how can we determine what $\delta_i$ should be?

Again, this confusion may stem from my misunderstanding of the notation in the summation, so if someone could clear that up I would greatly appreciate it.

I'm not really sure how we can efficiently construct an algorithm that computes a fair driving schedule. Among which ideas should I look for insight here? I was thinking this was somewhat reminiscent of network flow problems, but I'm not entirely sure.

Best Answer

There should be always a fair driving schedule since you can model this problem as a flow network. The source is connected with all persons and the capacity of an edge from the source to a person $p_j$ is $\lceil\delta_j\rceil$. The next step is to connect every person with each of his working days. The capacity of each of this edges is $1$. The last step is to connect each day with the sink with capacity $1$.

Applying the max-flow min-cut theorem to our flow network, we can see that the maximal flow of our network is equal to the number of days, since the minimal cut splits our network into the sink and all other nodes.

Explanation: Without loss of generality, we can assume that each person works at least one day and there are no days without workers. Therefore we exclude all cuts running through an edge which connects a person with a day. The cut which splits our network in source and all other nodes is not minimal since $$\sum_j\lceil\delta_j\rceil\ge\sum_j\delta_j= \text{number of days}.$$ The mixed cases are excluded with the combination of the abovementioned arguments.

In conclusion, the maximal flow in out network can be retranslated to our original problem in the following way: If the flow from the node corresponding to person $p$ to the node corresponding to day $t$ is $1$, than $p$ is the driver on $t$.

Note that there are algorithms to determine the maximal flow like the Ford-Fulkerson-Algorithm.

Related Question