[Math] Using induction to show a greedy algorithm always makes the optimal task selection

algorithms · discrete mathematics · induction

Suppose we have a greedy algorithm like the following:

procedure selection($s_1, s_2, \ldots, s_n$, $f_1, f_2, \ldots, f_n$)
    S = (empty set)
    for j = 1 to n
        if task j is compatible with S
            S = S U {j}
    return S
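
Concretely, in Python I mean something like the following (the (start, finish)-pair representation and the function name are just my illustration, and I assume the tasks are already sorted by finish time):

def selection(tasks):
    """tasks: list of (start, finish) pairs, sorted by finish time."""
    selected = []
    last_finish = float("-inf")  # finish time of the most recently chosen task
    for j, (s, f) in enumerate(tasks):
        # task j is compatible with the current selection iff it starts
        # no earlier than the last chosen task finishes
        if s >= last_finish:
            selected.append(j)
            last_finish = f
    return selected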

S = tasks selected by algorithm above.
A = tasks selected by optimal algorithm.

Thus, the greedy algorithm is optimal if |S| is always equal to |A|. How can I use induction to show that this is true? Would my base case just be that both trivially select one task?

NOTE: $s_1,s_2,\ldots,s_n$ and $f_1,f_2,\ldots,f_n$ denote the start and finish times, respectively, of the tasks.

Best Answer

Well, I don't want to be picky, so I'll just assume that the given tasks are sorted by their finish times, i.e. $f_i \leq f_{i+1}$ (or you could just sort the input).
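
In Python, with tasks as (start, finish) pairs as in the question's sketch, that sort is a one-liner:

tasks.sort(key=lambda t: t[1])  # sort in place by finish time, so that f_i <= f_{i+1}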

The first thing you would like to do is decide why you can use induction. Does the problem contain similar but smaller subproblems? In our case it does, but that is not clear from the code you have given, i.e. you would need to argue why running the last $n-1$ iterations of the loop is similar to running all $n$ of them (this is kind of trivial, but tedious if it were to be done formally).

To make it easier, consider the following rewrite:

procedure selection($s_i, s_{i+1}, \ldots, s_n$, $f_i, f_{i+1}, \ldots, f_n$)
    if n < i
        return (empty set)
    else
        j = i + 1
        while j <= n and task j is incompatible with task i (i.e. $s_j < f_i$)
            j = j + 1
        return {i} U selection($s_j, s_{j+1}, \ldots, s_n$, $f_j, \ldots, f_n$)
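
A direct Python transcription of this recursion might look like the following (again with (start, finish) pairs sorted by finish time; indices are 0-based here, so the base case is $i \geq n$ rather than $n < i$):

def selection(tasks, i=0):
    """tasks: list of (start, finish) pairs, sorted by finish time."""
    if i >= len(tasks):                       # empty remaining input: base case
        return set()
    j = i + 1
    # skip every task that starts before task i finishes (incompatible with i)
    while j < len(tasks) and tasks[j][0] < tasks[i][1]:
        j += 1
    # keep task i and recurse on the compatible suffix starting at j
    return {i} | selection(tasks, j)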

This recursive version gives a clear indication of where the subproblem structure is, and it is much easier to work with. For example, you can read off the base case: it is the empty input (the first if-clause). You can also guess what your induction hypothesis should be: selection works for all inputs of size at most $n-j+1$, but you don't know $j$ in advance, so it has to work for all inputs of size at most $n-1$. All that is left is to observe that $f_i$ has to be less than or equal to the finish time of the corresponding task in the optimal solution. From this it follows that the remaining set $s_j, \ldots, s_n$ is the largest one worth considering, and the rest is implied by the induction hypothesis.
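
In case it helps to see the skeleton, here is one way (certainly not the only one) to phrase that observation as a "greedy stays ahead" induction. Let $i_1 < i_2 < \cdots < i_k$ be the tasks chosen by the greedy algorithm and $j_1 < j_2 < \cdots < j_m$ those of an optimal solution, both listed in order of finish time. One can show by induction on $r$ that
$$f_{i_r} \leq f_{j_r} \quad \text{for every } r \leq \min(k, m).$$
The base case is exactly the observation above (the greedy algorithm takes the earliest-finishing task); the step uses that $j_r$ is compatible with $j_{r-1}$, hence by the hypothesis $f_{i_{r-1}} \leq f_{j_{r-1}}$ also compatible with $i_{r-1}$, so the greedy algorithm had at least one candidate and picked one finishing no later than $j_r$. Finally, if $m > k$, then $j_{k+1}$ would be compatible with $i_k$ (since $f_{i_k} \leq f_{j_k} \leq s_{j_{k+1}}$), contradicting that the greedy algorithm stopped; hence $k \geq m$ and thus $|S| = |A|$.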

In my timezone it is getting late, so please consider this a huge hint; try to work out the rest on your own and leave a comment on whether you succeeded. If not, I shall complete this answer tomorrow.
