[Math] Please explain why the below algorithm has n*m subproblems rather than 2(n+m-1) subprobems.

dynamic programmingrecursive algorithms

I am providing the solution as well which I found on the web. I don't understand why it says at the end that there are n*m subproblems.

Question:

For bit strings X = x1 . . . xm, Y = y1 . . . yn and Z = z1 . . . zm+n, we say that Z is an interleaving of X and Y if it can be obtained by interleaving the bits in X and Y in a way that maintains the left-to-right order of the bits in X and Y .

For example if X = 101 and Y = 01 then x1 x2 y1 x3 y2 = 10011 is an interleaving of X and Y , whereas 11010 is not. Give the most efficient algorithm you can to determine if Z is an interleaving of X and Y . Prove your algorithm is correct and analyze its time complexity as a function m = |X| and n = |Y |.

Solution :

c[i, j] is a boolean

Let z(1), . . . , z(i+j) is an interleaving of x1, . . . , xi and y1, . . . yj for 0 < i <= m and 0 < j <= n.

Let c[i, j] be true if and only if z(1) . . . , z(i+j) is an interleaving of x(1), . . . , x(i) and y(1), . . . , y(j) .

If i = 0 then x(i) = (the empty string) and if j = 0 then y(j) = (the empty string).
The subproblem c[i, j] can be recursively defined as shown (where c[m, n] gives the answer to the original problem.

i starts from m and j starts from n. We start from the back of the interleaving string i,e, at i+j position and moves towards the 0th position to see if all strings are tested concurrently and are empty.

So,
c[i,j] = true if i = j = 0

c[i,j] = false if x(i) != z(i+j) and y(j) != z(i+j)

c[i,j] = c[i-1, j] if x(i) = z(i+j) and y(j) != z(i+j)

c[i,j] = c[i, j-1] if x(i) != z(i+j) and y(j) = z(i+j)

c[i,j] = c[i-1, j] OR c[i, j-1] if x(i) = y(j) = z(i+j)

PROOF

We now argue this recursive definition is correct.

First the case where i = j = 0 is when both X and Y are empty and then by definition Z (which is also empty) is a valid interleaving of X and Y .

If x(i) != z(i+j) and y(j) == z(i+j) then there could only be a valid interleaving in which xi appears last in the interleaving, and hence c[i, j] is true exactly when z1, . . . , z ( i+j−1) is a valid interleaving of x1, . . . , x(i−1) and y1, . . . yj which is given by c[i , j -1].

Similarly, when xi == zi+j and yj != zi+j then c[i, j] = c[i − 1, j].

If xi = yj = z(i+j), the interleaving (if it exists) must either end with xi (in which case c[i − 1, j] is true) or must end with yi (in which case c[i, j − 1] is true). Thus returning c[i − 1, j] OR c[i, j − 1] gives the correct answer.

Finally, since in all cases the value of c[i, j] comes directly from the answer to one of the subproblems, we have the optimal substructure property.

The time complexity is clearly O(nm) since there are n ·m subproblems each of which is solved in constant time.

Finally, the c[i, j] matrix can be computed in row major order.

Best Answer

We are not doing $m$ subproblems in the first string and then $n$ subproblems in the second string (which is what you're thinking to add, I guess); but for each of the $m$ substrings in the first string, you are doing $n$ subproblems with the second string. So you have $n$ added together $m$ times, which is $mn$.

Related Question