Unable to understand why the worst case of merge sort takes $(n\log_2{(n) – 1}) + 1$ steps

algorithmsasymptotics

I am trying to count the number of steps taken by merge sort in the worst case, but I keep arriving at $(n-1)\log_2{(n)}$ steps for the worst input.

But it seems according to formal texts the correct answer is $(n\log_2{(n) – 1}) + 1$.

This doesn't make sense to me for a number of reasons.

First $\log_2{(n)} – 1$ means that the depth of the merge tree decrease by 1. How is that possible? The number of levels in merge sort are always the same regardless of the worst or best input. Right?

Second, what is the $+1$ at the end. What is that there for ?

Let me show an example that I used for my calculations to illustrate my point:

{7,3,9,5,8,4,10,6}

{3,7} {5,9} {4,8} {6,10}

{3,5,7,9} {4,6,8,10}

{3,4,5,6,7,8,9,10}

At each stage after the first one each merge operations seems to make $n1 + n2 – 1$ comparisions, where $n1$ and $n2$ are the sizes of each list. The depth of the tree does not decrease.

And I have no idea on how that $+1$ is present.

Best Answer

Your result $(n-1)\log n$ and the text's $(n\log n - 1) + 1$ (which is just $n\log n$, but that may be a typo) both have the same asymptotic growth. In algorithmic analysis you would call them both $\Theta(n\log n)$.

The precise totals beyond the asymptotic growth is extremely sensitive to exactly what you consider as a "step" in the algorithm that deserves counting? For example, it could easily make a difference of $n$ whether you consider "sort a one-element list" at the bottom of the call tree to be an operation that deserves counting.


A bit of less shaky ground can be found by counting comparisons. Then each element passes through $\log n$ merge steps, which gives us an $n\log n$ term, since each time an element comes out of a merge step it consumes a comparison. Except, that is, that the last element in each merged list can always be emitted without comparing it to anything, saving $n-1$ comparisons (your can count the total amount of list merges by noticing that each of them reduces the total number of lists you have in play by $1$ -- and you start with $n$ lists and end with a single one).

This analysis ends up with $$n\log n - (n - 1) = n\log n - n + 1 = n(\log n - 1) + 1$$ comparisons in the worst case.


If you're counting comparisons too, then how do you conclude there should be $(n-1)\log n$ of them? In the concrete example you show, you have $n=8$ and need $1+1+1+1+3+3+7=17$ comparisons, but $(n-1)\log n$ is $21$.

Hmm, it seems like you may be thinking the $-1$ in $n_1+n_2-1$ only saves you a comparison per layer, but it's really per list. This that means that each layer where more than one list is produced has some savings that you're not accounting for.

Related Question