[Math] predicting runtime of $\mathcal{O}(n \log(n))$ algorithm, one “input size to runtime” pair is given

Tags: algorithms, asymptotics, computer science, logarithms

I'm given the runtime at input size $n=100$ for some polynomial-time (big-Oh) algorithms and for an $\mathcal{O}(n \log(n))$ one. I want to calculate the runtimes for $n = 200$, $1000$ and $10000$.

For the polynomials it is trivial.
The results I should get for the $n \log n$ case are $0.06$, $1$ and $20$ seconds respectively, where $0.03\,\mathrm{s}$ was given for $n=100$.

I have tried the following (each attempt should give $0.06$ as the solution for $n=200$):

(1) Taking the ratio $\frac{f(2x)}{f(x)}$, similarly to how I got the right solutions for the polynomials: $\frac{2x \log(2x)}{x \log x} = \ldots = 2 \log\left((2x)^{1/\log x}\right)$, which doesn't get me anywhere; substituting $x=100$ gives $2.3010$ for any base of the $\log$ (a numerical check of this ratio is sketched after the list of attempts).

(2) Calculating the "unit time of execution" $x$ from $100 x \log(100x)=0.03$,
after approximating the inverse of $y=n \log n$ at $y=0.03$:
$n=1.03$ for the natural $\log$; substitution gives $1.48$.
$n=1.02058$ for the binary $\log$; substitution gives $2.10$.

(3) Calculating the "unit time of execution" from the polynomial cases, which yields a different value for each of them and results in negative runtime predictions for the $n \log n$ algorithm.

(?) I also thought of searching for a function that is $\mathcal{O}(n \log n)$ and fits the points given by the solution, but that's over my head.
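
For reference, a quick numerical check of attempt (1), assuming the idealized model $f(x) = x \log x$ (the small script below is only illustrative):

```python
import math

def ratio(x, log):
    """f(2x) / f(x) for the idealized model f(x) = x * log(x)."""
    return (2 * x * log(2 * x)) / (x * log(x))

x = 100
for name, log in [("ln", math.log), ("log2", math.log2), ("log10", math.log10)]:
    r = ratio(x, log)
    print(f"{name}: ratio = {r:.4f}, predicted runtime at n=200: {0.03 * r:.4f} s")
# Every base gives a ratio of about 2.3010, so the prediction is about 0.069 s,
# not the 0.06 s the exercise expects.
```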

How do I get to the solution?
Did any of my attempts make sense?

Best Answer

Assuming this is a homework exercise, I think you're making it much more complex than it is.

You're being asked (implicitly) to imagine that an input size of $100$ is enough that all run-time contributions of terms that grow slower than $n\log n$ -- such as time for the program to start up, the $O(n)$ time it takes to read the input, and so forth -- are insignificant compared to the $n\log n$ term. (This is not an unproblematic assumption -- see below -- but that may not be what the question asks you to philosophize about).

In other words, suppose you can ignore everything else. Then the running time is $f(n) = Cn\log n$ for some constant $C$. Given that $f(100)=0.03$, you can find $C$ simply by plugging in $100$ to get $$C\cdot 100\log(100) = 0.03 \quad\text{ and therefore }\quad C = \frac{0.03}{200}=0.00015$$ Now you just use that $C$ to find $f(10000)=0.00015\times 10000\times 4=6$. But that is not the result you're expecting to get -- why would you expect $f(10000)$ to be $20$?
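
To make that arithmetic concrete, here is a minimal sketch of the same calculation (Python and the base-10 logarithm are just illustrative choices):

```python
import math

# Idealized model: f(n) = C * n * log10(n), calibrated so that f(100) = 0.03 s.
C = 0.03 / (100 * math.log10(100))   # = 0.03 / 200 = 0.00015

def f(n):
    return C * n * math.log10(n)

for n in (200, 1000, 10000):
    print(f"f({n}) = {f(n):.3f} s")
# Prints approximately 0.069, 0.450 and 6.000 seconds -- not 0.06, 1 and 20.
```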

It's not a result of using a base-10 logarithm, because any other logarithm would just show up as a multiplicative factor in $C$ that is canceled out when we plug in concrete numbers later.
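
For instance, redoing the same calibration with the natural logarithm gives $$C' = \frac{0.03}{100\ln 100} \approx 6.51\times 10^{-5} \quad\text{ and }\quad f(10000) = C'\cdot 10000\cdot\ln(10000) \approx 6\text{ seconds},$$ exactly the same prediction as before.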

You certainly shouldn't expect $f(100)=0.03$ and $f(200)=0.06$, because that amounts only to a linear scaling of the runtime with no logarithmic factor at all. (It's possible to get this behavior with a constant term, of course, but then your problem is underspecified).
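
One way to make that parenthetical concrete (base-10 logarithm chosen only for definiteness): with a model of the form $f(n) = a\,n\log_{10}n + b$, the observed doubling $f(200) = 2f(100)$ is equivalent to $$200a\log_{10}(200) + b = 200a\log_{10}(100) + 2b \quad\iff\quad b = 200a\log_{10}2,$$ so any pair $(a,b)$ satisfying this, scaled so that $f(100)=0.03$, reproduces both data points; the two measurements alone cannot pin the model down.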

So perhaps it's not homework? Are the known good values ones you have measured in practice? In that case it is very likely that what you see is just due to measurement uncertainty -- $0.03$ seconds is a very short time to time a program, and you really have no business trying to extrapolate that by a factor of 100. How precise are your measurements even for such short intervals?

For example, a problem of size 100 will likely stay within the L1 cache whereas for a size of 10000 you begin to pay a price in cache misses that simply doesn't show up at all in the $n=100$ measurement.

Remember that big-O notation is about how the resource usage grows when $n$ is very large, and a nice big-O analysis can hide any amount of shenanigans and erratic behavior for small $n$, as long as there is some size above which the complexity behaves as the big-O notation claims. There's no guarantee about how large this threshold may be; even 10000 may be too small a problem size to see the big-O complexity in action.
