Mathematical vs. Computer Science Definition Big $O$

asymptoticscomputer sciencedefinition

In some formal mathematical treatments of asymptotic big $O$ notation, it's usually defined something like this:

Definition (Asymptotic Big $O$ notation): Let $X\subseteq\textbf R$ be a subset of the real line $\textbf R$, and let $f,g:X\to\textbf R$ be real-valued functions. Then $f$ is said to be asymptotically big $O$ of $g$ (and we write $f\in O(g)$, or sometimes $f=O(g)$) iff there exists a positive real number $c>0$ and a real number $M\in\textbf R$ such that for all $x\in X\cap[M,+\infty)$, we have that $|f(x)|\leq c|g(x)|$.

My question is how exactly does this relate to the more informal notion of big $O$ used in computer science, and specifically in the analysis of the run-time of algorithms? Take the binary search algorithm for instance, which people often just say runs in $O(\log(n))$ time…but here, what is the function $f(n)$ in this situation?

Best Answer

Given a positive integer $n\in\textbf Z^+$, consider the list $$1,2,3,...,n$$ comprised of the first $n$ positive integers. You have to guess some positive integer $1\leq x\leq n$ contained somewhere in this list, and whenever you make a guess, you are told one of three things $$1.\text{Your guess is too high :(}$$ $$2.\text{Your guess is too low :(}$$ $$3.\text{Your guess is correct!}$$ How can you guess what $x$ is? Let me go over two possible strategies. The first one is called linear search. Basically, what you do is guess that $x=1$, and if you get it correct then you celebrate and go drink a lot of beer, while if you are told that $1$ is too low, then you guess the next number in the list, i.e. you guess $x=2$, and if this is correct then you celebrate and go drink a lot of beer, while if you are told that $2$ is too low, then you guess the next number $3$, and this process keeps repeating until you guess the correct number (and go celebrate and drink a lot of beer). For each positive integer $n\in\textbf Z^+$, I'm going to let $T_{\text{linear}}(n)$ denote the maximum number of guesses that you need to correctly guess what $x$ is in a list of length $n$ using the linear search algorithm described above. Convince yourself then that the following table of values is indeed accurate: \begin{array}{r|c} n & T_{\text{linear}}(n) \\ \hline 1 & 1 \\ 2 & 2 \\ 3 & 3 \\ 4 & 4 \\ 5 & 5 \\ 6 & 6 \\ 7 & 7 \\ 8 & 8 \\ 9 & 9 \\ \vdots & \vdots \end{array} In short, for all positive integers $n\in\textbf Z^+$, $T_{\text{linear}}(n)=n$.

Now let's consider another strategy called binary search. Here, you start by guessing that $x$ is the "middle term" in the list, say $x=n/2$ if $n$ is even and $x=\frac{n+1}{2}$ is $n$ is odd. If the guess is correct, then you're done, while if the guess is too low then you have basically eliminated $\approx 1/2$ of the possibilities for what $x$ could be, and then you just apply the whole procedure again to the remaining half of the list, and you keep doing this until you guess correctly. For each possible list length $n\in\textbf Z^+$, let $T_{\text{binary}}(n)$ denote the maximum number of guesses that you need to correctly guess what $x$ is in a list of length $n$ using the binary search algorithm described above. In this case, convince yourself that the following table of values is correct:

\begin{array}{r|c} n & T_{\text{binary}}(n) \\ \hline 1 & 1 \\ 2 & 2 \\ 3 & 2 \\ 4 & 3 \\ 5 & 3 \\ 6 & 3 \\ 7 & 3 \\ 8 & 4 \\ 9 & 4 \\ \vdots & \vdots \end{array}

and that in general, for all positive integers $n\in\textbf Z^+$, we have that $T_{\text{binary}}(n)=\lfloor\log_2(n)\rfloor+1$. Below in red is the graph of the $T_{\text{linear}}$ function while in green is the graph of the $T_{\text{binary}}$ function:enter image description here

Overall, it is clear that binary search is a much more efficient algorithm than linear search when it comes to this particular task of guessing a number in a list, especially as the list gets larger and larger (i.e. as $n\to\infty$). Computer scientists will often say that linear search runs in at worst $O(n)$ time (or linear time), while binary search runs in at worst $O(\log(n))$ time (or logarithmic time). This is because, using the notation from the original post, we have $X=\textbf Z^+$, $f(n)=T_{\text{linear}}(n)=n$ or $f(n)=T_{\text{binary}}(n)=\lfloor\log_2(n)\rfloor+1$, and $g(n)=n$ or $g(n)=\log(n)$ respectively, since in the first case we can choose $c:=1$ and $M:=1$ for instance, while in the latter case we can choose $c:=5$ and $M:=4$.

Related Question