[Math] The Big O notation: what is the explanation?

discrete mathematics

I am studying discrete mathematics to use it later in programming. Unfortunately, one of the most complex topics is also the one that gets the least explanation.

The definition is given as:

Let $f$ and $g$ be functions from the set of integers or the set of real numbers to the set of real numbers. We say that $f(x)$ is $O(g(x))$ if there are constants $C$ and $k$ such that $|f(x)| \le C|g(x)|$ whenever $x > k$.
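
To check my reading of this, I tried a tiny example of my own (so it may well be wrong): for $x > 1$,

$$|2x + 3| = 2x + 3 \le 2x + 3x = 5x = 5|x|,$$

which would seem to say that $2x + 3$ is $O(x)$ with $C = 5$ and $k = 1$. But I am not sure this is what the definition really means.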

Several things confuse me right away:

I know that big-O notation is used to estimate the number of steps an algorithm needs to solve a problem, and hence can be used to compare two algorithms and find their relative efficiency. So,

1) What are $f(x)$ and $g(x)$? Are they the algorithms we are comparing?

2) What are $C$ and $k$ in the definition?

3) Why is this a valid definition?

Furthermore, it gives an example like:

Show that $f(x) = x^2 + 2x + 1$ is $O(x^2)$.

I apologize, but I have to write out the solution that is given.

It says,
"We observe that we can readily estimate the size of f(x) when x > 1 whenever x < x2 and 1 < x2 when x > 1. It follows that, 0<= x2+2x+1 <= 4x2, when x > 1. So, we can take c=4 and k = 1 as witnesses."

This solution is a complete mystery to me. (And this is not my homework, really.)
What I don't understand:

1) Why not take $x = 0$? We could still estimate the value of $f(x)$, right?

2) "x < x2 and 1 < x2" when x > 1 , where I think it should had been "x <= x2" and where did the second even come from?

3) Where did the $0 \le x^2 + 2x + 1$ even come from?

4) And most importantly, how in the world does this tell me how many steps I need to take to solve a problem? (Is it $f(x)$, maybe?)

Best Answer

Suppose we have two algorithms, each of which sorts an array of length $n$. Suppose the first takes $f(n) = n^2 + n + 1$ steps and the second takes $g(n) = 1000n \log(n) + 200n$ steps. Which is a better (faster) algorithm?

Well, for small values of $n$ the first one is faster, while for large values of $n$, which is what matters in a lot of computing situations, the second one will be faster.
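
If it helps to make the crossover concrete, here is a quick sketch that simply evaluates the two step-count formulas at a few values of $n$ (I am assuming the natural logarithm here; the base of the log only shifts where the crossover happens, not the conclusion):

    import math

    def f(n):
        # Step count of the first hypothetical algorithm: n^2 + n + 1
        return n**2 + n + 1

    def g(n):
        # Step count of the second hypothetical algorithm: 1000*n*log(n) + 200*n
        # (natural log is assumed; any fixed base tells the same story)
        return 1000 * n * math.log(n) + 200 * n

    for n in (10, 100, 1_000, 10_000, 100_000):
        faster = "first" if f(n) < g(n) else "second"
        print(f"n = {n:>7}:  f(n) = {f(n):>15,.0f}   g(n) = {g(n):>15,.0f}   faster: {faster}")

For small $n$ the quadratic algorithm wins, but somewhere around $n \approx 10^4$ (with this choice of log base) the $n \log n$ algorithm takes over and stays ahead from then on.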

This is the essence of what the "big-O" notation conveys.

Other answers may connect this idea to the formal definition. (I might later today.)
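
Very roughly, though, the connection goes like this (a quick sketch, not a full treatment), using the same term-by-term bounding trick as the book's solution: when $x > 1$ we have $x < x^2$ and $1 < x^2$, so

$$0 \le x^2 + 2x + 1 \le x^2 + 2x^2 + x^2 = 4x^2 \quad \text{when } x > 1,$$

which is exactly the statement $|f(x)| \le C|g(x)|$ for $x > k$ with the witnesses $C = 4$ and $k = 1$. (The $0 \le$ on the left is only there because the definition uses absolute values, and $x^2 + 2x + 1 = (x+1)^2$ is never negative.) The same trick bounds the step count $n^2 + n + 1$ above by $n^2 + n^2 + n^2 = 3n^2$ for $n > 1$, and that is all we mean when we say the first algorithm runs in $O(n^2)$ steps.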
