I think your problem is that you're approaching this from the wrong perspective, thinking the proof is meant to be something it really isn't.
This proof is not an example of a calculation you ought to have been able to do for yourself by now. The proof is there in order to give you the result, as a free fact that you can afterwards use in your own calculations, without worrying about remembering how to prove it. What the proof gives you is a rule saying that if a constant is added or subtracted before something is raised to a fixed power, the added or subtracted constant has no effect on the asymptotic growth of the whole.
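To see that rule in a small worked instance (taking the power $2$ and the constant $23$ purely for illustration): for all $n \ge 23$ we have $n + 23 \le 2n$, so
$$ n^2 \le (n+23)^2 \le (2n)^2 = 4n^2, $$
and therefore $(n+23)^2$ is $\Theta(n^2)$. The added constant changed only the hidden constant factor, not the growth class.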
There's a fair chance that you're not even supposed to be able to figure out such constants for yourself, because your teachers are more interested in getting to the point of actually discussing some algorithms than in the minor internal details of the notation used for algorithmic analysis -- which is after all just a tool, not the main goal of the course. This may well be why the details of the proof are being given to you: it saves time compared to training you to construct it from thin air. You should then get with the program and just remember the result, not the irrelevant details of the particular proof.
(And don't worry: with just a bit of experience you will be able to reconstruct this proof on the fly. But there's no reason to panic just because you don't already have this experience before you begin getting it. Also, this is not to say that you should ignore the proof. Make sure you understand that it works, if not how anyone invented it in the first place. Your goal with this understanding should be to get an intuitive feel for how asymptotic growth works, not to try to discern a set of mechanical rules for "finding the answer".)
Furthermore, when you speak of "the math in calculating Big O", it sounds like you have the (quite common) misconception that estimating the growth rate of a function is a definite procedure that can be followed mechanically to get one unique correct answer -- in the way that, for example, differentiating a function can be done by following a handful of mechanical rules. This is not true. First of all, everything is big-O and big-$\Theta$ of itself, so if your analysis ends up with $(n+23)^2+3\cdot 2^{n+5\log n}$ it is always correct to call that simply
$$ \mathcal O\bigl((n+23)^2+3\cdot 2^{n+5\log n}\bigr)$$
However, there are simpler descriptions of this that are also correct, and your goal would generally be to find as simple a description as you can -- within the resources (time, knowledge) available to you! -- in the hope of getting something that is easy to compare to other such estimates. But how much time to spend polishing the asymptotic notation is not an exact science. For example, if only a small bit of simplification tells you what you need to know (say, that the algorithm you're considering is hopelessly worse than the one you already have), there's no point in wasting time carrying the simplification as far as you possibly can.
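As a concrete illustration of such simplification (assuming $\log$ here means $\log_2$, so that $2^{5\log n} = n^5$; with another base only the exponent changes):
$$ (n+23)^2 + 3\cdot 2^{n+5\log n} = (n+23)^2 + 3n^5\cdot 2^n = \Theta\bigl(n^5 2^n\bigr), $$
since the polynomial term $(n+23)^2 = \Theta(n^2)$ is eventually dominated by $n^5 2^n$. The simplified form $\Theta(n^5 2^n)$ is much easier to compare against other estimates than the original expression.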
It is also often true in practice that you don't want to spend a lot of time proving a super-tight asymptotic bound for your algorithm, if you can quickly prove a slightly looser bound that nevertheless shows that it is good for your purpose.
Relax. Remember the result. Try to understand the growth classes intuitively, given the rules about them you're shown. Remember that this proof was in the book for a reason. If, instead, it had been an exercise, you would have been justified in worrying that you couldn't solve it. But it wasn't.
Your translation is correct. The intuition behind big-oh notation is that $f$ is $O(g)$ if $g(x)$ grows as fast or faster than $f(x)$ as $x \rightarrow \infty$. This is used in computer science whenever studying the time complexity of an algorithm. Specifically, if we let $f(n)$ be the run-time (number of steps) that an algorithm takes on an $n$ bit input to give an output, then it may be useful to say something like $f$ is $O(n^2)$, so we know that the algorithm is relatively fast for large inputs $n$. On the other hand, if all we knew was $f$ is $O(2^n)$, then $f$ might run too slowly for large inputs.
Note I say "might" here, because big-oh only gives you an upper bound, so $n^2$ is $O(2^n)$ but $2^n$ is not $O(n^2)$.
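You can get a numerical feel for this asymmetry with a quick sketch (this is illustration, not proof -- big-O statements are about all sufficiently large $n$, not a few sample points):

```python
# Illustrative sketch: the ratio n^2 / 2^n shrinks toward 0, which is
# consistent with n^2 being O(2^n); the reciprocal 2^n / n^2 blows up,
# consistent with 2^n NOT being O(n^2).
def ratio(n):
    return n**2 / 2**n

for n in [1, 10, 20, 30]:
    print(n, ratio(n), 1 / ratio(n))
```

By $n = 30$ the ratio $n^2/2^n$ is already below $10^{-6}$, while its reciprocal exceeds $10^6$ and keeps growing.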
For all $n \ge 1$, we have $2n \le 2n^2$ and $3 \le 3n^2$. Therefore, $7n^2+2n+3 \le 7n^2+2n^2+3n^2 = 12n^2$, which shows that $7n^2+2n+3$ is $O(n^2)$ (with witnesses $C = 12$ and $n_0 = 1$).
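If you want to sanity-check the bound numerically (a spot-check over small $n$, not a substitute for the one-line algebraic proof above):

```python
# Spot-check the bound 7n^2 + 2n + 3 <= 12n^2 for n >= 1.
def f(n):
    return 7 * n**2 + 2 * n + 3

for n in range(1, 1000):
    assert f(n) <= 12 * n**2
```

Note that the bound is tight exactly at $n = 1$, where both sides equal $12$; for larger $n$ the slack only grows.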