The Garey and Johnson book (Computers and Intractability: A Guide to the Theory of NP-Completeness, 1979) has a detailed explanation of this, early on. I can't recommend it enough.
There are several reasons why problems are considered "tractable" if they have polynomial algorithms, but the most important is this: Suppose you have a computer and with it you can feasibly calculate solutions to the Widget Problem for up to 10,000 widgets. Your customer, however, needs to solve larger instances, and if you can help them, they will pay you a lot of money. To help them do this, you will invest some of the fee in upgrading your computer to the latest model, which is twice as fast as your current computer.
If your algorithm for solving the Widget Problem is $O(n)$, the new computer will solve instances involving up to 20,000 widgets, a major improvement.
If your algorithm for solving the Widget Problem is $O(n^2)$, the new computer will make it feasible to solve instances involving up to 14,142 widgets, also a major improvement.
Even if your algorithm is $O(n^3)$, it will be feasible to solve instances involving 12,599 widgets, still a significant improvement.
But if your algorithm is $O(2^n)$, the new computer will enable you to solve instances of the Widget Problem involving up to 10,001 widgets, which is hardly an improvement at all.
That is the difference between polynomial and exponential algorithms.
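These figures are easy to verify: a machine that is twice as fast can do twice the work in the same time, so you solve $f(n_{\text{new}}) = 2f(10000)$ for $n_{\text{new}}$ under each running time $f$. A quick sketch in Python:

```python
# Back-of-the-envelope check of the figures above: a machine twice as fast
# does twice the work, so solve f(n) = 2 * f(10000) for n.
n = 10_000
print(2 * n)                    # O(n):   2n                  -> 20000
print(round(n * 2 ** (1 / 2)))  # O(n^2): n * sqrt(2)         -> 14142
print(round(n * 2 ** (1 / 3)))  # O(n^3): n * cbrt(2)         -> 12599
print(n + 1)                    # O(2^n): 2^(n+1) = 2 * 2^n   -> 10001
```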
You are correct that there is not exactly a stark divide between polynomial and exponential algorithms. A polynomial-time algorithm that takes $153672n^{537}$ time is probably less useful in practice than an exponential one that takes $\frac{1.00001^n}{153672}$ time. But the sorts of algorithms that actually appear in practice are not usually $O(n^{537})$; they are usually $O(n^3)$ or better. And also, regarding your question 2, any polynomial algorithm will outperform any exponential algorithm for sufficiently large instances of the problem.
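To see where those two hypothetical running times actually cross over, here is a rough sketch comparing their natural logarithms (the raw values overflow floating point long before anything interesting happens):

```python
from math import log

def log_poly(n):   # log of 153672 * n**537
    return log(153672) + 537 * log(n)

def log_exp(n):    # log of 1.00001**n / 153672
    return n * log(1.00001) - log(153672)

for n in (10**6, 10**9, 10**10):
    winner = "exponential" if log_exp(n) < log_poly(n) else "polynomial"
    print(f"n = {n:.1e}: the {winner} algorithm is faster")
```

The crossover turns out to be somewhere around $n \approx 10^9$, but by then both running times are on the order of $e^{11000}$ steps, so the polynomial algorithm's eventual victory happens far beyond any feasible instance size.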
But yes, there are real examples of nominally exponential-time algorithms that are efficient enough to be useful in practice. The "simplex" algorithm for linear programming problems is one well-known example: its worst-case running time is exponential, but it is fast on almost all instances that arise in practice.
The trick to reducing any NP problem (including NP-complete problems) to SAT is to exploit the fact that every NP problem has a polynomial-time routine for checking a proposed solution: write that routine with the candidate solution as its input, convert the routine to a Boolean circuit, and then convert the circuit to CNF. Adding clauses that pin the circuit's outputs to the required values makes the CNF satisfiable exactly when a valid solution exists.
For example, to convert integer factorization to SAT, you would write a routine that multiplies two $n$-bit numbers, producing a $2n$-bit result. Convert the routine to a circuit, then convert the circuit to CNF. Then add simple declarative CNF clauses that force the $2n$ output variables of the circuit to match the bits of the integer you want to factor. The resulting CNF instance will be satisfiable if and only if there is a bit pattern for the two $n$-bit factors that, when multiplied together, produces the $2n$-bit number you want factored. (If you want trivial solutions excluded, you also need to add two extra CNF clauses demanding that the most significant $n-1$ bits of each factor are not all zero.)
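Here is a minimal sketch of that construction for $n = 2$: a 2-bit $\times$ 2-bit multiplier Tseitin-encoded into CNF and then pinned to the product 6. The helper names and DIMACS-style literal convention are my own; a real reduction would generate the circuit mechanically rather than by hand.

```python
clauses = []   # the CNF: a list of clauses, each a list of nonzero integer
counter = 0    # literals; a negative literal negates its variable (DIMACS)

def new_var():
    global counter
    counter += 1
    return counter

def AND(a, b):   # fresh variable o constrained so that o <-> (a AND b)
    o = new_var()
    clauses.extend([[-a, -b, o], [a, -o], [b, -o]])
    return o

def XOR(a, b):   # fresh variable o constrained so that o <-> (a XOR b)
    o = new_var()
    clauses.extend([[-a, -b, -o], [a, b, -o], [a, -b, o], [-a, b, o]])
    return o

# The two 2-bit factors; bit 0 is the least significant bit.
a0, a1 = new_var(), new_var()
b0, b1 = new_var(), new_var()

# Schoolbook multiplication: four partial products, then two adder columns.
p0 = AND(a0, b0)
s1, s2 = AND(a1, b0), AND(a0, b1)
p1, c1 = XOR(s1, s2), AND(s1, s2)      # half adder for column 1
pp = AND(a1, b1)
p2, c2 = XOR(pp, c1), AND(pp, c1)      # half adder for column 2
p3 = c2

# Unit clauses forcing the 4-bit product to equal 6 = 0110 in binary.
for lit, bit in zip([p0, p1, p2, p3], [0, 1, 1, 0]):
    clauses.append([lit] if bit else [-lit])

# Exclude the trivial factors 0 and 1: the top n-1 bits (here just the top
# bit) of each factor must not all be zero.
clauses.extend([[a1], [b1]])

# A SAT solver now finds exactly the assignments encoding 6 = 2 * 3.
```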
A problem like SUBSET SUM is a bit more intricate. You write $n$ routines, each of which returns either a particular one of the $n$ integers in the problem set or zero, depending on the setting of a parameter bit. These routines feed their outputs into another routine that sums all $n$ of its inputs. Convert all this into a circuit whose inputs are the parameter bits and whose outputs are the result of the addition routine, then convert the circuit to CNF. Add declarative CNF clauses that force the output variables of the adder to match the bits of $k$, the number a solution must sum to. The resulting CNF instance will be satisfiable if and only if some setting of the parameter bits selects a subset of the problem set that sums to $k$.
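Continuing the sketch above (and reusing its `new_var`, `AND`, `XOR` helpers and `clauses` list), the SUBSET SUM construction might look like this; the routine names and the `width` parameter are my own:

```python
def OR(a, b):    # fresh variable o constrained so that o <-> (a OR b)
    o = new_var()
    clauses.extend([[a, b, -o], [-a, o], [-b, o]])
    return o

def full_adder(a, b, cin):       # one-bit addition: returns (sum, carry_out)
    axb = XOR(a, b)
    return XOR(axb, cin), OR(AND(a, b), AND(cin, axb))

TRUE, FALSE = new_var(), new_var()   # constant bits, pinned by unit clauses
clauses.extend([[TRUE], [-FALSE]])

def add(A, B):                   # ripple-carry addition of two bit vectors
    out, carry = [], FALSE
    for a, b in zip(A, B):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out + [carry]

def encode_subset_sum(nums, k, width):
    params = [new_var() for _ in nums]   # one selector bit per integer
    total = [FALSE] * width
    for q, x in zip(params, nums):
        # the "routine" returning x or 0: q ANDed with each constant bit of x
        sel = [AND(q, TRUE if (x >> j) & 1 else FALSE) for j in range(width)]
        total = add(total, sel)[:width]  # width must be ample, so no overflow
    # declarative clauses forcing the adder's output to match the bits of k
    for j, t in enumerate(total):
        clauses.append([t] if (k >> j) & 1 else [-t])
    return params

# e.g. encode_subset_sum([3, 5, 9], 12, width=6): the only satisfying
# assignments set the selector bits for 3 and 9.
```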
To show a reduction from SAT to some other problem, I'll copy an answer I gave on the CS Stack Exchange that shows how to reduce 3-SAT to SUBSET SUM. (See Richard Karp's seminal paper "Reducibility Among Combinatorial Problems" for many more reductions between NP-complete problems.)
The trick to the reduction is to use numbers to encode statements about the 3CNF formula, crafting those numbers in such a way that you can later make an arithmetic proposition about the numbers that is only true if the original 3CNF formula is satisfiable. The reduction below is lifted directly from the lecture notes found at https://people.clarkson.edu/~alexis/PCMI/Notes/lectureB07.pdf .
For each variable $v_i$ the reduction creates two values, $y_i$ and $z_i$. Both have a one in the digit position corresponding to $v_i$; $y_i$ also has a one in the digit position of every clause containing the literal $v_i$, and $z_i$ has a one in the digit position of every clause containing $\neg v_i$. For each clause there are also two slack values, $s$ and $t$, crafted the same way: each has a one in the digit position corresponding to that clause and zeroes everywhere else.
The target value $k$ is always $111\ldots1$ followed by $333\ldots3$, where the number of ones equals the number of distinct variables in the formula and the number of threes equals the number of clauses. Note that $k$ has a one in each digit position corresponding to a variable; this means any solution to the subset sum problem must include exactly one of $y_i$ or $z_i$ for each variable (in effect, assigning that variable true or false), never both. Note also that $k$ has a three in the digit position corresponding to each clause. The $s$ and $t$ values for a clause sum to only two in that position, so to complete the sum a third one must come from one of the $y$ or $z$ values; in other words, at least one chosen literal must satisfy the clause. (All three ones could come from the $y$ and $z$ values, but since $s$ and $t$ alone contribute at most two, an empty clause in the 3CNF formula forces the subset sum instance to be unsatisfiable.)
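To make this concrete, here is a small sketch of the construction in Python, building the numbers as base-10 digit strings (base 10 is safe because at most five of the generated numbers have a one in any clause digit, so no sums ever carry); the signed-integer literal convention is my own:

```python
def threesat_to_subset_sum(formula, num_vars):
    # formula: a list of clauses, each a list of nonzero ints, where literal
    # +i means variable i and -i means its negation.
    # Digit j (value 10**j) belongs to clause j; digit m+i to variable i+1.
    m = len(formula)

    def number(var_digit, clause_digits):
        v = sum(10 ** j for j in clause_digits)
        if var_digit is not None:
            v += 10 ** (m + var_digit)
        return v

    nums = []
    for i in range(num_vars):
        pos = [j for j, c in enumerate(formula) if (i + 1) in c]
        neg = [j for j, c in enumerate(formula) if -(i + 1) in c]
        nums.append(number(i, pos))   # y_i: "variable i+1 is true"
        nums.append(number(i, neg))   # z_i: "variable i+1 is false"
    for j in range(m):
        nums.extend([number(None, [j]), number(None, [j])])   # s_j and t_j
    k = sum(10 ** (m + i) for i in range(num_vars)) \
        + sum(3 * 10 ** j for j in range(m))
    return nums, k

# e.g. (x1 or x1 or x2) and (not x1 or not x2 or not x2):
nums, k = threesat_to_subset_sum([[1, 1, 2], [-1, -2, -2]], num_vars=2)
# nums = [101, 110, 1001, 1010, 1, 1, 10, 10] and k = 1133; picking
# 101 (x1 true), 1010 (x2 false), 1, 1, 10, 10 sums to exactly 1133.
```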
The lesson you should take from all this is that when reducing one NP-complete problem to another, the general procedure is to construct gadgets in the target problem that mimic the constraint features of the source problem, and then use those gadgets to build a target problem instance that has a solution if and only if the source problem instance has a solution.