There are lots of theoretical results telling us that approximation by polynomials works well for various classes of functions, and even telling us what the maximum approximation error will be. For example, there's the Stone-Weierstrass theorem mentioned in the other answer, plus the Jackson theorems and many others in constructive approximation:
http://en.wikipedia.org/wiki/Constructive_function_theory
There are fast, reliable, easy-to-implement algorithms for computing the approximations. For some good examples, look at the Chebfun system, which basically does everything by computing high-degree polynomial approximations: http://www2.maths.ox.ac.uk/chebfun/
Once you have a polynomial, it's relatively easy (and inexpensive) to calculate function values, derivatives, integrals, zeros, bounds, and so on. Again, see Chebfun for examples.
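To make this concrete, here is a Chebfun-style workflow in miniature using NumPy's Chebyshev class (the test function, degree, and interval are arbitrary choices for illustration; Chebfun itself automates the degree selection):

```python
import numpy as np
from numpy.polynomial import Chebyshev

# Replace a smooth function by a high-degree Chebyshev interpolant,
# then operate on the polynomial instead of the function itself.
f = lambda t: np.exp(-t) * np.sin(5 * t)
p = Chebyshev.interpolate(f, deg=40, domain=[0, 3])

print(p(1.7) - f(1.7))   # evaluation error: ~machine precision
dp = p.deriv()           # the derivative is just another polynomial
P = p.integ()            # so is the antiderivative
print(P(3) - P(0))       # definite integral of the approximant
roots = [r.real for r in p.roots()
         if abs(r.imag) < 1e-9 and 0 <= r.real <= 3]
print(roots)             # zeros of f, located as polynomial roots (k*pi/5)
```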
In some fields (like computer-aided design), polynomial forms are considered "standard", and using anything else causes data exchange problems.
Rational approximations will sometimes work better than polynomial ones (in the sense that you get a smaller error with no increase in the degrees of freedom of the approximant). But optimal rational approximations are much harder to compute, and, once you have them, they are more difficult to handle (harder to integrate, for example).
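For a concrete (if non-optimal) comparison, Padé approximants are rational approximations that are cheap to compute from Taylor coefficients. The sketch below uses SciPy's `pade`; note that Padé approximants are not the best (minimax) rational approximants, but they already illustrate the degrees-of-freedom point:

```python
import math
import numpy as np
from scipy.interpolate import pade

# Same five Taylor coefficients of exp(x), used two ways:
# as a degree-4 polynomial, and rearranged into a [2/2] rational.
c = [1 / math.factorial(k) for k in range(5)]
p, q = pade(c, 2)                         # numerator, denominator (poly1d)

xs = np.linspace(0, 1, 201)
taylor = np.polyval(c[::-1], xs)          # the degree-4 polynomial itself
print(np.max(np.abs(np.exp(xs) - taylor)))          # ~1e-2
print(np.max(np.abs(np.exp(xs) - p(xs) / q(xs))))   # ~4e-3, same parameters
```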
Polynomial approximation is not always the best choice (nothing is), but it's often a pretty good one, and it's a reasonable thing to try unless the nature of your specific problem suggests something different.
First of all, the parameters of an unweighted linear regression are most strongly influenced by the largest values of the dependent variable. When you linearize an exponential model, the dependent variable becomes $\log(y)$, and for very small values of $y$ it is very large in magnitude (large and negative), so those points strongly influence the results.
I made up an example for your problem.
I generated $20$ equally spaced $x$'s $(x_i=1,2,3,\cdots,20)$ and generated the corresponding $y$ using $$y=e^{5.678-0.456 x}$$ to which I added random noise with a maximum relative error of $\pm10\%$.
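Here is a sketch that reproduces this setup (the seed and the uniform noise model are arbitrary choices; only the $\pm10\%$ bound is fixed by the description above):

```python
import numpy as np

rng = np.random.default_rng(1)                   # arbitrary seed
x = np.arange(1, 21, dtype=float)                # x_i = 1, 2, ..., 20
y_exact = np.exp(5.678 - 0.456 * x)
y = y_exact * (1 + rng.uniform(-0.10, 0.10, x.size))   # +/-10% relative noise
```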
For the first run, I just linearized the system and used classical linear regression. The resulting parameters are $5.710$ and $-0.460$. For these values I computed $$SSQ=\sum_{i=1}^{20} \Big(y_i^{calc}-y_i^{exp}\Big)^2,$$ which is equal to $87.07$.
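In code, step 1 is an ordinary least-squares line through $(x_i,\log y_i)$, with $SSQ$ evaluated back in the original units (the setup lines are repeated so the snippet runs on its own):

```python
import numpy as np

rng = np.random.default_rng(1)                   # same synthetic data as above
x = np.arange(1, 21, dtype=float)
y = np.exp(5.678 - 0.456 * x) * (1 + rng.uniform(-0.10, 0.10, x.size))

# Step 1: linearize, log(y) = a + b*x, and fit by ordinary least squares.
b1, a1 = np.polyfit(x, np.log(y), 1)             # slope first, then intercept
ssq1 = np.sum((np.exp(a1 + b1 * x) - y) ** 2)    # SSQ in the original units
print(a1, b1, ssq1)
```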
In a second step, I used the weighting suggested in the link you gave. The resulting parameters are $5.655$ and $-0.443$ and $SSQ=26.64$.
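The exact weighting from the link isn't reproduced here; a standard choice for a linearized exponential fit is $w_i=y_i$, which makes the weighted log-domain problem mimic the untransformed one. That is what the sketch below assumes:

```python
import numpy as np

rng = np.random.default_rng(1)                   # same synthetic data as above
x = np.arange(1, 21, dtype=float)
y = np.exp(5.678 - 0.456 * x) * (1 + rng.uniform(-0.10, 0.10, x.size))

# Step 2: weighted linearized fit.  np.polyfit squares the weights, so
# w=y minimizes sum(y_i^2 * (log(y_i) - a - b*x_i)^2).
b2, a2 = np.polyfit(x, np.log(y), 1, w=y)
ssq2 = np.sum((np.exp(a2 + b2 * x) - y) ** 2)
print(a2, b2, ssq2)
```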
In a third step, starting with the parameters obtained by the first step, I performed a nonlinear regression. The resulting parameters are $5.656$ and $-0.444$ and $SSQ=26.54$.
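Step 3 in code is a genuine nonlinear least-squares fit on $y$ itself, seeded with the step-1 parameters (a sketch using scipy.optimize.curve_fit):

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(1)                   # same synthetic data as above
x = np.arange(1, 21, dtype=float)
y = np.exp(5.678 - 0.456 * x) * (1 + rng.uniform(-0.10, 0.10, x.size))

# Step 3: nonlinear least squares on the original model, started from step 1.
model = lambda t, a, b: np.exp(a + b * t)
(a3, b3), _ = curve_fit(model, x, y, p0=[5.710, -0.460])  # step-1 estimates
ssq3 = np.sum((model(x, a3, b3) - y) ** 2)
print(a3, b3, ssq3)
```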
As you can see, the proposed weighting leads to results which are quite close to those of the nonlinear regression.
I must confess that I always perform a nonlinear regression even if the model can be linearized and even if the errors are small. Remember that what is measured is $y$ and not $\log(y)$, and when you compare sums of squares they must be consistent, that is, computed on the same quantity.
By the way, you can easily show that, if the errors are not too large, minimizing the sum of the squares of the logarithms is almost identical to minimizing the sum of squares of the relative errors. For illustration purposes, I did that too. The resulting parameters are $5.706$ and $-0.460$ and $SSQ=74.74$. Compare these results to those from the first step.
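The reason is the first-order expansion of the logarithm: writing $\varepsilon_i=\big(y_i^{calc}-y_i^{exp}\big)/y_i^{exp}$ for the relative error, $$\log y_i^{calc}-\log y_i^{exp}=\log(1+\varepsilon_i)\approx \varepsilon_i \qquad \text{for } |\varepsilon_i|\ll 1,$$ so the two sums of squares agree to first order in the errors.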
Best Answer
A direct method of fitting (no guessed initial values required, no iterative process) is shown below. For the theory, see pp. 16-17 of the paper: https://fr.scribd.com/doc/14674814/Regressions-et-equations-integrales
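For the exponential model $y=a+b\,e^{cx}$ treated in that part of the paper, my reading of the method is: integrate the model once so that the nonlinear parameter $c$ appears linearly, estimate the integral from the data by the trapezoidal rule, solve one linear least-squares problem for $c$, then a second one for $a$ and $b$. A sketch under those assumptions (the function name and the trapezoidal quadrature are my choices; details may differ from the paper):

```python
import numpy as np

def fit_exp(x, y):
    """Direct fit of y = a + b*exp(c*x): no initial guess, no iteration."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    order = np.argsort(x)
    x, y = x[order], y[order]

    # S_k ~ integral of y from x_1 to x_k (trapezoidal rule).
    S = np.concatenate(([0.0], np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(x))))

    # Integrating the model gives  y_k - y_1 = -a*c*(x_k - x_1) + c*S_k,
    # which is *linear* in the unknown coefficients; the one on S is c.
    A = np.column_stack((x - x[0], S))
    (_, c), *_ = np.linalg.lstsq(A, y - y[0], rcond=None)

    # With c known, a and b come from an ordinary linear regression.
    B = np.column_stack((np.ones_like(x), np.exp(c * x)))
    (a, b), *_ = np.linalg.lstsq(B, y, rcond=None)
    return a, b, c
```

On synthetic data like that of the other answer (where $a=0$, $b=e^{5.678}$, $c=-0.456$), this returns values close to the true parameters, which can then serve as starting values for a nonlinear refinement if desired.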