But I'm not convinced we should always apply this rule any time we cut
into $2n$ intervals. Why not just throw out the uneven weighting and
use a few more sample points? If the weighting is so helpful, why not
use a more complicated weighting (like the various n-point rules
(Newton-Cotes formulas) ... )?
The problem with high-order Newton-Cotes methods is that they inherit the same sort of problems you see with high-order interpolating polynomials. Remember that the Newton-Cotes quadrature rules are based on integrating polynomial interpolants of your function over equally spaced points.
In particular, there is the Runge phenomenon: high-order interpolating polynomials are in general quite oscillatory. This oscillation manifests itself in the weights of the Newton-Cotes rules: the weights of the Newton-Cotes quadrature rules for 2 to 8 points and for 10 points (Simpson's is the three-point rule) are all positive, but in every other case negative weights are present. The reason for insisting that all the weights of a quadrature rule have the same sign is the phenomenon of subtractive cancellation, where two nearly equal quantities are subtracted, giving a result with fewer significant digits. By ensuring that all the weights have the same sign, any cancellation that occurs in the computation is due to the function being integrated (e.g. the function has a simple zero within the integration interval) and not due to the quadrature rule itself.
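One can check this sign pattern directly. The sketch below (the helper name `newton_cotes_weights` is mine) computes the closed Newton-Cotes weights exactly, by integrating each Lagrange basis polynomial over the equally spaced nodes $0,1,\dots,n$ with rational arithmetic:

```python
from fractions import Fraction

def newton_cotes_weights(n):
    """Weights of the closed (n+1)-point Newton-Cotes rule on nodes 0..n,
    normalized so the weights sum to n (the length of the interval)."""
    weights = []
    for i in range(n + 1):
        # Lagrange basis l_i(t) = prod_{j != i} (t - j)/(i - j),
        # stored as a coefficient list, lowest degree first.
        coeffs = [Fraction(1)]
        for j in range(n + 1):
            if j == i:
                continue
            new = [Fraction(0)] * (len(coeffs) + 1)
            for k, c in enumerate(coeffs):
                new[k] += c * Fraction(-j, i - j)      # constant part of (t-j)/(i-j)
                new[k + 1] += c * Fraction(1, i - j)   # linear part
            coeffs = new
        # w_i = integral of l_i(t) from 0 to n
        weights.append(sum(c * Fraction(n) ** (k + 1) / (k + 1)
                           for k, c in enumerate(coeffs)))
    return weights

print(newton_cotes_weights(2))   # Simpson's rule: weights 1/3, 4/3, 1/3
for n in range(1, 11):
    signs = "all positive" if min(newton_cotes_weights(n)) > 0 else "has negative weights"
    print(f"{n + 1:2d} points: {signs}")
```

Running this shows the first negative weights at the 9-point rule, then again from 11 points onward.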
The approach of breaking up the interval into smaller pieces and applying a low-order quadrature rule like Simpson's on each is effectively the integration of a piecewise polynomial approximation. Since piecewise polynomials are known to have better approximation properties than a single high-degree interpolating polynomial over the whole interval, this good behavior is inherited by the quadrature method.
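As a quick illustration (my own sketch, not part of the original answer), composite Simpson's rule shows the expected fourth-order behavior: halving the step cuts the error by roughly $2^4=16$.

```python
import math

def composite_simpson(f, a, b, n):
    """Composite Simpson's rule with n (even) subintervals of [a, b]."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):                       # interior nodes
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

exact = math.e - 1.0                            # integral of exp over [0, 1]
prev = None
for n in (4, 8, 16, 32):
    err = abs(composite_simpson(math.exp, 0.0, 1.0, n) - exact)
    ratio = "" if prev is None else f"  ratio {prev / err:.1f}"
    print(f"n = {n:2d}  error = {err:.3e}{ratio}")
    prev = err
```

The printed ratios hover around 16, consistent with the $O(h^4)$ error of the composite rule.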
On the other hand, one can still salvage the interpolating polynomial approach if one no longer insists on equally spaced sample points. This gives rise to, e.g., Gaussian and Clenshaw-Curtis quadrature rules, where the sample points are taken to be the roots of Legendre polynomials in the former, and the roots (or extrema, in some implementations) of Chebyshev polynomials in the latter. (Discussing these would make this answer too long, so I shall say no more about them, except that these quadrature rules tend to be more accurate than the corresponding Newton-Cotes rule for the same number of function evaluations.)
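To give just a flavor of the accuracy difference (a sketch of mine, with the standard 3-point Gauss-Legendre nodes and weights hardcoded): with the same three function evaluations, the Gaussian rule is exact for polynomials up to degree 5, while Simpson's rule is exact only up to degree 3.

```python
import math

# 3-point Gauss-Legendre rule on [-1, 1]: nodes are the roots of the
# degree-3 Legendre polynomial, weights chosen to match moments.
G_NODES = (-math.sqrt(3 / 5), 0.0, math.sqrt(3 / 5))
G_WEIGHTS = (5 / 9, 8 / 9, 5 / 9)

def gauss3(f):
    return sum(w * f(t) for w, t in zip(G_WEIGHTS, G_NODES))

def simpson(f):
    # 3-point Newton-Cotes (Simpson) rule on [-1, 1]
    return (f(-1.0) + 4.0 * f(0.0) + f(1.0)) / 3.0

quartic = lambda t: t ** 4    # exact integral over [-1, 1] is 2/5
print(gauss3(quartic))        # ~0.4: exact up to rounding
print(simpson(quartic))       # 2/3: Simpson is not exact for degree 4
```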
...is Simpson's rule so useful that calculus students should always use it for approximations?
As with any tool, blind use can lead you to a heap of trouble. In particular, we know that a polynomial can never have horizontal asymptotes or vertical tangents. It stands to reason that a polynomial will be a poor approximation to a function with these features, and thus a quadrature rule based on interpolating polynomials will also behave poorly. The piecewise approach helps a bit, but not much. One should always consider a (clever?) change of variables to eliminate such features before applying a quadrature rule.
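For instance (my own sketch), $\int_0^1\sqrt{x}\,dx=2/3$ has a vertical tangent at $0$, and composite Simpson's rule converges poorly there; the substitution $x=t^2$, $dx=2t\,dt$ turns the integrand into the polynomial $2t^2$, which Simpson's rule integrates exactly:

```python
import math

def composite_simpson(f, a, b, n):
    """Composite Simpson's rule with n (even) subintervals of [a, b]."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

exact = 2 / 3                                   # integral of sqrt(x) over [0, 1]

# Direct: the vertical tangent at 0 ruins the polynomial approximation.
direct = composite_simpson(math.sqrt, 0.0, 1.0, 64)

# With x = t^2, dx = 2t dt, the integrand becomes sqrt(t^2) * 2t = 2t^2.
transformed = composite_simpson(lambda t: 2 * t * t, 0.0, 1.0, 64)

print(abs(direct - exact))       # still poor despite 65 evaluations
print(abs(transformed - exact))  # rounding-level error
```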
Let $F$ be an anti-derivative of $f$, so that $F'=f$. W.l.o.g. take the midpoint $x_1=0$ and set $x=h$; then we are interested in the error expression
$$
g(x)=F(x)-F(-x)-\frac{x}3(f(x)+4f(0)+f(-x)).
$$
This has derivatives
\begin{alignat}{2}
g'(x)&=\frac23(f(x)-2f(0)+f(-x))&&-\frac x3 (f'(x)-f'(-x))
\\
g''(x)&=\frac13(f'(x)-f'(-x))&&-\frac x3(f''(x)+f''(-x))
\\
g'''(x)&=&&-\frac x3(f'''(x)-f'''(-x)).
\end{alignat}
Since $g(0)=g'(0)=g''(0)=g'''(0)=0$, repeated application of the extended (Cauchy) mean value theorem gives, for any $m\ge3$,
$$
\frac{g(x)}{x^m}
=\frac{g'(x_1)}{mx_1^{m-1}}
=\frac{g''(x_2)}{m(m-1)x_2^{m-2}}
=\frac {g'''(x_3)}{m(m-1)(m-2)x_3^{m-3}}
$$
with $0<x_3<x_2<x_1<x$.
Using $m=5$ this gives
$$
\frac{g(x)}{x^5}=\frac{g'''(x_3)}{60x_3^2}=-\frac1{90}·\frac{f'''(x_3)-f'''(-x_3)}{2x_3}=-\frac1{90}·f^{(4)}(x_4)
$$
with $|x_4|<x_3<x$.
This results in the error formula
$$
g(x)=-\frac1{90}·f^{(4)}(x_4)·x^5,
$$
or, after translating the initial simplifications ($x_1=0$, $x=h$) back,
\begin{align}
\int_{x_0}^{x_2}f(x)\,dx
&=\frac{h}{3}[f(x_0)+4f(x_1)+f(x_2)]-\frac{h^5}{90}f^{(4)}(\xi)
\end{align}
with $\xi\in(x_0,x_2)$.
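A quick numerical sanity check of this error term (my own sketch): for $f=\exp$ we have $f^{(4)}=\exp$, so the ratio $g(h)/(-h^5/90)$ should tend to $f^{(4)}(0)=1$ as $h\to0$.

```python
import math

def simpson_error(f, F, h):
    """g(h) = (exact integral of f over [-h, h]) - (Simpson approximation),
    where F is an antiderivative of f."""
    return (F(h) - F(-h)) - (h / 3) * (f(h) + 4 * f(0.0) + f(-h))

# exp is its own antiderivative, so F = f = math.exp here.
for h in (0.4, 0.2, 0.1):
    ratio = simpson_error(math.exp, math.exp, h) / (-h ** 5 / 90)
    print(f"h = {h}:  g(h) / (-h^5/90) = {ratio:.6f}")
```

The printed ratios approach 1 from above as $h$ shrinks, as the formula with $\xi\in(x_0,x_2)$ predicts.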
A bound that requires only three derivatives of $f$ follows by stopping the mean value theorem chain at $m=4$: $$ \frac{g(x)}{x^4}=\frac{g'''(x_3)}{24x_3}=-\frac{f'''(x_3)-f'''(-x_3)}{72}, $$ which gives $$ |g(x)|\le\max_{|s|\le x}|f'''(s)|·\frac{x^4}{36}=\max_{|s|\le x}|f'''(s)|·\frac{(b-a)^4}{576} $$ for an integration interval $[a,b]$ of length $b-a=2x$.