But I'm not convinced we should always apply this rule any time we cut
into $2n$ intervals. Why not just throw out the uneven weighting and
use a few more sample points? If the weighting is so helpful, why not
use a more complicated weighting (like the various n-point rules
(Newton-Cotes formulas) ... )?
The problem with high-order Newton-Cotes methods is that they inherit the same sort of problems you see when using high-order interpolating polynomials. Remember that the Newton-Cotes quadrature rules are based on integrating an interpolating polynomial approximation to your function over equally spaced points.
In particular, there is the Runge phenomenon: high-order interpolating polynomials are in general quite oscillatory. This oscillation manifests itself in the weights of the Newton-Cotes rules: the weights of the Newton-Cotes quadrature rules on 2 to 8 points and on 10 points (Simpson's is the three-point rule) are all positive, but in all other cases negative weights are present.

The reason for insisting on weights of the same sign for a quadrature rule is the phenomenon of subtractive cancellation, where two nearly equal quantities are subtracted, giving a result with fewer significant digits. By ensuring that all the weights have the same sign, any cancellation that may occur in the computation is due to the function itself being integrated (e.g. the function has a simple zero within the integration interval) and not due to the quadrature rule.
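This can be checked numerically (a sketch of my own, not part of the original answer): the weights are obtained by exactly integrating each Lagrange basis polynomial with NumPy's polynomial tools.

```python
import numpy as np

def closed_newton_cotes_weights(n):
    """Weights of the closed n-point Newton-Cotes rule on [-1, 1],
    computed by exactly integrating each Lagrange basis polynomial."""
    x = np.linspace(-1.0, 1.0, n)
    w = np.zeros(n)
    for i in range(n):
        others = np.delete(x, i)
        # Lagrange basis l_i: value 1 at x[i], 0 at the other nodes.
        coeffs = np.poly(others) / np.prod(x[i] - others)
        antideriv = np.poly1d(coeffs).integ()
        w[i] = antideriv(1.0) - antideriv(-1.0)
    return w

for n in (3, 8, 9, 10, 11):
    w = closed_newton_cotes_weights(n)
    # only n = 9 and n = 11 report negative weights here
    print(n, "negative weights present:", bool((w < 0).any()))
```

(SciPy also provides `scipy.integrate.newton_cotes` for the same weights; the helper above keeps the example NumPy-only.)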
The approach of breaking up the integration interval into smaller subintervals and applying a low-order quadrature rule like Simpson's on each is effectively the integration of a piecewise polynomial approximation. Since piecewise polynomials are known to have better approximation properties than single high-degree interpolating polynomials, this good behavior is inherited by the quadrature method.
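A minimal sketch of this piecewise idea (composite Simpson's rule; the function name is my own):

```python
import numpy as np

def composite_simpson(f, a, b, m):
    """Simpson's rule on m subintervals (m must be even): this integrates
    the piecewise-quadratic interpolant of f on [a, b]."""
    if m % 2:
        raise ValueError("m must be even")
    x = np.linspace(a, b, m + 1)
    y = f(x)
    h = (b - a) / m
    return h / 3 * (y[0] + y[-1] + 4 * y[1:-1:2].sum() + 2 * y[2:-1:2].sum())

# for smooth integrands the error is O(h^4), so doubling m
# shrinks the error by roughly a factor of 16
exact = 2.0  # integral of sin on [0, pi]
for m in (8, 16, 32):
    print(m, abs(composite_simpson(np.sin, 0.0, np.pi, m) - exact))
```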
On the other hand, one can still salvage the interpolating polynomial approach if one no longer insists on having equally-spaced sample points. This gives rise to e.g. Gaussian and Clenshaw-Curtis quadrature rules, where the sample points are taken to be the roots of Legendre polynomials in the former, and roots (or extrema in some implementations) of Chebyshev polynomials in the latter. (Discussing these would make this answer too long, so I shall say no more about them, except that these quadrature rules tend to be more accurate than the corresponding Newton-Cotes rule for the same number of function evaluations.)
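As a quick illustration of the Gaussian case (my own addition, not part of the original answer), NumPy exposes Gauss-Legendre nodes and weights directly:

```python
import numpy as np

# 5-point Gauss-Legendre rule on [-1, 1]: the nodes are the roots of the
# degree-5 Legendre polynomial; the rule is exact for polynomials up to degree 9.
nodes, weights = np.polynomial.legendre.leggauss(5)

exact = np.e - 1 / np.e               # integral of exp on [-1, 1]
approx = (weights * np.exp(nodes)).sum()
print(abs(approx - exact))            # tiny error from only 5 function evaluations
```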
...is Simpson's rule so useful that calculus students should always use it for approximations?
As with any tool, blind use can lead you to a heap of trouble. In particular, we know that a polynomial can never have horizontal asymptotes or vertical tangents. It stands to reason that a polynomial will be a poor approximation to a function with these features, and thus a quadrature rule based on interpolating polynomials will also behave poorly. The piecewise approach helps a bit, but not much. One should always consider a (clever?) change of variables to eliminate such features before applying a quadrature rule.
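For instance (my own illustration, using a composite Simpson helper), $\int_0^1 \sqrt{x}\,dx$ has a vertical tangent at $0$; substituting $x = t^2$ turns the integrand into a polynomial:

```python
import numpy as np

def composite_simpson(f, a, b, m):
    """Composite Simpson's rule on m subintervals (m even)."""
    x = np.linspace(a, b, m + 1)
    y = f(x)
    h = (b - a) / m
    return h / 3 * (y[0] + y[-1] + 4 * y[1:-1:2].sum() + 2 * y[2:-1:2].sum())

exact = 2.0 / 3.0  # integral of sqrt(x) on [0, 1]

# direct: the vertical tangent at 0 degrades the convergence rate
direct = composite_simpson(np.sqrt, 0.0, 1.0, 64)

# substitute x = t^2, dx = 2t dt: the integrand becomes 2*t*|t| = 2t^2 on [0, 1]
transformed = composite_simpson(lambda t: 2 * t * np.abs(t), 0.0, 1.0, 64)

print(abs(direct - exact), abs(transformed - exact))
```

After the substitution the integrand is a quadratic, which Simpson's rule integrates exactly (up to rounding), while the direct computation is stuck with a much larger error for the same number of evaluations.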
This answer expands on the comments by Dan Fox.
It is convenient to take $[-1,1]$ as the interval of integration (any other interval is handled by a linear transformation). Let $x_1=-1,\dots, x_n=1$ be $n$ equally spaced points on this interval. The Newton-Cotes formulas (implicitly) do the following:
- Given a function $f$, interpolate $f$ by a polynomial $p$ of degree $n-1$; namely the Lagrange polynomial.
- Find $\int_{-1}^1 p(x)\,dx$, and return this as an approximation to $\int_{-1}^1 f(x)\,dx$.
Remarks:
- We do not actually go through these steps explicitly; the computation is done once, for general $f$, and results in a quadrature formula $\approx \sum w_i f(x_i)$ that we use.
- You can already see the pitfall of the method: when $n$ is large, Runge's phenomenon means $p$ will likely not be very close to $f$ near the endpoints.
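The two steps can be carried out literally (a sketch of the procedure with my own helper name, not how the rule is used in practice):

```python
import numpy as np

def interp_then_integrate(f, n):
    """Interpolate f at n equally spaced points on [-1, 1] by a polynomial
    of degree n-1, then integrate that polynomial exactly."""
    x = np.linspace(-1.0, 1.0, n)
    p = np.poly1d(np.polyfit(x, f(x), n - 1))
    antideriv = p.integ()
    return antideriv(1.0) - antideriv(-1.0)

# exact whenever f is itself a polynomial of degree <= n-1
cubic = lambda x: x**3 - 2 * x + 1
print(interp_then_integrate(cubic, 4))   # the integral on [-1, 1] is 2

# for Runge's function 1/(1+25x^2) the equispaced interpolant
# oscillates near the endpoints, and the quadrature suffers
runge = lambda x: 1 / (1 + 25 * x**2)
exact = 2 / 5 * np.arctan(5.0)
print(abs(interp_then_integrate(runge, 11) - exact))
```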
Since the interpolating polynomial in the first step is unique, when $f$ is itself a polynomial of degree at most $n-1$, we have $f\equiv p$ identically. Therefore, in this case the formula is exact.
When $n$ is odd, we have $\int_{-1}^1 x^n\,dx = 0$ by symmetry. Also, the contribution of $x^n$ to the interpolating polynomial is an odd polynomial, also by symmetry. (E.g., interpolating $x^3$ at $\pm 1$ gives a multiple of $x$.) Therefore, $x^n$ contributes zero to both the quadrature formula, and to the actual integral. Conclusions:
- The $n$-node Newton-Cotes formula is always exact for polynomials of degrees $\le n-1$.
- When $n$ is odd, it is exact for polynomials of degrees $\le n$.
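Both conclusions can be checked numerically for the $n=3$ case, Simpson's rule (a small sketch; the weights $\tfrac13, \tfrac43, \tfrac13$ on $[-1,1]$ are the standard ones):

```python
import numpy as np

# 3-node Newton-Cotes rule (Simpson) on [-1, 1]
nodes = np.array([-1.0, 0.0, 1.0])
weights = np.array([1 / 3, 4 / 3, 1 / 3])

for k in range(5):
    quad = (weights * nodes**k).sum()
    exact = 0.0 if k % 2 else 2 / (k + 1)   # integral of x^k on [-1, 1]
    print(k, quad, exact)   # agree for k <= 3; first failure at k = 4
```

As the conclusions predict, the rule is exact up to degree $3$ (one better than $n-1=2$, since $n=3$ is odd) and fails at degree $4$.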
Best Answer
If you are concerned about the precise historical definition, there are closed and open Newton-Cotes formulas.
The closed formulas approximate the integral over an interval $[a,b]$ using the points $x_k = a + k\frac{b-a}{n}$ for $k = 0,1,\ldots,n$, which includes endpoints. The lowest-order closed formula is the trapezoidal rule, corresponding to $n=1$, using step size $h = (b-a)$, and points $x_0 = a$ and $x_1 = b$, providing the approximation
$$\int_a^b f(x) \, dx \approx \frac{h}{2}[f(x_0) + f(x_1)],$$
representing the average of terms arising in left- and right-Riemann sums.
By convention, there is no $0$-th degree closed formula.
The open formulas use the points $x_k = a + k\frac{b-a}{n}$ for $k = 1,\ldots,n-1$, which excludes the endpoints. In this case, a trapezoidal approximation arises when $n = 3$, using step size $h = \frac{b-a}{3}$ and points $x_1 = a + h$ and $x_2 = a + 2h$, providing the approximation
$$\int_a^b f(x) \, dx \approx \frac{3h}{2}[f(x_1) + f(x_2)].$$
When $n = 2$ we have the midpoint rule using step size $h = \frac{b-a}{2}$ and the single point $x_1 = \frac{a+b}{2}$, providing the approximation
$$\int_a^b f(x) \, dx \approx 2hf(x_1) = f\left(\frac{a+b}{2} \right)(b-a)$$
In this way, the midpoint rule is the open Newton-Cotes formula of degree $2$ and is one term of a particular Riemann sum for an integral over a larger interval where $[a,b]$ is a partition subinterval.
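The three low-order rules above can be written out directly (a sketch with my own function names); each is exact for linear integrands, as a quick check confirms:

```python
def closed_trap(f, a, b):
    h = b - a                    # closed n = 1: points x0 = a, x1 = b
    return h / 2 * (f(a) + f(b))

def open_trap(f, a, b):
    h = (b - a) / 3              # open n = 3: interior points a+h, a+2h
    return 3 * h / 2 * (f(a + h) + f(a + 2 * h))

def midpoint(f, a, b):
    h = (b - a) / 2              # open n = 2: single point a+h = (a+b)/2
    return 2 * h * f(a + h)

# all three are exact for linear functions, e.g. f(x) = 2x + 1 on [0, 2],
# whose integral is 6 (each returns 6 up to rounding)
f = lambda x: 2 * x + 1
print(closed_trap(f, 0, 2), open_trap(f, 0, 2), midpoint(f, 0, 2))
```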