Why is the use of high order polynomials for regression discouraged?

Tags: interpolation, polynomial, regression, splines

I've read many times on this site that high order polynomials (generally above third degree) shouldn't be used in linear regression unless there is substantial justification for doing so.

I understand the issues about extrapolation (and prediction at the boundaries).

Since extrapolation isn't important to me…

  1. Are high order polynomials also a bad way of approximating the underlying function within the range of the data points? (i.e. interpolation)
  2. If so, what problems arise?

I don't mind being redirected to a good book or paper about this.
Thanks.

Best Answer

I cover this in some detail in Chapter 2 of RMS (Regression Modeling Strategies). Briefly, besides extrapolation problems, ordinary polynomials have these problems:

  1. The shape of the fit in one region of the data is influenced by far away points
  2. Polynomials cannot fit threshold effects, e.g., a nearly flat curve that suddenly accelerates
  3. Polynomials cannot fit logarithmic-looking relationships, e.g., ones that get progressively flatter over a long interval
  4. Polynomials can't have a very rapid turn
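
Point 1 can be checked numerically. The following is a minimal sketch, not taken from RMS: the sine test signal, the degree (5), the sample size, and the helper name `poly_fit_values` are all arbitrary choices made for illustration. It moves only the rightmost observation and measures how much the fitted curve changes in the leftmost quarter of the range.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Illustrative data (an assumption, not from the original answer).
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * x)

def poly_fit_values(x, y, degree):
    """Least-squares polynomial fit, evaluated at the observed x values."""
    coefs = P.polyfit(x, y, degree)
    return P.polyval(x, coefs)

# Perturb only the rightmost observation.
y_perturbed = y.copy()
y_perturbed[-1] += 1.0

base = poly_fit_values(x, y, 5)
moved = poly_fit_values(x, y_perturbed, 5)

# How far did the fitted curve move on the far side of the data (x < 0.25)?
left = x < 0.25
max_shift_left = float(np.max(np.abs(moved - base)[left]))
print("largest change in the fit for x < 0.25:", max_shift_left)
```

Because a single polynomial shares one set of coefficients across the whole range, the change at x = 1 shows up as a nonzero shift in the fit near x = 0.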

These are the reasons regression splines are so popular: segmented (piecewise) polynomials tend to work better than unsegmented polynomials. You can also relax the continuity assumption for a spline if you want a discontinuous change point in the fit.
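
To illustrate the threshold problem and the spline alternative, here is a rough sketch under stated assumptions. The hinge-shaped target, the knot locations, and the noise-free data are invented for the example, and the spline is a simple linear spline built from a truncated power basis rather than the restricted cubic splines usually used in practice. The knot grid happens to include the true threshold, which favors the spline; in a real analysis that location would not be known. Both models use six coefficients.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Hypothetical threshold effect: flat until x = 0.6, then a steep linear rise.
x = np.linspace(0.0, 1.0, 201)
threshold = 0.6
y = 10.0 * np.maximum(0.0, x - threshold)

# Unsegmented degree-5 polynomial (6 coefficients).
poly_fit = P.polyval(x, P.polyfit(x, y, 5))

# Linear regression spline via a truncated power basis (6 coefficients):
# intercept, slope, and one hinge term max(0, x - k) per knot.
knots = np.array([0.2, 0.4, 0.6, 0.8])  # assumed knot locations
basis = np.column_stack(
    [np.ones_like(x), x] + [np.maximum(0.0, x - k) for k in knots]
)
coef, *_ = np.linalg.lstsq(basis, y, rcond=None)
spline_fit = basis @ coef

def rmse(fit):
    """Root mean squared error against the noise-free target."""
    return float(np.sqrt(np.mean((fit - y) ** 2)))

rmse_poly = rmse(poly_fit)
rmse_spline = rmse(spline_fit)
print("polynomial RMSE:", rmse_poly)
print("spline RMSE:    ", rmse_spline)
```

With the same number of parameters, the polynomial has to smooth over the kink and oscillate across the flat region, while the segmented fit can change slope at a knot.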
