[In the discussion of LOESS below I attempt to describe LOWESS and its implementation in the R function lowess,
and to outline some of the modifications made for the function loess
(some details that don't seem directly relevant to your questions are omitted).]
In particular: with smoothing splines, how do we choose the number and location of breakpoints
You don't; there's one at every data point; the smoothing parameter is the source of all the regularization. If you want fewer knots, you're talking about penalized splines.
as well as the polynomial degree of the spline?
With smooth.spline
it's always cubic; the help page says so explicitly.
If you mean the degree of the local fit in LOESS (which is not a spline), first see Cleveland [1] (which describes LOWESS, on which LOESS is based). He argues that degree 0 isn't flexible enough ("*in the practical situation, an assumption of local linearity serves better than local constancy*") and that degree 2 is harder to compute for a relatively small gain in flexibility, and he suggests degree 1 as the best compromise in practice.
The suggestions in Cleveland [1] (the paper gives more detail on choosing the various parameters) are the defaults in the R function lowess
(degree 1 and span 2/3).
The help for loess says it uses different defaults (degree 2 and span 3/4).
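To make the "degree of local fit" concrete, here is a minimal Python sketch (not R's actual implementation) of a single degree-1 local fit: a weighted least-squares line through the neighbourhood, evaluated at the fitting point. The function name and closed-form formulas are my own illustration.

```python
# Illustrative sketch only -- a single local fit of degree 1, i.e. a
# weighted least-squares line, as LOWESS performs within each
# neighbourhood. The weights w would come from the tricube kernel.

def local_linear_fit(x, y, w, x0):
    """Fit y ~ a + b*x by weighted least squares; return the fitted value at x0."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx if sxx > 0 else 0.0  # weighted slope
    a = ybar - b * xbar                # weighted intercept
    return a + b * x0
```

Degree 0 would replace this with a plain weighted mean of the $y$ values; degree 2 would add a quadratic term and require solving a 3x3 system, which is the extra computational cost Cleveland weighs against the gain in flexibility.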
And what do the bandwidth arguments control in the function?
As described by Cleveland [1], LOESS applies a tricube weight function ($W(x)=((1-|x|^3)_+)^3$) to weight the points locally.
$W$ is scaled so that the $r$-th nearest neighbor is the first to get zero weight, where $r = \text{round}(fn)$ and $f$ is the span argument. With multiple predictors this is modified (see the help for loess).
The loess
function also allows you to specify a target equivalent number of parameters (enp.target) instead of the span.
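A minimal Python sketch of the tricube weighting just described (an illustration of the formula, not R's C/Fortran code): the bandwidth at each fitting point is the distance to the $r$-th nearest neighbour, so exactly that neighbour is the first to receive zero weight.

```python
# Illustrative sketch of LOESS's neighbourhood weights at one fitting
# point x0. `span` plays the role of lowess's f / loess's span argument.

def tricube_weights(x, x0, span):
    """Tricube weights W(u) = ((1 - |u|^3)_+)^3, scaled so that the
    r-th nearest neighbour of x0, r = round(span * n), gets weight 0."""
    n = len(x)
    r = max(1, round(span * n))
    dists = sorted(abs(xi - x0) for xi in x)
    h = dists[r - 1]  # bandwidth: distance to the r-th nearest neighbour
    weights = []
    for xi in x:
        u = abs(xi - x0) / h if h > 0 else 0.0
        weights.append((1 - u**3) ** 3 if u < 1 else 0.0)
    return weights
```

Note how a larger span pushes the bandwidth out, so more points get positive weight and the fit is smoother.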
Also how does LOESS select outliers for removal?
Again, as described by Cleveland [1], LOWESS downweights observations with large residuals rather than specifically selecting and removing them; however, some observations may get zero weight, which effectively removes them. Specifically, after an initial fit, LOWESS computes robustness weights based on the residuals from that fit. The robustness weights use a biweight function ($B(x)=((1-x^2)_+)^2$); any observation whose absolute residual exceeds six times the median absolute residual gets zero weight, while points closer than that still have reduced weight. For example, a point with absolute residual 3.25 times the median absolute residual will have about half weight.
This downweighting process is iterated (that is, residuals are recalculated from a fit using these weights, and the robustness weights recalculated in turn, until convergence). Note that both $W$ and $B$ can downweight a given observation.
The help for the R implementation of loess
refers to redescending M-estimation with a biweight function, but that is presumably just a brief way of describing the scheme above rather than something different.
[1] Cleveland, William S. (1979).
"Robust Locally Weighted Regression and Smoothing Scatterplots".
Journal of the American Statistical Association. 74 (368): 829–836.
Best Answer
A bit more modern than what you quote is de Boor, C. (1978), A Practical Guide to Splines, Springer-Verlag. An efficient algorithm for smoothing splines is given by Hutchinson, M.F. and de Hoog, F.R. (1985), "Smoothing Noisy Data with Spline Functions", Numerische Mathematik, 47, pp. 99-106 (see also Hutchinson, M.F. (1986), "Cubic Spline Data Smoother", ACM Transactions on Mathematical Software, 12, pp. 150-153; you will find the FORTRAN source of the algorithm at http://calgo.acm.org).
Note also that the Kalman filter can be a good tool for fitting some types of splines; see, for instance, an answer I gave some time ago on Kalman filter vs. smoothing splines. You will find much relevant information by searching CrossValidated with the "splines" tag.