Choosing sample points for polynomial regression to improve accuracy

machine learningregression

An online set of slides [ link, start on page 4] describes the following situation: we're doing polynomial regression using the model

$$\texttt{prediction}(x) := w_0 + w_1x + \dotsb + w_kx^k + \epsilon \qquad \epsilon \sim N(0, \sigma^2)\quad (\sigma^2 \text{ known})$$

And we assume that the function that generated the training data (call it $y(x)$) actually is a function of the above form. We consider which points $x_1, \dotsc x_m$ are the best values to sample for training such that we can improve the accuracy of the optimal guess $(X^TX)^{-1}X^T \mathbf{y}$.

$$\text{Here } X:= \begin{bmatrix}1 & x_1 & x_1^2 &
\dotsb & x_m^k\\ \vdots \\ 1 & x_m & x_m^2 & \dotsc & x_m^k\end{bmatrix}$$

He describes choosing $x_i$ to minimize $\det \left((A^TA)^{-1} \right)$ since the variance of the guess depends on that quantity. The slides don't go into a ton of depth, but he mentions using "sequential selection" to choose the next sample point in an online learning method.

Specific question: I tried to follow up on this by searching on Google Scholar and Google for "sequential selection" "polynomial regression" and didn't get anywhere. Is there another more standard term for this?

General question: Where can I read more about how the selection of points affects properties of the feature matrix $X$, and functions of $X$? I'm particularly interested in feature normalization and how it affects the condition number of $X$.

Best Answer

The keyword is "Design of Experiments". What you are describing is a D-optimal design. The question of finding optimal points to do prediction started allegedly 1747 and there exists a ton of literature. Worthwhile mentioning is Pukelsheim. If you are more into computer experiments you might want to look at Fang. A modern overview (which goes beyond classical designs) is freely available at Pronzato/Müller.

If conditioning of the design matrix is an issue instead of using "raw" polynomials, you might want to read about "orthogonal polynomials" in regression .

Related Question