Solved – Maximum Degree of Polynomial Regression

machine learningpolynomialregression

If we have 100 data points and want to perform polynomial regression, the maximum degree of our polynomial is n-1, where n is the number of data points. In this case, the maximum degree would be 99. I am aware that this will almost always overfit the data, but my question is this: why can we only go up to 99? What is to stop us from taking a polynomial of degree, say, 14 million, and fitting the data?

I am aware of the n-1 bound, but haven't found a good explanation online as to why this bound exists.

Best Answer

In short, what you are doing to fit a polynomial regression model like \begin{align*} Y&=\beta_0+\beta_1X+\beta_2X^2+\cdots+\beta_pX^p+\varepsilon\\ &=\mathbf{X}\boldsymbol\beta+\varepsilon \end{align*} is to perform a linear regression model with $p$ variables given by $(X,X^2,\ldots,X^p)$, so your design matrix, from a sample $\{(X_i,Y_i)\}_{i=1}^n$ is $$ \mathbf{X}=\pmatrix{\begin{array}{ccc}1 & X_1 & \cdots & X_1^p\\ \vdots & \vdots &\ddots &\vdots\\ 1 & X_n & \cdots & X_n^p \end{array}}_{n\times(p+1)}. $$ The estimate for $\boldsymbol\beta$ is $$ \hat{\boldsymbol\beta}=(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}\mathbf{y}, $$ with $\mathbf{y}=(Y_1,\ldots,Y_n)^T$.

The important point is that $\mathbf{X}^T\mathbf{X}$ is a $(p+1)\times(p+1)$ matrix and that $\mathrm{rank}(\mathbf{X}^T\mathbf{X})=\mathrm{rank}(\mathbf{X})$. Then, if $p=n-1$, $\mathbf{X}^T\mathbf{X}$ has dimension $n\times n$ and its rank is $n$, so no problem, is invertible. But if $p=n$, the dimension of $\mathbf{X}^T\mathbf{X}$ is $(n+1)\times(n+1)$ and the rank remains $n$, so in that case (and also if $p>n$) is not invertible. In other words, linear dependence arises as the result of fitting more variables than sample points you have.