From what I understand, when doing a least squares regression with a Vandermonde matrix, you're essentially solving the equation
$y=Xa$
Where $y$ is a vector of $y$-values, $X$ is the Vandermonde matrix, and $a$ is a vector of coefficients.
Solving this equation for $a$ yields
$a=(X^TX)^{-1}X^Ty$
My understanding is that this should be a solution to the set of equations. However, it is possible to fit $n$ data points with a $k$-th degree polynomial where $n>k$. This implies there are more equations than unknowns, and an overdetermined system like that generally has no exact solution. Nevertheless, the vector $a$ can still be calculated. The resulting vector does not exactly satisfy
$y=Xa$
but is instead a good approximation, as with regression.
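For concreteness, here is a minimal NumPy sketch (the five data points are made up for illustration) showing that the computed $a$ satisfies $y=Xa$ only approximately:

```python
import numpy as np

# Made-up sample points (hypothetical data): n = 5 equations.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 9.2, 18.8, 33.1])

# Vandermonde matrix for a quadratic fit: columns [1, x, x^2],
# so only k + 1 = 3 unknown coefficients.
X = np.vander(x, 3, increasing=True)

# a = (X^T X)^{-1} X^T y, computed via the normal equations
# (solving X^T X a = X^T y is numerically safer than inverting).
a = np.linalg.solve(X.T @ X, X.T @ y)

print(a)          # fitted coefficients
print(y - X @ a)  # residual: small but nonzero, so y = Xa holds only approximately
```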
Why can we get a value for $a$? I do not see how this would be possible if we were dealing with $n$ equations in $k$ unknowns. How does the matrix solution differ from directly solving $n$ equations for $k$ unknowns?
Best Answer
First, a little explanation of what is happening when you solve for $a$ in
$a=(X^TX)^{-1}X^Ty$.
The $(X^TX)^{-1}X^T$ part is the Moore-Penrose pseudoinverse of $X$ (at least when $X$ has full column rank). It is built from $X$'s Gramian matrix $X^TX$, a square positive semidefinite matrix whose eigenvectors are the right-singular vectors of $X$ and whose eigenvalues are the squares of $X$'s singular values. Inverting this Gramian matrix and multiplying it by $X^T$ actually performs a least-squares fit (see this and this reference); this is a property of the pseudoinverse.
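These properties are easy to verify numerically. Below is a small NumPy sketch (the matrix is an arbitrary random example, not from the original post) checking that $(X^TX)^{-1}X^T$ matches the Moore-Penrose pseudoinverse for a full-column-rank $X$, and that the Gramian's eigenvalues are the squared singular values of $X$:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 3))  # arbitrary tall matrix; full column rank
                                 # with probability 1

# For full-column-rank X, (X^T X)^{-1} X^T equals the Moore-Penrose
# pseudoinverse of X.
normal_eq_inverse = np.linalg.solve(X.T @ X, X.T)
print(np.allclose(normal_eq_inverse, np.linalg.pinv(X)))  # True

# The Gramian's eigenvalues are the squares of X's singular values.
gram_eigvals = np.sort(np.linalg.eigvalsh(X.T @ X))
squared_svals = np.sort(np.linalg.svd(X, compute_uv=False) ** 2)
print(np.allclose(gram_eigvals, squared_svals))  # True
```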
As for your question about dealing with $n$ equations and $k$ unknowns, there are two possible cases (assuming the rank of your matrix is at least $k$):