Let's say that we have X of shape (2, 5)
and y of shape (2,)
This works: np.linalg.lstsq(X, y)
We would expect this to work only if X were of shape (N, 5) with N >= 5
But why and how?
We do get back 5 weights as expected but how is this problem solved?
Isn't it like we have 2 equations and 5 unknowns?
How could numpy solve this?
Does it do something like interpolation to create artificial extra equations?
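To make the question concrete, here is a minimal reproduction (the random data and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 5))  # 2 equations, 5 unknowns
y = rng.standard_normal(2)

# lstsq happily solves the underdetermined system
w, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print(w.shape)  # (5,) -- five weights despite only two equations
print(rank)     # 2 -- the effective rank of X
```

Note that the returned solution fits the two equations exactly; the "extra" freedom is resolved by picking the minimum-norm solution, as the answer below explains.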
Best Answer
My understanding is that numpy.linalg.lstsq relies on the LAPACK routine dgelsd.
The problem is to solve:
$$ \text{minimize} (\text{over} \; \mathbf{x}) \quad \| A\mathbf{x} - \mathbf{b} \|_2$$
Of course, this does not have a unique solution for a matrix $A$ whose rank is less than the number of unknowns (the length of $\mathbf{x}$). For such an underdetermined system, dgelsd returns the minimizer $\mathbf{z}$ with the smallest norm $\|\mathbf{z}\|_2$. For example, if the system is $x + y = 1$, numpy.linalg.lstsq would return $x = .5, y = .5$.
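A quick check of that example, writing $x + y = 1$ as a $1 \times 2$ system:

```python
import numpy as np

A = np.array([[1.0, 1.0]])  # one equation: x + y = 1
b = np.array([1.0])

# Among all solutions (x, 1 - x), lstsq returns the one of minimum norm
z, *_ = np.linalg.lstsq(A, b, rcond=None)
print(z)  # [0.5 0.5]
```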
How does dgelsd work?
The routine dgelsd computes the singular value decomposition (SVD) of $A$.

I'll just sketch the idea behind using an SVD to solve a linear system. The singular value decomposition is a factorization $A = U \Sigma V'$ where $U$ and $V$ are orthogonal matrices and $\Sigma$ is a diagonal matrix whose diagonal entries are known as singular values.
The effective rank of matrix $A$ will be the number of singular values that are effectively non-zero (i.e. sufficiently different from zero relative to machine precision etc...). Let $S$ be a diagonal matrix of the non-zero singular values. The SVD is thus:
$$ A = U \begin{bmatrix} S & 0 \\ 0 & 0 \end{bmatrix} V' $$
The pseudo-inverse of $A$ is given by:
$$ A^\dagger = V \begin{bmatrix} S^{-1} & 0 \\ 0 & 0 \end{bmatrix} U' $$
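The pseudo-inverse can be assembled from the SVD exactly as above; here is a small sketch (random test matrix) checking it against NumPy's built-in np.linalg.pinv:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 5))

# Full SVD: U is (2, 2), Vt is (5, 5), s holds the 2 singular values
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Sigma-dagger: transpose the shape and invert the non-zero singular values
Sigma_dag = np.zeros((5, 2))
Sigma_dag[:2, :2] = np.diag(1.0 / s)

A_dag = Vt.T @ Sigma_dag @ U.T
print(np.allclose(A_dag, np.linalg.pinv(A)))  # True
```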
Consider the solution $\mathbf{x} = A^\dagger \mathbf{b}$. Then:
\begin{align*} A\mathbf{x} - \mathbf{b} &= U \begin{bmatrix} S & 0 \\ 0 & 0 \end{bmatrix} V' V \begin{bmatrix} S^{-1} & 0 \\ 0 & 0 \end{bmatrix} U' \mathbf{b} - \mathbf{b} \\ &= U \begin{bmatrix} I & 0 \\ 0 & 0 \end{bmatrix} U' \mathbf{b} - \mathbf{b}\\ \end{align*}
There are basically two cases here: either $\mathbf{b}$ lies in the column space of $A$, in which case the residual is zero, or it does not, in which case the residual is the component of $\mathbf{b}$ orthogonal to the column space, which is the smallest residual any $\mathbf{x}$ can achieve. This last part is a bit tricky... you need to keep track of matrix dimensions and use the fact that $U$ is an orthogonal matrix: the block $U \begin{bmatrix} I & 0 \\ 0 & 0 \end{bmatrix} U'$ is the orthogonal projection onto the column space of $A$.
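As a numerical sanity check of this derivation (with arbitrary random data): for a skinny system with no exact solution, the pseudo-inverse solution matches lstsq, and its residual is orthogonal to the column space of $A$:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 2))  # 5 equations, 2 unknowns: no exact fit
b = rng.standard_normal(5)

x = np.linalg.pinv(A) @ b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x, x_lstsq))  # True: the pseudo-inverse solves least squares

# The residual A @ x - b is orthogonal to the column space of A
print(np.allclose(A.T @ (A @ x - b), 0))  # True
```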
Equivalent forms of the pseudo-inverse
When $A$ has linearly independent rows (e.g. a fat matrix), then: $$A^\dagger = A'\left(AA' \right)^{-1}$$
For an underdetermined system, you can show that the pseudo-inverse gives you the minimum norm solution.
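A sketch of the minimum-norm property (random underdetermined system): any other solution differs from $A^\dagger \mathbf{b}$ by a null-space vector, and always has a larger norm:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((2, 5))
b = rng.standard_normal(2)

x_min = np.linalg.pinv(A) @ b  # minimum-norm solution

# The last rows of Vt span the null space of A (full SVD, rank 2)
null_basis = np.linalg.svd(A)[2][2:].T          # shape (5, 3)
x_other = x_min + null_basis @ rng.standard_normal(3)

print(np.allclose(A @ x_other, b))  # True: still solves the system
print(np.linalg.norm(x_min) < np.linalg.norm(x_other))  # True
```

The inequality holds because $x_{\min}$ lies in the row space of $A$, which is orthogonal to the null space, so the norms add in quadrature.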
When $A$ has linearly independent columns (e.g. a skinny matrix), then: $$A^\dagger = \left(A'A \right)^{-1}A'$$
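Both closed forms can be verified numerically against np.linalg.pinv (random full-rank test matrices):

```python
import numpy as np

rng = np.random.default_rng(3)
fat = rng.standard_normal((2, 5))     # linearly independent rows
skinny = rng.standard_normal((5, 2))  # linearly independent columns

# Fat case: A' (A A')^{-1}
print(np.allclose(np.linalg.pinv(fat),
                  fat.T @ np.linalg.inv(fat @ fat.T)))  # True

# Skinny case: (A' A)^{-1} A'
print(np.allclose(np.linalg.pinv(skinny),
                  np.linalg.inv(skinny.T @ skinny) @ skinny.T))  # True
```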