[Math] Least-squares solution to a matrix equation

least-squares, linear algebra, matrices, regression, statistics

Suppose I have $n$ observations of $m$ dependent variables $y_1,\dots,y_m$, and I believe they follow some model wherein they can all be written as linear combinations of some underlying variables $x_1,\dots,x_k$ (with $k<m<n$). In other words, I have the model $$Y=X\beta$$ for a (known) $n\times m$ matrix of observations $Y$, an (unknown) $n\times k$ matrix of underlying variables $X$, and an (unknown) $k\times m$ matrix of coefficients $\beta$.

If $n$ is sufficiently large, then this system is over-determined and I should be able to solve for $X$ and $\beta$ that give the least-squares solution to this equation, right? It seems that this should be solvable with something like linear regression but I'm not sure how.

Best Answer

In general $\textbf Y$ and $\textbf X$ are known because you have a sample. This sample is a dataset of $n$ points: $(x_{11},x_{12},\ldots,x_{1m},y_1), (x_{21},x_{22},\ldots,x_{2m},y_2), (x_{31},x_{32},\ldots,x_{3m},y_3), \ldots, (x_{n1},x_{n2},\ldots,x_{nm},y_n)$. The values $x_{ij}$ are collected in $\textbf X$ and the values $y_i$ in $\textbf Y$. Each observation is a pair of an $m$-dimensional $x$-vector and a $y$-value. And it is true that it has to be $n > m > k$.

You have to minimize $V(\beta)=\|\textbf Y-\textbf X\beta\|_2^2=(\textbf Y-\textbf X \beta)'(\textbf Y-\textbf X\beta)=(\textbf Y'- \beta' \textbf X' )(\textbf Y-\textbf X\beta)$

Multiplying out

$V(\beta)=\textbf Y'\textbf Y -\textbf Y'\textbf X\beta-\beta' \textbf X' \textbf Y +\beta' \textbf X' \textbf X\beta$

Since $\textbf Y'\textbf X\beta$ is a scalar, it equals its own transpose: $\textbf Y'\textbf X\beta=(\textbf Y'\textbf X\beta)'=\beta' \textbf X' \textbf Y$. Therefore

$V(\beta)=\textbf Y'\textbf Y -2\beta' \textbf X' \textbf Y +\beta' \textbf X' \textbf X\beta$
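The scalar identity used to combine the two cross terms can be checked numerically; here is a quick NumPy sketch (the dimensions are arbitrary, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 8, 3
X = rng.standard_normal((n, k))      # design matrix
Y = rng.standard_normal((n, 1))      # response vector
beta = rng.standard_normal((k, 1))   # arbitrary coefficient vector

# Both cross terms are 1x1 (scalars) and transposes of each other,
# so they must agree.
t1 = Y.T @ X @ beta
t2 = beta.T @ X.T @ Y
print(np.allclose(t1, t2))
```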

Differentiating w.r.t. $\beta$ and setting the derivative to zero

$\frac{\partial V}{\partial \beta}=-2 \textbf X' \textbf Y +2 \textbf X' \textbf X\beta=0$

$2\textbf X' \textbf X\beta=2\textbf{X}'\textbf{Y}$

Dividing both sides by 2

$\textbf X' \textbf X\beta=\textbf{X}'\textbf{Y}$

Multiplying both sides on the left by $(\textbf X' \textbf X)^{-1}$ (which exists provided $\textbf X$ has full column rank).

$\beta=(\textbf X' \textbf X)^{-1}\textbf{X}'\textbf{Y}$
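In code, this closed-form solution can be computed directly from the normal equations, although in practice a dedicated least-squares solver is preferred over forming the inverse explicitly. A NumPy sketch with made-up dimensions and simulated data:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 100, 4
X = rng.standard_normal((n, k))            # known design matrix
beta_true = rng.standard_normal((k, 1))
# Simulated noisy observations following Y = X beta + noise
Y = X @ beta_true + 0.1 * rng.standard_normal((n, 1))

# Normal equations: solve (X'X) beta = X'Y  (avoids computing the inverse)
beta_ne = np.linalg.solve(X.T @ X, X.T @ Y)

# Numerically preferable alternative: a least-squares solver
beta_ls, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(np.allclose(beta_ne, beta_ls))
```

Both routes give the same $\beta$ here; `lstsq` is more robust when $\textbf X' \textbf X$ is ill-conditioned.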

$\beta$ contains the values of the coefficients that minimize the squared difference between the observed y-values and the fitted values $\textbf X\beta$.