[Math] $\arg \min_{A} \| A x - b \|_2$ (Least Squares Minimization over the Matrix $A$)

linear algebra, machine learning, matrices, regression

I have a machine learning regression problem. I need to minimize

$$\sum_i \|Ax_i-b_i\|_2^2$$

However I am trying to find matrix $A$, not the usual $x$, and I have lots of example data for $x_i$ and $b_i$. In general $A$ has more rows than columns.

Additionally, I would like a solution for minimizing the Mahalanobis distance version, where $C$ is the symmetric positive definite (SPD) covariance matrix:

$$\sum_i(Ax_i-b_i)^TC^{-1}(Ax_i-b_i)$$

I think my problem can be rewritten as the usual least squares problem, but I am not sure whether I am doing it correctly.

Best Answer

Yes. Expanding the sum, we get

$$\begin{align*}\sum_i \|Ax_i-b_i\|^2 &= \sum_i \sum_j (Ax_i-b_i)_j^2\\ &= \sum_i \sum_j \left(\sum_k A_{jk}x_{ki} - b_{ji}\right)^2\\ &= \sum_i \sum_j \left(\sum_k x^T_{ik} A^T_{kj} - b^T_{ij}\right)^2\\ &= \sum_i \sum_j (x^T A^T_j - b^T_j)_i^2\\ &= \sum_j \|x^T A^T_j - b_j^T\|^2. \end{align*}$$

So you can solve the problem using regular least squares, once for each row of $A$: the matrix $x^T$ has your original known vectors $x_i$ as its rows, the unknown vector $A_j^T$ is the $j$th row of $A$, and the right-hand side $b_j^T$ collects the $j$th entries of all of the $b_i$ (so $b_1^T$ contains all of the first entries, and so on). Since all of these subproblems share the same design matrix $x^T$, they amount to the single problem $\min_A \|x^T A^T - b^T\|_F^2$ in the Frobenius norm.
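To make the reduction concrete, here is a minimal numpy sketch; the sizes, variable names, and synthetic data below are illustrative assumptions, not from the question. Because every row problem shares the design matrix $x^T$, a single `np.linalg.lstsq` call with a matrix right-hand side solves them all at once:

```python
import numpy as np

# Illustrative sizes: n samples, x_i in R^p, b_i in R^m, so A is m x p (m > p).
rng = np.random.default_rng(0)
n, p, m = 200, 3, 5

A_true = rng.standard_normal((m, p))   # ground truth, used only for checking
X = rng.standard_normal((n, p))        # rows are the known x_i (this is x^T)
B = X @ A_true.T + 0.01 * rng.standard_normal((n, m))  # rows are noisy b_i

# Each column of A^T solves an ordinary least squares problem with the same
# design matrix X, so one lstsq call with a matrix RHS handles all of them:
# minimize ||X A^T - B||_F^2 over A^T.
At, *_ = np.linalg.lstsq(X, B, rcond=None)
A_est = At.T

print(np.allclose(A_est, A_true, atol=0.05))  # should print True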
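As for the Mahalanobis version: it reduces to the same problem after whitening. Writing the Cholesky factorization $C = LL^T$, we have $C^{-1} = L^{-T}L^{-1}$, so

$$\sum_i(Ax_i-b_i)^TC^{-1}(Ax_i-b_i) = \sum_i \|L^{-1}(Ax_i-b_i)\|^2 = \sum_i \|\tilde{A}x_i-\tilde{b}_i\|^2,$$

with $\tilde{A} = L^{-1}A$ and $\tilde{b}_i = L^{-1}b_i$. Solve the ordinary problem above for $\tilde{A}$ and recover $A = L\tilde{A}$.

A minimal sketch of the whitening step, under the same illustrative assumptions (the SPD matrix `C` here is synthetic, for demonstration only):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, m = 200, 3, 5
X = rng.standard_normal((n, p))        # rows are the known x_i
B = rng.standard_normal((n, m))        # rows are the known b_i

# Synthetic SPD covariance for the Mahalanobis norm (illustration only).
M = rng.standard_normal((m, m))
C = M @ M.T + m * np.eye(m)

# Whiten: with C = L L^T, the Mahalanobis objective equals the ordinary
# objective for A~ = L^{-1} A and b~_i = L^{-1} b_i.
L = np.linalg.cholesky(C)              # lower triangular, C = L @ L.T
B_w = np.linalg.solve(L, B.T).T        # rows are b~_i = L^{-1} b_i

At_w, *_ = np.linalg.lstsq(X, B_w, rcond=None)  # ordinary LS for A~^T
A_est = L @ At_w.T                     # undo the whitening: A = L @ A~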
