Linear model matrix identification with least squares

Tags: least squares, linear algebra, system identification

I need to identify a linear model using the least squares method.
The model to identify is a matrix $A$. My linear system is:

$[A]_{_{n \times m}} \cdot [x]_{_{m \times 1}} = [y]_{_{n \times 1}}$

where $n$ and $m$ define the matrix sizes. In my notation, the known arrays $x$ and $y$ are column vectors.

To identify $[A]$ I have a set of $p$ equations:

$[A] \cdot \vec{x}_1 = \vec{y}_1$

$[A] \cdot \vec{x}_2 = \vec{y}_2$

$\vdots$

$[A] \cdot \vec{x}_p = \vec{y}_p$

Knowing that my system is overdetermined ($p > n, m$) and that each pair $(\vec{x}_i, \vec{y}_i)$ is known, I want to identify my linear model matrix $[A]$ with least squares.

My approach:

I have arranged my known equations as follows:

$[A] \cdot [\vec{x}_1\>\vec{x}_2\>…\>\vec{x}_p]=[\vec{y}_1\>\vec{y}_2\>…\>\vec{y}_p]$

My initial linear system becomes a matrix equation:

$[A]_{_{n \times m}} \cdot [X]_{_{m \times p}} = [Y]_{_{n \times p}}$
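For concreteness, here is a minimal numpy sketch of this stacking; the sizes and the random data below are just placeholders:

```python
import numpy as np

# Illustrative sizes only; n, m, p are placeholders.
n, m, p = 3, 4, 10
rng = np.random.default_rng(0)
x_list = [rng.standard_normal(m) for _ in range(p)]  # known input vectors x_i
y_list = [rng.standard_normal(n) for _ in range(p)]  # known output vectors y_i

# Each pair becomes one column: X is m x p, Y is n x p.
X = np.column_stack(x_list)
Y = np.column_stack(y_list)
```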

The problem:

A) Is this the right approach, i.e. can I find $[A]$ by using the Moore-Penrose inverse of $[X]$?

In the simplest scalar case $a \cdot x = y$, the different $(x_1, y_1), \dots, (x_p, y_p)$ pairs are arranged in rows instead of columns, which makes sense to me:

$[x_1 \> x_2 \> … \> x_p]^T \cdot a = [y_1 \> y_2 \> … \> y_p]^T$

This confuses me.

B) Also, is least squares the right approach here? I am not constrained to using least squares.

Best Answer

Normally, when solving a least squares problem for a linear equation, it would be of the form $A\,x = b$, with $A\in\mathbb{R}^{n \times m}$ and $b\in\mathbb{R}^{n}$ known and $x\in\mathbb{R}^{m}$ unknown. The least squares solution for $x$ minimizes $\|A\,x-b\|$, or equivalently $\left(A\,x-b\right)^\top \left(A\,x-b\right)$, and is given by $x=\left(A^\top A\right)^{-1} A^\top b$.
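As a minimal sketch of that standard case (the matrix, right-hand side, and sizes below are arbitrary placeholders), the normal-equation formula and numpy's built-in solver agree:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 3))   # known, n x m with n > m (overdetermined)
b = rng.standard_normal(8)        # known right-hand side

# Normal-equation solution x = (A^T A)^{-1} A^T b
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Equivalent (and numerically preferable) library solver
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

assert np.allclose(x_normal, x_lstsq)
```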

Your problem can be brought into this form by using vectorization and the Kronecker product. Namely, $\mathcal{A}\,\mathcal{X} = \mathcal{Y}$, with $\mathcal{A}\in\mathbb{R}^{n \times m}$ unknown and $\mathcal{X}\in\mathbb{R}^{m \times p}$ and $\mathcal{Y}\in\mathbb{R}^{n \times p}$ known, can be written as

$$ \underbrace{\left(\mathcal{X}^\top \otimes I\right)}_A \underbrace{\mathrm{vec}(\mathcal{A})}_x = \underbrace{\mathrm{vec}(\mathcal{Y})}_b, \tag{1} $$

with $I$ the $n \times n$ identity matrix. This can now be solved using the normal least squares solution, after which the solution for $\mathrm{vec}(\mathcal{A})$ can be reshaped into a matrix of the appropriate size.
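A minimal numpy sketch of $(1)$ might look as follows, assuming noiseless synthetic data and placeholder sizes; note that $\mathrm{vec}$ stacks columns, so column-major (Fortran) order is used when flattening and reshaping:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, p = 3, 4, 10
A_true = rng.standard_normal((n, m))     # the matrix we pretend is unknown
X = rng.standard_normal((m, p))          # known inputs, one column per pair
Y = A_true @ X                           # known outputs (noiseless)

# Build the big system (X^T kron I_n) vec(A) = vec(Y).
K = np.kron(X.T, np.eye(n))              # shape (n*p) x (n*m)
vecY = Y.flatten(order="F")              # vec(Y), length n*p

vecA, *_ = np.linalg.lstsq(K, vecY, rcond=None)
A_est = vecA.reshape((n, m), order="F")  # reshape vec(A) back into n x m

assert np.allclose(A_est, A_true)
```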

However, when comparing this to your direct solution using the Moore-Penrose inverse of $\mathcal{X}$, it does seem to yield the same solution as $(1)$. It is not immediately obvious to me that this should be the case: the sum of squares one wants to minimize can be formulated as $\mathrm{Tr}\left((\mathcal{A}\,\mathcal{X} - \mathcal{Y})^\top (\mathcal{A}\,\mathcal{X} - \mathcal{Y})\right)$, but the partial derivative with respect to $\mathcal{A}$ is awkward to formulate, since $\mathcal{A}$ is a matrix.
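A quick numerical check of that observation (again with placeholder sizes and slightly noisy synthetic data) could look like this; both routes return the same matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, p = 3, 4, 10
A_true = rng.standard_normal((n, m))
X = rng.standard_normal((m, p))
Y = A_true @ X + 0.01 * rng.standard_normal((n, p))   # noise so the fit is not exact

# Route 1: vectorized least squares via the Kronecker product, as in (1).
K = np.kron(X.T, np.eye(n))
vecA, *_ = np.linalg.lstsq(K, Y.flatten(order="F"), rcond=None)
A_kron = vecA.reshape((n, m), order="F")

# Route 2: Moore-Penrose pseudoinverse applied directly to X.
A_pinv = Y @ np.linalg.pinv(X)

print(np.allclose(A_kron, A_pinv))   # both routes give the same minimizer
```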
