I need to identify a linear model using the least squares method.
My model to identify is a matrix $A$. My linear system is:
$[A]_{_{n \times m}} \cdot [x]_{_{m \times 1}} = [y]_{_{n \times 1}}$
where $n$ and $m$ define the matrix sizes. In my notation, the known arrays $x$ and $y$ are vectors.
To identify $[A]$ I have a set of $p$ equations:
$[A] \cdot \vec{x}_1 = \vec{y}_1$
$[A] \cdot \vec{x}_2 = \vec{y}_2$
…
$[A] \cdot \vec{x}_p = \vec{y}_p$
Knowing that my system is overdetermined ($p > n, m$) and that each pair $(\vec{x}_k, \vec{y}_k)$ is known, I want to identify my linear model matrix $[A]$ with least squares.
My approach:
I have arranged my known equations as follows:
$[A] \cdot [\vec{x}_1\>\vec{x}_2\>…\>\vec{x}_p]=[\vec{y}_1\>\vec{y}_2\>…\>\vec{y}_p]$
My initial linear system becomes a matrix equation:
$[A]_{_{n \times m}} \cdot [X]_{_{m \times p}} = [Y]_{_{n \times p}}$
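The column-stacking step above can be sketched in NumPy; the data here is random and only illustrates the shapes (the names `xs`, `ys` and the sizes are assumptions, not from the original):

```python
import numpy as np

# Hypothetical data: p known (x, y) pairs of an overdetermined system
n, m, p = 3, 4, 20
rng = np.random.default_rng(0)
xs = [rng.standard_normal(m) for _ in range(p)]  # the x_k vectors
ys = [rng.standard_normal(n) for _ in range(p)]  # the y_k vectors

# Stack each pair as a COLUMN, as in the matrix equation A X = Y
X = np.column_stack(xs)   # shape (m, p)
Y = np.column_stack(ys)   # shape (n, p)
```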
The problem:
A) Is it correct to find $[A]$ using the Moore-Penrose inverse of $[X]$?
In the simplest scalar case of $a \cdot x = y$, the different $(x_1, y_1)\ldots(x_p, y_p)$ pairs are arranged in rows instead of columns, which makes sense to me:
$[x_1 \> x_2 \> … \> x_p]^T \cdot a = [y_1 \> y_2 \> … \> y_p]^T$
This confuses me.
B) Also, is least squares the right approach? I am not constrained to least squares.
Best Answer
Normally, when solving a least squares problem for a linear equation, it would be of the form $A\,x = b$, with $A\in\mathbb{R}^{n \times m}$ and $b\in\mathbb{R}^{n}$ known and $x\in\mathbb{R}^{m}$ unknown. The least squares solution for $x$ minimizes $\|A\,x-b\|$, or equivalently $\left(A\,x-b\right)^\top \left(A\,x-b\right)$. This has the solution $x=\left(A^\top A\right)^{-1} A^\top b$ (assuming $A^\top A$ is invertible).
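This standard form can be checked numerically; a minimal sketch with random data (all names and sizes here are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 3))   # n=10 equations, m=3 unknowns (overdetermined)
b = rng.standard_normal(10)

# Normal-equations solution x = (A^T A)^{-1} A^T b
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# In practice np.linalg.lstsq (SVD-based) is numerically preferable
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
```

Both routes agree up to floating-point error; `lstsq` avoids forming $A^\top A$, which squares the condition number.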
Your problem can also be formulated into this form by using vectorization and the Kronecker product. Namely $\mathcal{A}\,\mathcal{X} = \mathcal{Y}$, with $\mathcal{A}\in\mathbb{R}^{n \times m}$ unknown and $\mathcal{X}\in\mathbb{R}^{m \times p}$ and $\mathcal{Y}\in\mathbb{R}^{n \times p}$ known, can also be written as
$$ \underbrace{\left(\mathcal{X}^\top \otimes I\right)}_A \underbrace{\mathrm{vec}(\mathcal{A})}_x = \underbrace{\mathrm{vec}(\mathcal{Y})}_b, \tag{1} $$
with $I$ the $n \times n$ identity matrix. This can now be solved using the normal least squares solution, after which the solution for $\mathrm{vec}(\mathcal{A})$ can be reshaped into a matrix of the appropriate size.
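A sketch of the vectorized route $(1)$ in NumPy (the data is random and the sizes are assumptions; note that $\mathrm{vec}(\cdot)$ stacks columns, so column-major order `'F'` is needed because NumPy flattens row-major by default):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, p = 3, 4, 20
A_true = rng.standard_normal((n, m))
X = rng.standard_normal((m, p))
Y = A_true @ X + 0.01 * rng.standard_normal((n, p))  # noisy observations

# Build the big system (X^T kron I_n) vec(A) = vec(Y) from equation (1)
M = np.kron(X.T, np.eye(n))                          # shape (n*p, n*m)
vecA, *_ = np.linalg.lstsq(M, Y.ravel(order='F'), rcond=None)

# Reshape vec(A) back into an n-by-m matrix (column-major, matching vec)
A_est = vecA.reshape((n, m), order='F')
```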
However, when comparing this to your direct solution using the Moore-Penrose inverse on $\mathcal{X}$, it does seem to yield the same solution as $(1)$. It is not obvious to me that this should be the case, since the sum of squares one wants to minimize can be formulated as $\mathrm{Tr}\left((\mathcal{A}\,\mathcal{X} - \mathcal{Y})^\top (\mathcal{A}\,\mathcal{X} - \mathcal{Y})\right)$, but the partial derivative with respect to $\mathcal{A}$ is awkward to formulate, since it is a matrix.
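The claimed agreement between the two routes can be verified numerically; a minimal sketch with random data (names and sizes are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, p = 3, 4, 20
X = rng.standard_normal((m, p))
Y = rng.standard_normal((n, p))

# Direct route: A = Y X^+ using the Moore-Penrose pseudoinverse of X
A_direct = Y @ np.linalg.pinv(X)

# Vectorized route via the Kronecker formulation (1)
M = np.kron(X.T, np.eye(n))
vecA, *_ = np.linalg.lstsq(M, Y.ravel(order='F'), rcond=None)
A_kron = vecA.reshape((n, m), order='F')
```

When $\mathcal{X}$ has full row rank, $\mathcal{X}^+ = \mathcal{X}^\top(\mathcal{X}\mathcal{X}^\top)^{-1}$, and applying the identity $\mathrm{vec}(AZB) = (B^\top \otimes A)\,\mathrm{vec}(Z)$ to the normal-equations solution of $(1)$ recovers exactly $\mathcal{A} = \mathcal{Y}\mathcal{X}^+$, which is why the two routes coincide.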