Least squares solution to $A^T A \approx X$ for non-square $A$ given $X$

linear-algebra, matrices

I am sure that there is a fairly simple solution based on the Moore-Penrose pseudo-inverse or singular value decomposition, but I cannot seem to make any headway in this problem with these methods. My problem is as follows:

How does one determine the least-squares solution for an unknown non-square matrix $A$ given

$$A^{T}A \approx X$$

for a known $X \in \mathbb{R}^{n \times n}$, knowing that $A \in \mathbb{R}^{m \times n}$ where $m \neq n$?

The same question has been asked and answered for square $A$, but I need an approximate solution for non-square $A$:

Solve for A from A x transpose(A)

Hand-holding solutions and/or examples are greatly appreciated, but any references or relevant theorems/keywords are welcome as well. Please let me know if I need to elaborate or make my question more precise.

Thanks in advance for any help!

Edit:
Following the comment by User7530 below, what I mean by least-squares solution is this:
$$\min_{A \in \mathbb{R}^{m \times n}} \| A^{T}A - X \|_{F}^{2}$$

where $\| M \|_{F}$ is the Frobenius norm.
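
In code, this is the quantity I want to minimize (a minimal NumPy sketch; the function name is just for illustration):

```python
import numpy as np

def objective(A, X):
    """Squared Frobenius norm ||A^T A - X||_F^2 of the residual."""
    R = A.T @ A - X
    return np.sum(R * R)
```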

Best Answer

Suppose $A \in \mathbb{R}^{m \times n}$. There is a theorem called the Eckart-Young-Mirsky theorem; it goes as follows. Write the singular value decomposition

$$A = U \Sigma V^{T} $$

with $U, V$ orthogonal matrices and $\Sigma$ a diagonal matrix containing the singular values $\sigma_{1} \geq \sigma_{2} \geq \cdots \geq 0$. Then the following is true:

$$A_{k} = \sum_{i=1}^{k} \sigma_{i} u_{i} v_{i}^{T}$$ $$\| A - A_{k} \|_{2} = \Big\| \sum_{i=k+1}^{\min(m,n)} \sigma_{i} u_{i} v_{i}^{T} \Big\|_{2} = \sigma_{k+1}$$

That is, the truncated SVD $A_{k}$ is the best rank-$k$ approximation to $A$ in the 2-norm.
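
As a quick numerical sanity check of the theorem (a NumPy sketch with arbitrary sizes, using `numpy.linalg.svd`; not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 8, 5, 3
A = rng.standard_normal((m, n))

# Thin SVD: A = U @ diag(s) @ Vt with s sorted in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Truncated rank-k SVD, i.e. the sum of the first k rank-one terms.
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Eckart-Young-Mirsky: the 2-norm error is the (k+1)-th singular value.
print(np.isclose(np.linalg.norm(A - A_k, 2), s[k]))  # True
```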

Now consider

$$ \| A^{T}A - X \|_{2}. $$

The nearest rank-$k$ matrix under this norm is the truncated SVD, as above, so we compare the Gram matrix of the full decomposition with the Gram matrix of its truncation:

$$ \| (U\Sigma V^{T})^{T}(U \Sigma V^{T}) - (U_{k}\Sigma_{k}V_{k}^{T})^{T} (U_{k}\Sigma_{k}V_{k}^{T}) \|_{2} $$ $$ \| V\Sigma^{T} U^{T} U \Sigma V^{T} - V_{k}\Sigma_{k}^{T}U_{k}^{T}U_{k}\Sigma_{k}V_{k}^{T} \|_{2} $$

$$ \| V \Sigma^{2} V^{T} - V_{k}\Sigma_{k}^{2}V_{k}^{T} \|_{2} $$

where

$$V_{k} \Sigma_{k}^{2}V_{k}^{T} = \sum_{i=1}^{k} \sigma_{i}^{2} v_{i}v_{i}^{T} $$
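
Numerically, the matrix-product form and the rank-one sum agree (a small NumPy sketch; the orthogonal $V$ and the values in `s` are made up purely for testing):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 5, 3
V, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthogonal V
s = np.sort(rng.uniform(1.0, 2.0, n))[::-1]        # fake singular values

# V_k Sigma_k^2 V_k^T as a matrix product ...
M = V[:, :k] @ np.diag(s[:k] ** 2) @ V[:, :k].T
# ... equals the sum of rank-one terms sigma_i^2 v_i v_i^T.
S = sum(s[i] ** 2 * np.outer(V[:, i], V[:, i]) for i in range(k))
print(np.allclose(M, S))  # True
```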

Now, the theorem above showed

$$\| A - A_{k} \|_{2} = \Big\| \sum_{i=k+1}^{\min(m,n)} \sigma_{i} u_{i} v_{i}^{T} \Big\|_{2} = \sigma_{k+1},$$

and applying the same argument to the Gram matrices gives

$$ \| A^{T}A - X \|_{2} = \Big\| \sum_{i=k+1}^{n} \sigma_{i}^{2} v_{i}v_{i}^{T} \Big\|_{2} = \sigma_{k+1}^{2}. $$

I believe this is more or less saying that the solution to the least-squares problem is built from the leading eigenspace of $X$. That is typically how it works: it does not explicitly form the normal equations.

More specifically, writing $\Lambda = \Sigma^{2}$,

$$V\Sigma^{2} V^{T} = V \Lambda V^{T} $$ $$ \| A^{T}A - X \|_{2} = \Big\| \sum_{i=k+1}^{n} \lambda_{i} v_{i}v_{i}^{T} \Big\|_{2} = \lambda_{k+1}. $$
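
As a check of this eigenvalue form (a sketch assuming $X$ is symmetric positive semidefinite, so its eigenvalues can play the role of the $\sigma_{i}^{2}$):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 6, 3
B = rng.standard_normal((n, n))
X = B.T @ B                      # symmetric PSD test matrix

# Eigendecomposition X = V diag(lam) V^T, reordered to descending eigenvalues.
lam, V = np.linalg.eigh(X)
lam, V = lam[::-1], V[:, ::-1]

# Rank-k truncation V_k Lambda_k V_k^T.
X_k = V[:, :k] @ np.diag(lam[:k]) @ V[:, :k].T

# The 2-norm error is the (k+1)-th eigenvalue, lambda_{k+1}.
print(np.isclose(np.linalg.norm(X - X_k, 2), lam[k]))  # True
```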

To follow this: I decomposed the matrix $X$ and matched it against $A^{T}A$. That is, $A^{T}A$ is the nearest matrix built from the eigendecomposition of $X$, and the matrix $A$ itself is built from a singular value decomposition. The way the SVD works, if you read about it, is the following.

$$ A^{T}A = (U \Sigma V^{T})^{T} (U \Sigma V^{T}) $$ $$ A^{T}A = (V^{T})^{T} \Sigma^{T} U^{T} U \Sigma V^{T} $$ by the properties of transposes and unitary matrices $$ A^{T}A = V \Sigma^{T} \Sigma V^{T} $$ $$ A^{T}A = V \Sigma^{2} V^{T} $$ $$ A^{T}A = V \Lambda V^{T} $$
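
This identity is easy to verify numerically (a sketch):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((8, 5))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# A^T A = V Sigma^2 V^T, since U^T U = I and Sigma is diagonal.
print(np.allclose(A.T @ A, Vt.T @ np.diag(s ** 2) @ Vt))  # True
```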

Meaning the matrix $A$ is $U\Sigma V^{T}$; however, the nearest matrix under the 2-norm is a rank-$k$ approximation, so we truncate it.

So, taking the truncated factors, we get a matrix that can be measured against the matrix $X$:

$$ A = U_{k}\Sigma_{k} V_{k}^{T} $$
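
Putting the recipe together (a sketch assuming $X$ is symmetric positive semidefinite, as any Gram matrix $A^{T}A$ must be; the choice of $U_{k}$ below is arbitrary, since any $m \times k$ matrix with orthonormal columns yields the same $A^{T}A$):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 3, 6                      # non-square: A is m x n with m < n
k = min(m, n)                    # the largest rank A^T A can have

B = rng.standard_normal((n, n))
X = B.T @ B                      # symmetric PSD target

# Eigendecompose X and keep the k largest eigenpairs.
lam, V = np.linalg.eigh(X)
lam, V = lam[::-1], V[:, ::-1]
Sigma_k = np.diag(np.sqrt(lam[:k]))   # sigma_i = sqrt(lambda_i)
V_k = V[:, :k]

# U_k: any m x k matrix with orthonormal columns; the identity works here.
U_k = np.eye(m, k)

A = U_k @ Sigma_k @ V_k.T             # candidate minimizer, m x n

# A^T A = V_k Lambda_k V_k^T, and the 2-norm error is lambda_{k+1}.
print(np.isclose(np.linalg.norm(A.T @ A - X, 2), lam[k]))  # True
```

Since the question asked for the Frobenius norm: the Eckart-Young-Mirsky theorem also holds in the Frobenius norm with the same truncated minimizer, the error there being $\sqrt{\sum_{i>k} \lambda_{i}^{2}}$ instead of $\lambda_{k+1}$.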