Suppose $A \in \mathbb{R}^{m \times n}$. There is a theorem called the Eckart-Young-Mirsky theorem, which states the following. Write the SVD
$$A = U \Sigma V^{T} $$
with $U$ and $V$ orthogonal matrices and $\Sigma$ diagonal with the singular values $\sigma_{1} \ge \sigma_{2} \ge \cdots \ge \sigma_{r} \ge 0$, where $r = \min(m,n)$. Then the following is true: for the rank-$k$ truncation
$$A_{k} = \sum_{i=1}^{k} \sigma_{i}u_{i} v_{i}^{T}, $$
$$\| A - A_{k} \|_{2} = \Big\| \sum_{i=k+1}^{r} \sigma_{i}u_{i} v_{i}^{T} \Big\|_{2} = \sigma_{k+1}.$$
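As a quick sanity check (my own numpy sketch, not part of the theorem statement; the random test matrix and rank are arbitrary choices), one can verify the error formula numerically:

```python
import numpy as np

# Hypothetical test setup: a random 8x5 matrix and truncation rank k = 2.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)  # A = U @ diag(s) @ Vt
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]       # rank-k truncation A_k

# Eckart-Young-Mirsky: the spectral-norm error is the next singular value.
print(np.linalg.norm(A - A_k, 2))  # == s[k], i.e. sigma_{k+1} (0-based index)
print(s[k])
```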
Now consider
$$ \| A^{T}A - X \|_{2}. $$
The nearest matrix $X$ is the rank-$k$ approximation under this norm, as above, so then
$$ \| (U\Sigma V^{T})^{T}(U \Sigma V^{T}) - (U_{k}\Sigma_{k}V_{k}^{T})^{T} (U_{k}\Sigma_{k}V_{k}^{T}) \|_{2} $$
$$ \| V\Sigma^{T} U^{T} U \Sigma V^{T} - V_{k}\Sigma_{k}^{T}U_{k}^{T}U_{k}\Sigma_{k}V_{k}^{T} \|_{2} $$
$$ \| V \Sigma^{2} V^{T} - V_{k}\Sigma_{k}^{2}V_{k}^{T} \|_{2} $$
where
$$V_{k} \Sigma_{k}^{2}V_{k}^{T} = \sum_{i=1}^{k} \sigma_{i}^{2} v_{i}v_{i}^{T}. $$
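To make the outer-product form concrete, here is a small numpy check (a sketch with a hypothetical random $A$) that the matrix product $V_{k} \Sigma_{k}^{2} V_{k}^{T}$ and the sum $\sum_{i=1}^{k} \sigma_{i}^{2} v_{i} v_{i}^{T}$ agree:

```python
import numpy as np

rng = np.random.default_rng(1)            # hypothetical random test matrix
A = rng.standard_normal((6, 4))
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
Vk = Vt[:k, :].T                          # columns v_1, ..., v_k

# V_k Sigma_k^2 V_k^T as a matrix product ...
X = Vk @ np.diag(s[:k] ** 2) @ Vk.T
# ... and as a sum of rank-one outer products sigma_i^2 v_i v_i^T.
X_sum = sum(s[i] ** 2 * np.outer(Vk[:, i], Vk[:, i]) for i in range(k))
print(np.allclose(X, X_sum))              # True
```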
Now, the expression above showed
$$\| A - A_{k} \|_{2} = \Big\| \sum_{i=k+1}^{r} \sigma_{i}u_{i} v_{i}^{T} \Big\|_{2} = \sigma_{k+1},$$
and analogously
$$ \| A^{T}A - X \|_{2} = \Big\| \sum_{i=k+1}^{r} \sigma_{i}^{2} v_{i}v_{i}^{T} \Big\|_{2} = \sigma_{k+1}^{2}.$$
I believe this is more or less saying that the solution to the least-squares problem is built iteratively from some eigenspace; that is typically how it works. It does not explicitly build the normal equations.
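Again as a numerical sketch (my own, with a hypothetical random $A$, nothing specific to the original problem), the squared error formula can be checked directly:

```python
import numpy as np

rng = np.random.default_rng(2)            # hypothetical random test matrix
A = rng.standard_normal((7, 5))
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
# Best rank-k approximation of A^T A under the 2-norm: V_k Sigma_k^2 V_k^T.
X = Vt[:k, :].T @ np.diag(s[:k] ** 2) @ Vt[:k, :]

print(np.linalg.norm(A.T @ A - X, 2))     # equals ...
print(s[k] ** 2)                          # ... sigma_{k+1}^2 (0-based index k)
```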
More specifically
$$V\Sigma^{2} V^{T} = V \Lambda V^{T} $$
$$ \| A^{T}A - X \|_{2} = \Big\| \sum_{i=k+1}^{r} \lambda_{i} v_{i}v_{i}^{T} \Big\|_{2} = \lambda_{k+1}, $$
where $\lambda_{i} = \sigma_{i}^{2}$.
If you follow this, I decomposed the matrix $X$ against $A^{T}A$: the nearest matrix $X$ is made of the truncated eigendecomposition of $A^{T}A$. Each matrix $A$ is built from a singular value decomposition, and the way the SVD works, if you read about it, is the following.
$$ A^{T}A = (U \Sigma V^{T})^{T} (U \Sigma V^{T}) $$
$$ A^{T}A = (V^{T})^{T} \Sigma^{T} U^{T} U \Sigma V^{T} $$
by the properties of transposes and orthogonal matrices
$$ A^{T}A = V \Sigma^{T} \Sigma V^{T} $$
$$ A^{T}A = V \Sigma^{2} V^{T} $$
$$ A^{T}A = V \Lambda V^{T} $$
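This relationship is easy to confirm numerically; the following sketch (my own, with a hypothetical random $A$) checks that the eigenvalues of $A^{T}A$ are the squared singular values and that $A^{T}A = V \Sigma^{2} V^{T}$:

```python
import numpy as np

rng = np.random.default_rng(3)            # hypothetical random test matrix
A = rng.standard_normal((6, 4))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
lam = np.linalg.eigvalsh(A.T @ A)[::-1]   # eigenvalues of A^T A, descending

print(np.allclose(lam, s ** 2))                            # lambda_i = sigma_i^2
print(np.allclose(A.T @ A, Vt.T @ np.diag(s ** 2) @ Vt))   # A^T A = V Sigma^2 V^T
```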
Meaning the matrix $A$ is $U\Sigma V^{T}$; however, the nearest matrix under the 2-norm is a rank-$k$ approximation, so we truncate it:
$$ A_{k} = U_{k}\Sigma_{k} V_{k}^{T}. $$
Technically, this truncated matrix can then be measured against the matrix $X$, I think.
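In numpy terms (again a sketch with a hypothetical random $A$), truncating $A$ and then forming the product reproduces exactly the truncated eigendecomposition, which is what lets it be measured against $X$:

```python
import numpy as np

rng = np.random.default_rng(4)            # hypothetical random test matrix
A = rng.standard_normal((6, 4))
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]        # A_k = U_k Sigma_k V_k^T

# (A_k)^T A_k equals the truncated eigendecomposition V_k Sigma_k^2 V_k^T.
X = Vt[:k, :].T @ np.diag(s[:k] ** 2) @ Vt[:k, :]
print(np.allclose(A_k.T @ A_k, X))                 # True
```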
Best Answer
If we think of $x \in \mathbb{R}^d$ as a column vector, then $$x^tx = \sum_{j=1}^d x_j^2 = \lvert x \rvert^2.$$ Edit: It seems you may be a bit confused on matrix multiplication. Since $x^t$ is a $1 \times d$ matrix and $x$ is a $d \times 1$ matrix, $x^tx$ will be a $1\times 1$ matrix (ie, a scalar): $$x^tx = (x_1, \ldots, x_d) \begin{pmatrix} x_1 \\ \vdots \\ x_d \end{pmatrix} = \sum_{j=1}^d x_j^2.$$
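For completeness, a tiny numpy illustration (my own, with an arbitrary example vector):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])      # arbitrary example vector

print(x @ x)                       # 14.0: the 1x1 product x^t x, as a scalar
print(np.sum(x ** 2))              # 14.0: the sum of squares
print(np.linalg.norm(x) ** 2)      # ~14.0: |x|^2
```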