Good question. Yes, there is a simple argument. Let $P_A$ denote the orthogonal projection matrix $A(A'A)^{-1}A'$ and $M_A=I-P_A$. Then
$$(AX-B)'(AX-B) - B'M_AB = (AX-B)'P_A(AX-B) \succeq 0,$$
where $\succeq 0$ means that the left-hand side is positive semidefinite; the right-hand side vanishes precisely when $AX=P_AB$. Hence $X=(A'A)^{-1}A'B$ is the minimizer for both norms.
I'm using the minor simplifying assumption that $A'A$ is invertible. Should that fail, you can achieve the same result using Moore–Penrose inverses.
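Here is a quick numerical sanity check of the identity (a minimal NumPy sketch; the matrix sizes and random data are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))   # tall matrix; A'A invertible almost surely
B = rng.standard_normal((8, 4))

P = A @ np.linalg.solve(A.T @ A, A.T)      # P_A = A (A'A)^{-1} A'
M = np.eye(8) - P                          # M_A = I - P_A
X_ls = np.linalg.solve(A.T @ A, A.T @ B)   # X = (A'A)^{-1} A'B

X = rng.standard_normal((3, 4))            # an arbitrary competitor X
R = A @ X - B
# the identity (AX-B)'(AX-B) - B' M_A B = (AX-B)' P_A (AX-B)
assert np.allclose(R.T @ R - B.T @ M @ B, R.T @ P @ R)

# the right-hand side is PSD, so X_ls beats any X in both norms
R_ls = A @ X_ls - B
for ord_ in ('fro', 2):
    assert np.linalg.norm(R_ls, ord_) <= np.linalg.norm(R, ord_) + 1e-12
```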
In response to comment:
If $Y-Z$ is positive semidefinite, then $v'(Y-Z)v\geq 0$ for all vectors $v$. So if $v$ is an eigenvector of $Z$, normalized to length one and corresponding to an eigenvalue $\lambda$, then $v'Yv\geq v'Zv=\lambda$. Since this is true for all eigenvectors of $Z$, both the largest eigenvalue of $Z$ and the sum of $Z$'s eigenvalues are at most the corresponding quantities of $Y$.
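A minimal NumPy check of that eigenvalue comparison (the construction below, in which $Y-Z$ is positive semidefinite by design, is my own choice):

```python
import numpy as np

rng = np.random.default_rng(1)
Z = rng.standard_normal((5, 5)); Z = Z @ Z.T   # symmetric PSD Z
D = rng.standard_normal((5, 5)); D = D @ D.T   # PSD difference
Y = Z + D                                      # so Y - Z is PSD

# largest eigenvalue and trace (sum of eigenvalues) of Z are dominated by Y's
assert np.linalg.eigvalsh(Y)[-1] >= np.linalg.eigvalsh(Z)[-1]
assert np.trace(Y) >= np.trace(Z)
```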
Suppose $A \in \mathbb{R}^{m \times n}$. There is a theorem called the Eckart–Young–Mirsky theorem, which states the following:
$$A = U \Sigma V^{T} $$
where $U$ and $V$ are orthogonal matrices and $\Sigma$ is a diagonal matrix of the singular values $\sigma_{1} \geq \sigma_{2} \geq \cdots$. The following is true.
$$A_{k} = \sum_{i=1}^{k} \sigma_{i}u_{i} v_{i}^{T} $$
$$\| A - A_{k} \|_{2} = \Big\| \sum_{i=k+1}^{r} \sigma_{i}u_{i} v_{i}^{T} \Big\|_{2} = \sigma_{k+1},$$
where $r$ is the rank of $A$.
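A short NumPy illustration of this statement (a sketch with arbitrary sizes; `np.linalg.svd` returns singular values in descending order, so `s[k]` is $\sigma_{k+1}$):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 5))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]    # rank-k truncated SVD

# Eckart-Young-Mirsky: the 2-norm error is exactly sigma_{k+1}
assert np.isclose(np.linalg.norm(A - A_k, 2), s[k])
```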
Now consider
$$ \| A^{T}A - X \|_{2}. $$
The nearest matrix $X$ under this norm is a rank-$k$ approximation, obtained as above; then
$$ \| (U\Sigma V^{T})^{T}(U \Sigma V^{T}) - (U_{k}\Sigma_{k}V_{k}^{T})^{T} (U_{k}\Sigma_{k}V_{k}^{T}) \|_{2} $$
$$ \| V\Sigma^{T} U^{T} U \Sigma V^{T} - V_{k}\Sigma_{k}^{T}U_{k}^{T}U_{k}\Sigma_{k}V_{k}^{T} \|_{2} $$
$$ \| V \Sigma^{2} V^{T} - V_{k}\Sigma_{k}^{2}V_{k}^{T} \|_{2} $$
where
$$V_{k} \Sigma_{k}^{2}V_{k}^{T} = \sum_{i=1}^{k} \sigma_{i}^{2} v_{i}v_{i}^{T} $$
Now the expression above showed
$$\| A - A_{k} \|_{2} = \Big\| \sum_{i=k+1}^{r} \sigma_{i}u_{i} v_{i}^{T} \Big\|_{2} = \sigma_{k+1},$$
and the same reasoning gives
$$ \| A^{T}A -X \|_{2} =\Big\| \sum_{i=k+1}^{r} \sigma_{i}^{2} v_{i}v_{i}^{T} \Big\|_{2} = \sigma_{k+1}^{2}.$$
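The same check for the $A^{T}A$ version (again a sketch; the nearest rank-$k$ matrix $X$ is built here from the truncated right singular vectors, as derived above):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
X = Vt[:k, :].T @ np.diag(s[:k] ** 2) @ Vt[:k, :]   # X = V_k Sigma_k^2 V_k'

# the 2-norm error is sigma_{k+1}^2
assert np.isclose(np.linalg.norm(A.T @ A - X, 2), s[k] ** 2)
```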
This is more or less saying that the solution to the least-squares problem is built iteratively from an eigenspace; that is typically how it works. It doesn't explicitly build the normal equations.
More specifically
$$V\Sigma^{2} V^{T} = V \Lambda V^{T} $$
$$ \| A^{T}A -X \|_{2} =\Big\| \sum_{i=k+1}^{r} \lambda_{i} v_{i}v_{i}^{T} \Big\|_{2} = \lambda_{k+1} $$
If you follow this, I compared the matrix $X$ against $A^{T}A$: the nearest matrix $X$ is built from the eigendecomposition of $A^{T}A$, while the matrix $A$ itself is built from a singular value decomposition. The way the SVD works, if you read about it, is the following.
$$ A^{T}A = (U \Sigma V^{T})^{T} (U \Sigma V^{T}) $$
$$ A^{T}A = (V^{T})^{T} \Sigma^{T} U^{T} U \Sigma V^{T} $$
by the properties of transposes and orthogonal matrices
$$ A^{T}A = V \Sigma^{T} \Sigma V^{T} $$
$$ A^{T}A = V \Sigma^{2} V^{T} $$
$$ A^{T}A = V \Lambda V^{T} $$
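This chain is easy to confirm numerically (a minimal sketch: the eigenvalues of $A^{T}A$ should be the squared singular values of $A$, with the right singular vectors as eigenvectors):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((7, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# A'A = V Sigma^2 V': its eigenvalues are the squared singular values
lam = np.linalg.eigvalsh(A.T @ A)
assert np.allclose(np.sort(lam), np.sort(s ** 2))
assert np.allclose(A.T @ A, Vt.T @ np.diag(s ** 2) @ Vt)
```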
Meaning the matrix $A$ is $U\Sigma V^{T}$; however, the nearest matrix under the 2-norm is a rank-$k$ approximation, so we truncate it.
Technically, it is this truncated matrix that is measured against the matrix $X$:
$$ A_{k} = U_{k}\Sigma_{k} V_{k}^{T} $$
I actually came across a different way of proving that $||B_{ls}||_F\le ||B||_F$ where $B_{ls}$ is the least-squares left-inverse and $B$ is any left-inverse, based on Prof. Stephen Boyd's lecture notes. Reproducing the proof below:
Let $U\Sigma V^T$ be the (economy) SVD of $A$. Then $B_{ls}=V\Sigma^{-1}U^T$. Define $Z:=B-B_{ls}$.
Since both $B$ and $B_{ls}$ are left-inverses of $A$, $ZA=(B-B_{ls})A=0\implies ZU\Sigma V^T=0 \implies ZU=0$. This implies that $ZB_{ls}^T=(ZU)\Sigma^{-1}V^T=0$.
Hence, $BB^T=(B_{ls}+Z)(B_{ls}+Z)^T=B_{ls}B_{ls}^T+ZZ^T$, since the cross terms $ZB_{ls}^T$ and $B_{ls}Z^T$ vanish.
$\therefore BB^T-B_{ls}B_{ls}^T$ is a positive semidefinite matrix $\implies$ tr$(BB^T-B_{ls}B_{ls}^T)\ge 0\implies$ tr$(BB^T)\ge$ tr$(B_{ls}B_{ls}^T)$
Since the non-zero eigenvalues of $PQ$ and $QP$ are the same, this also means that tr$(B^TB) \ge$ tr$(B_{ls}^TB_{ls}) \implies ||B||_F\ge ||B_{ls}||_F$.
(This also proves indirectly that $||B||_F\ge ||BQ||_F$)
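A numerical check of this argument (a sketch under the assumption that $A$ has full column rank; the way I build an alternate left-inverse $B$ from a matrix $Z$ with $ZA=0$ is my own construction):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((7, 3))      # full column rank almost surely
B_ls = np.linalg.pinv(A)             # least-squares left-inverse V Sigma^{-1} U'

# another left-inverse: B = B_ls + Z with Z A = 0
C = rng.standard_normal((3, 7))
Z = C @ (np.eye(7) - A @ B_ls)       # I - A B_ls projects onto the left null space of A
B = B_ls + Z

assert np.allclose(B @ A, np.eye(3))       # B is indeed a left-inverse
assert np.allclose(Z @ B_ls.T, 0)          # the key orthogonality step
assert np.linalg.norm(B_ls, 'fro') <= np.linalg.norm(B, 'fro')
```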