While reading a book, I came across the statement that $\|A\|_F^2=\text{Energy}(A)=\text{tr}(AA^T)=\text{tr}(A^TA)$. I understand that for square matrices the squared Frobenius norm is the sum of the squares of all elements of the matrix, but I cannot intuitively see why for rectangular matrices it would be the trace of the matrix multiplied by its transpose (or the other way around). For instance, it should hold that $\text{tr}(CD^T) = \text{tr}(DC^T) = \displaystyle\sum_{i=1}^n\sum_{j=1}^d c_{ij}d_{ij}$ for matrices $C, D$ of size $n \times d$. Maybe some sort of proof would help?
SOURCE: Linear Algebra and Optimization for Machine Learning: A Textbook (page 20)
Best Answer
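Note that $$ \|A\|_F^2 = \sum_{j=1}^n\sum_{k=1}^d a_{jk} a_{jk} = \mathrm{tr}(A A^T), $$ using your formula for $\mathrm{tr}(CD^T)$ with $C = D = A$. Both $AA^T$ (size $n \times n$) and $A^TA$ (size $d \times d$) are square even when $A$ is not, so both traces are defined. Since the trace is cyclic, i.e. $\mathrm{tr}(XY) = \mathrm{tr}(YX)$ whenever both products are defined, we also have equality with $\mathrm{tr}(A^TA)$.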
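If the formula for $\mathrm{tr}(CD^T)$ itself is the sticking point, here is a sketch of an index-level derivation. It assumes only the usual definitions of the matrix product and the trace; the index names $i, j, k$ are chosen here for illustration. For $C, D$ of size $n \times d$, the $(i,j)$ entry of the $n \times n$ matrix $CD^T$ is
$$ (CD^T)_{ij} = \sum_{k=1}^d c_{ik}\,(D^T)_{kj} = \sum_{k=1}^d c_{ik}\, d_{jk}, $$
so summing the diagonal entries ($j = i$) gives
$$ \mathrm{tr}(CD^T) = \sum_{i=1}^n (CD^T)_{ii} = \sum_{i=1}^n \sum_{k=1}^d c_{ik}\, d_{ik}. $$
The same computation for the $d \times d$ matrix $C^TD$ gives
$$ \mathrm{tr}(C^TD) = \sum_{k=1}^d (C^TD)_{kk} = \sum_{k=1}^d \sum_{i=1}^n c_{ik}\, d_{ik}, $$
which is the same double sum in the other order. The right-hand side is symmetric in $C$ and $D$, so $\mathrm{tr}(CD^T) = \mathrm{tr}(DC^T)$ as well. Setting $C = D = A$ turns each term into $a_{ik}^2$, which is the sum of squared entries, with no assumption that $A$ is square.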