The basic fact is that if $W^T$ is an $r\times n$ matrix with orthonormal rows and $S=\operatorname{diag}(s_1,s_2,\ldots,s_n)$ has nonnegative and decreasing diagonal entries, the maximum of $\|W^TS\|_F$ is attained when $W^T=\pmatrix{I_r&0}$. This can be proved by the usual argument. Let $\Lambda=\operatorname{diag}(\lambda_1,\lambda_2,\ldots,\lambda_n)=I_r\oplus0$. Extend $W$ to an orthogonal matrix $Q$ whose first $r$ columns are the columns of $W$. Then $\|W^TS\|_F^2=\operatorname{tr}(\Lambda Q^TS^2Q)=\sum_{i,j}s_i^2\lambda_jq_{ij}^2$, which is a linear function in the entries of the matrix $Q\circ Q$ (the entrywise square of $Q$). As $Q\circ Q$ is doubly stochastic, Birkhoff's Theorem implies that the maximum value of $\sum_{i,j}s_i^2\lambda_jq_{ij}^2$ is attained at some permutation matrix $Q$. Since both $\Lambda$ and $S$ have nonnegative and decreasing diagonal entries, the rearrangement inequality shows that $Q=I_n$ is a global maximiser. Therefore $W^T=\pmatrix{I_r&0}$ is a global maximiser.
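As a sanity check of this fact, here is a minimal numerical sketch in plain Python. It draws random $W$ with orthonormal columns by Gram-Schmidt and compares $\|W^TS\|_F^2$ with the value $\sum_{i\le r}s_i^2$ attained at $W^T=\pmatrix{I_r&0}$. The helper names, sizes, seed and tolerance are my own choices for illustration, and a random search is of course no substitute for the proof.

```python
import math
import random


def random_orthonormal_columns(n, r, rng):
    """Return r orthonormal vectors in R^n (the columns of W) via Gram-Schmidt."""
    cols = []
    while len(cols) < r:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Remove the components along the vectors found so far.
        for u in cols:
            dot = sum(a * b for a, b in zip(u, v))
            v = [b - dot * a for a, b in zip(u, v)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-8:  # skip the (unlikely) degenerate draw
            cols.append([x / norm for x in v])
    return cols


def frob_sq_WtS(cols, s):
    """||W^T S||_F^2 = sum_{i,j} s_i^2 * W[i][j]^2, where cols[j][i] = W[i][j]."""
    return sum((s[i] * col[i]) ** 2 for col in cols for i in range(len(s)))


def check_basic_fact(n=6, r=3, trials=200, seed=0):
    """Return (largest value seen over random W, value at W^T = (I_r 0))."""
    rng = random.Random(seed)
    # Nonnegative, decreasing diagonal of S (illustrative random data).
    s = sorted((rng.uniform(0.0, 5.0) for _ in range(n)), reverse=True)
    bound = sum(x * x for x in s[:r])
    worst = max(frob_sq_WtS(random_orthonormal_columns(n, r, rng), s)
                for _ in range(trials))
    return worst, bound


if __name__ == "__main__":
    worst, bound = check_basic_fact()
    print("largest random value:", worst, "claimed maximum:", bound)
```

In every trial the random value should stay at or below the claimed maximum, up to rounding error.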
Now, in your problem, by absorbing $U_A$ and $V_A^T$ into $U^T$ and $V$ respectively, we may assume that $A=\Sigma=\pmatrix{D&0}$ is already a singular value matrix. Therefore
\begin{align}
\max_{U,V}\operatorname{tr}(U^TAV)
&=\max_{U,V}\operatorname{tr}\left(U^T\pmatrix{D&0}V\right)\\
&=\max_{U,V}\operatorname{tr}\left(U^TD^{1/2}\pmatrix{D^{1/2}&0_{m\times(n-m)}}V\right)\\
&\le\max_{U,V}\left\|U^TD^{1/2}\right\|_F
\left\|\pmatrix{D^{1/2}&0_{m\times(n-m)}}V\right\|_F\tag{1}\\
&=\max_{U,V}\left\|U^TD^{1/2}\right\|_F
\left\|\pmatrix{D^{1/2}&0_{m\times(n-m)}\\ 0_{(n-m)\times m}&0_{(n-m)\times(n-m)}}V\right\|_F.\tag{2}
\end{align}
By our aforementioned fact, the two Frobenius norms in $(2)$ attain their maxima when $U^T=\pmatrix{I_r&0_{r\times(m-r)}}$ and $V^T=\pmatrix{I_r&0_{r\times(n-r)}}$. Moreover, if we put $D_r=\operatorname{diag}(\sigma_1,\ldots,\sigma_r)$, then for this choice we have $D^{1/2}U=\pmatrix{D^{1/2}&0_{m\times(n-m)}}V=\pmatrix{D_r^{1/2}\\ 0_{(m-r)\times r}}$. Therefore equality also holds in the Cauchy-Schwarz inequality $(1)$. Hence this $(U,V)$ is a global maximiser and the maximum possible value of $\operatorname{tr}(U^TAV)$ is $\|D_r^{1/2}\|_F^2=\sum_{i=1}^r\sigma_i(A)$.
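The conclusion can be spot-checked numerically. The sketch below, in plain Python, takes $A=\pmatrix{D&0}$ already in singular value form (as in the reduction above, with $m\le n$), draws random $U$ and $V$ with orthonormal columns, and compares $\operatorname{tr}(U^TAV)$ with $\sum_{i\le r}\sigma_i$. All function names, sizes and the seed are assumptions made for illustration only.

```python
import math
import random


def orthonormal_columns(n, r, rng):
    """Return r orthonormal vectors in R^n via Gram-Schmidt on Gaussian draws."""
    cols = []
    while len(cols) < r:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for u in cols:
            dot = sum(a * b for a, b in zip(u, v))
            v = [b - dot * a for a, b in zip(u, v)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-8:  # skip the (unlikely) degenerate draw
            cols.append([x / norm for x in v])
    return cols


def unit_vector(size, k):
    """Standard basis vector e_k in R^size (0-based index)."""
    return [1.0 if i == k else 0.0 for i in range(size)]


def trace_UtAV(U_cols, V_cols, sigma):
    """tr(U^T A V) for A = (D 0), D = diag(sigma).

    Column k contributes sum_i sigma_i * U[i][k] * V[i][k]; only the first
    m = len(sigma) entries of each column of V matter.
    """
    return sum(sigma[i] * u[i] * v[i]
               for u, v in zip(U_cols, V_cols)
               for i in range(len(sigma)))


def check_trace_bound(m=4, n=6, r=2, trials=500, seed=1):
    """Return (best random value, value at the claimed maximiser, sum of top r sigma)."""
    rng = random.Random(seed)
    sigma = sorted((rng.uniform(0.0, 3.0) for _ in range(m)), reverse=True)
    bound = sum(sigma[:r])
    best_random = max(
        trace_UtAV(orthonormal_columns(m, r, rng),
                   orthonormal_columns(n, r, rng), sigma)
        for _ in range(trials))
    # Claimed maximiser: U^T = (I_r 0), V^T = (I_r 0).
    attained = trace_UtAV([unit_vector(m, k) for k in range(r)],
                          [unit_vector(n, k) for k in range(r)], sigma)
    return best_random, attained, bound


if __name__ == "__main__":
    print(check_trace_bound())
```

The second returned number should agree with the third, and the first should not exceed it.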
I can't think of anything deep, but if both $A$ and $B$ are positive semidefinite, the inequality is true: putting $a=\operatorname{tr}(A)$, we have $A\preceq aI$ (each eigenvalue of $A$ is nonnegative and hence at most their sum), and hence
$$
\operatorname{tr}(AB)
=\operatorname{tr}(B^{1/2}AB^{1/2})
\le\operatorname{tr}(B^{1/2}(aI)B^{1/2})
=\operatorname{tr}(A)\operatorname{tr}(B).
$$
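For a quick numerical illustration of $\operatorname{tr}(AB)\le\operatorname{tr}(A)\operatorname{tr}(B)$, the following plain-Python sketch builds random positive semidefinite matrices as Gram matrices $XX^T$ and tests the inequality. The construction, names, sizes and tolerance are illustrative assumptions, not part of the argument above.

```python
import random


def gram(n, rng):
    """Return the PSD matrix X X^T for a random n x n Gaussian matrix X."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
    return [[sum(X[i][k] * X[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))


def trace_of_product(A, B):
    """tr(AB) = sum_{i,j} A[i][j] * B[j][i], computed without forming AB."""
    n = len(A)
    return sum(A[i][j] * B[j][i] for i in range(n) for j in range(n))


def check_psd_trace_inequality(n=4, trials=200, seed=2):
    """Return True if tr(AB) <= tr(A) tr(B) held in every random PSD trial."""
    rng = random.Random(seed)
    for _ in range(trials):
        A, B = gram(n, rng), gram(n, rng)
        if trace_of_product(A, B) > trace(A) * trace(B) + 1e-9:
            return False
    return True


if __name__ == "__main__":
    print(check_psd_trace_inequality())
```

Positive semidefiniteness matters here: for $A=B=\operatorname{diag}(1,-1)$ one gets $\operatorname{tr}(AB)=2$ but $\operatorname{tr}(A)\operatorname{tr}(B)=0$.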
This also follows from (and hence is weaker than) von Neumann's trace inequality, which in this context says that
$$
\operatorname{tr}(AB)\le\sum_i\lambda_i(A)\lambda_i(B)
$$
when the eigenvalues of $A$ and $B$ are arranged in the same (ascending or descending) order.
Apply Cauchy-Schwarz: $$\sum_i \sigma_{A,i} \sigma_{B,i} \le \sqrt{\sum_i \sigma_{A,i}^2} \sqrt{\sum_i \sigma_{B,i}^2} = \|A\|_F \|B\|_F$$
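A small plain-Python sketch of this step, working directly from lists of singular values and using $\|A\|_F^2=\sum_i\sigma_{A,i}^2$. The function name and the sample values are made up for illustration.

```python
import math


def singular_value_cauchy_schwarz(sa, sb):
    """Return (sum_i sa_i * sb_i, ||A||_F * ||B||_F) from singular value lists.

    sa and sb are the singular values of A and B; the Frobenius norm of a
    matrix is the Euclidean norm of its singular values.
    """
    lhs = sum(a * b for a, b in zip(sa, sb))
    rhs = math.sqrt(sum(a * a for a in sa)) * math.sqrt(sum(b * b for b in sb))
    return lhs, rhs


if __name__ == "__main__":
    # Hypothetical singular values, chosen only to illustrate the inequality.
    lhs, rhs = singular_value_cauchy_schwarz([3.0, 2.0, 1.0], [4.0, 1.0, 0.5])
    print(lhs, rhs, lhs <= rhs)
```

Equality holds exactly when the two singular value vectors are proportional, for example $(3,2,1)$ and $(6,4,2)$.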