2nd equality: the $\le$ is just AM-GM inequality. To get $=$, replace $U$ by $\lambda U$ and $V$ by $\lambda^{-1}V$ so that the norms of $U$ and $V$ become the same.
1st equality: the hint tells you how to get $\ge$. So suppose that $X = UV$. Let the singular decomposition of $X = Q \Lambda P^*$, where $Q$ and $P$ are unitary, and $\Lambda$ is diagonal with non-negative entries. Then $\Lambda = Q^* U V P$, and so $$\|X\|_\Sigma = \text{trace}(\Lambda) \le \|Q^* U\|_F \|V P\|_F = \|U\|_F \|V\|_F .$$
In general the Frobenius and nuclear norm are not a good proxy for rank. The norms depend on the singular values $||X||_* = \sum_i \sigma_i(X)$ and $||X||_F^2 = \sum_i \sigma_i^2(X)$... so if the singular values of your matrix are all too small or too large, then you have a bad approximation to the rank.
However, you are right that when the singular values are clustered somewhere near one both norms can give a good approximation of the rank. Lets look at a bound...
If the $r$ nonzero singular values $\sigma_1,...,\sigma_r$ of a rank $r$ matrix $X$ are bounded by $|1-\sigma_i| \leq \epsilon/r$ for a error tolerance $0\leq\epsilon<r$, then $|r - ||X||_*| \leq \epsilon$. This bound tells us that as the your error in the approximation of the rank grows, the singular values can correspondingly grow linearly.
For the Frobenius norm: If the $r$ nonzero singular values $\sigma_1,...,\sigma_r$ of a rank $r$ matrix $X$ are bounded by $\sqrt{1-\epsilon/r}\leq\sigma_i\leq\sqrt{1+\epsilon/r}$, then $|r - ||X||_*| \leq \epsilon$. In this case if we increase our approximation error tolerance, our singular values can only grow according to this square root function.
In other words, these bounds show that regardless of norm choice you don't have a lot of wiggle room in your singular values if you want your norm to well approximate rank. However, for the nuclear norm you have more wiggle room than Frobenius.
A good rule of thumb is that the nuclear norm can achieve the same approximation to rank as the Frobenius rule with about twice as much freedom in singular values(interval twice as large), and that the singular values should always be near one.
In your minimization problem, it is hard to know how to constrain $X$ to ensure well approximation of the rank. Perhaps normalizing the spectral radius of $A$ and $B$ could help, or constraining $X$ with the spectral norm $||Ax||/||x||$.
Best Answer
Are you familiar with proximal algorithms? You are asking how to evaluate the prox operator of the nuclear norm. The answer is given in slide 3-41 in DTU 2010 - Algorithms for Large Scale Convex Optimization - Proximal Gradient Method.