[Math] Paradox on the derivative of the rank of a matrix

derivatives, matrices, matrix-calculus, matrix-rank, pseudoinverse

It is clear that the function

$$f : \mathbb R^{m \times n} \to \mathbb N, \qquad X \mapsto \mbox{rank}(X)$$

has no derivative at any $X$, because $f$ takes values in the natural numbers. On the other hand, we know that the rank of any matrix can be computed by

$$\mbox{rank}(X) = \mbox{tr} \left( X^+ X \right) \tag{1}$$

where $\text{tr}$ is the trace operator and $X^+$ is the Moore-Penrose pseudoinverse of $X$. Notice, however, that the RHS of $(1)$ is differentiable everywhere (the trace operator and the pseudoinverse have derivatives), and its derivative can be computed as

$$f'(X) = (X^T\otimes I_n)\left(-(X^+)^T \otimes X^+ \right) + (I_n \otimes X^+).$$

I am confused about whether the rank has a derivative at every point $X$ or not. I have probably made a mistake somewhere or I am missing something. I need some help. Thanks in advance!

Similar behavior happens with the nuclear norm (see Derivative of nuclear norm).

Best Answer

Your assumption that $f$ has no derivative anywhere is wrong.

In fact, the derivative exists and is $0$ almost everywhere.

Every matrix with full rank has a neighborhood of matrices that also have full rank, so in this neighborhood $f$ is constant and thus differentiable.
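The local constancy near a full-rank matrix can be checked numerically. The following is only a minimal sketch: the $4 \times 3$ matrix `X`, the perturbation size `1e-6`, and the helper name `rank_via_trace` are assumptions made for illustration, not part of the original argument. It also evaluates the right-hand side of $(1)$ to confirm that it agrees with the rank.

```python
import numpy as np

# A fixed 4x3 matrix with full column rank (rank 3); chosen only for illustration.
X = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])

def rank_via_trace(A):
    """Right-hand side of (1): tr(A^+ A), with A^+ the Moore-Penrose pseudoinverse."""
    return float(np.trace(np.linalg.pinv(A) @ A))

rng = np.random.default_rng(0)  # seeded so the run is reproducible

base_rank = int(np.linalg.matrix_rank(X))

# Ranks of 100 nearby matrices X + E, where E has entries of size about 1e-6.
nearby_ranks = [
    int(np.linalg.matrix_rank(X + 1e-6 * rng.standard_normal(X.shape)))
    for _ in range(100)
]

print(base_rank, set(nearby_ranks), rank_via_trace(X))
```

Every perturbed matrix has the same rank as `X`, which is what "constant in a neighborhood" means here, so the derivative at `X` is $0$.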

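For matrices $X$ that do not have full rank, $f$ is not differentiable at $X$: every neighborhood of such an $X$ contains full-rank matrices, so $f$ is not even continuous there. (The pseudoinverse is not differentiable at such points either; it isn't even continuous.)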
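The jump at a rank-deficient matrix can also be seen in a small example. This sketch assumes a hypothetical one-parameter family $X(\varepsilon) = \mathrm{diag}(1, \varepsilon)$, which has rank $1$ at $\varepsilon = 0$ and rank $2$ for every $\varepsilon \neq 0$; the specific values of $\varepsilon$ are arbitrary.

```python
import numpy as np

def X(eps):
    """Hypothetical family diag(1, eps): rank 1 at eps = 0, rank 2 otherwise."""
    return np.diag([1.0, eps])

eps_values = [1e-2, 1e-4, 1e-6, 1e-8]

# The rank jumps from 1 to 2 as soon as eps leaves 0.
rank_at_zero = int(np.linalg.matrix_rank(X(0.0)))
ranks_nearby = [int(np.linalg.matrix_rank(X(e))) for e in eps_values]

# The (2, 2) entry of the pseudoinverse is 0 at eps = 0 but 1/eps nearby,
# so it grows without bound as eps -> 0 instead of converging to 0.
pinv_entry_at_zero = float(np.linalg.pinv(X(0.0))[1, 1])
pinv_entries_nearby = [float(np.linalg.pinv(X(e))[1, 1]) for e in eps_values]

print(rank_at_zero, ranks_nearby)
print(pinv_entry_at_zero, pinv_entries_nearby)
```

Both the rank and the pseudoinverse fail to be continuous at $X(0)$, so neither side of $(1)$ is differentiable there, and the apparent paradox disappears.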
