[Math] Gradient of largest eigenvalue of matrix, with respect to individual elements of the matrix

eigenvalues-eigenvectors, matrix-calculus, multivariable-calculus, spectral-radius, vector-analysis

If $M$ is an $n\times n$ matrix, let $f(M)$ denote the largest eigenvalue (in absolute value) of $M$. In other words, if $\lambda_1,\dots,\lambda_n$ are the eigenvalues of $M$, define

$$f(M) = \max(|\lambda_1|,\dots,|\lambda_n|).$$

This can be viewed as a function $f:\mathbb{R}^{n^2} \to \mathbb{R}$ of the $n^2$ entries of $M$.
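For concreteness, $f$ itself is easy to evaluate numerically; a minimal sketch with NumPy (`np.linalg.eigvals` handles general square matrices, possibly returning complex eigenvalues):

```python
import numpy as np

def f(M):
    # f(M) = max(|lambda_1|, ..., |lambda_n|), i.e. the spectral radius of M.
    return np.max(np.abs(np.linalg.eigvals(M)))

# Upper-triangular example: the eigenvalues are the diagonal entries 2 and -3.
M = np.array([[2.0, 1.0],
              [0.0, -3.0]])
print(f(M))  # -> 3.0
```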

Now given a matrix $M$, I'd like to compute the gradient $\nabla f(M)$. How do I do that?

Equivalently, for each $i,j$, I want to compute the derivative ${\partial \over \partial M_{i,j}} f(M)$ of $f(M)$ with respect to the $(i,j)$-th entry of the matrix. I can't figure out a clean way to compute this, as computing the eigenvalues is an iterative numerical process, and it's not clear how to differentiate through it.

(This comes from an application where I want to run gradient descent on a function with a term of the form $f(M)$, so I need to be able to compute the gradient.)

Best Answer

It appears the solution is to find the eigenvalue of largest absolute value, say $\lambda_1$ (assumed simple), and a corresponding unit-norm eigenvector, call it $v_1$. Then the derivative ${\partial f \over \partial M_{i,j}}(M)$ is given by

$${\partial f \over \partial M_{i,j}}(M) = {\partial \lambda_1 \over \partial M_{i,j}} = (v_1 \cdot b_i) (v_1 \cdot b_j) = (v_1)_i \, (v_1)_j,$$

where I have used a formula for the derivative of an eigenvalue taken from Derivatives of eigenvalues. Here $b_i$ represents the standard basis vector with a 1 in the $i$th position and 0s elsewhere.

Two caveats: this formula assumes $M$ is symmetric; for non-symmetric $M$ it becomes ${\partial \lambda_1 \over \partial M_{i,j}} = u_i v_j / (u^\top v)$, where $u$ is the corresponding left eigenvector. And since $f(M) = |\lambda_1|$, the chain rule adds a factor of $\operatorname{sign}(\lambda_1)$ when $\lambda_1$ is negative.

Thus, once we have computed the largest eigenvalue and its corresponding eigenvector, we can easily fill in the entries of the gradient $\nabla f(M)$ as above.
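As a sanity check, the formula can be implemented and compared against a finite-difference estimate. The sketch below (NumPy, assuming $M$ symmetric with a simple dominant eigenvalue) includes a $\operatorname{sign}(\lambda_1)$ factor so that a negative dominant eigenvalue is handled correctly:

```python
import numpy as np

def f(M):
    # f(M) = max(|lambda_1|, ..., |lambda_n|), the spectral radius of M.
    return np.max(np.abs(np.linalg.eigvals(M)))

def grad_f(M):
    # Gradient of f for symmetric M with a simple dominant eigenvalue.
    # If lambda_1 has the largest |.| and v_1 is its unit eigenvector, then
    # d f / d M_ij = sign(lambda_1) * (v_1)_i * (v_1)_j.
    w, V = np.linalg.eigh(M)
    k = np.argmax(np.abs(w))
    v = V[:, k]                      # eigh returns unit-norm eigenvectors
    return np.sign(w[k]) * np.outer(v, v)

# Finite-difference check on a random symmetric matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
M = (A + A.T) / 2

G = grad_f(M)
eps = 1e-6
i, j = 1, 3
E = np.zeros_like(M)
E[i, j] = eps                        # perturb the single entry M_ij
fd = (f(M + E) - f(M)) / eps
print(G[i, j], fd)                   # the two values should agree closely
```

Note that the single-entry perturbation makes $M + \varepsilon E$ slightly non-symmetric, which is why `f` uses the general `eigvals`; the derivative formula still applies because left and right eigenvectors coincide for the symmetric $M$ at which we differentiate.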
