Derivative of symmetric matrix with respect to its elements

derivatives, matrices, matrix-calculus

Let's say we have a symmetric matrix $g$ whose elements are $g_{ij}$. I would like to write down the answer for the derivative of $g_{ij}$ with respect to $g_{kl}$. Since $g$ is symmetric I would expect that
$$\frac{\partial g_{ij}}{\partial g_{ij}}=\frac{\partial g_{ji}}{\partial g_{ij}}=1$$
And all other derivatives vanish. I would like to be able to write this as a result in terms of Kronecker deltas for general indices but I am getting stuck. My first guess would be to say
$$\frac{\partial g_{ij}}{\partial g_{kl}}=\delta^{k}_{i}\delta^{l}_j$$

But this does not respect the symmetry of the matrix elements, so one might guess to symmetrize:
$$\frac{\partial g_{ij}}{\partial g_{kl}}=\frac{1}{2}\Big(\delta^{k}_{i}\delta^{l}_j+\delta^{l}_{i}\delta^{k}_j\Big)$$

But now we have a problem because the nonzero off-diagonal derivatives equal $1/2$ and the diagonal derivatives equal $1$.
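As a concrete check (a quick Python sketch; `deriv` is just an illustrative name for the symmetrized formula), these are exactly the values that come out:

```python
# Symmetrized guess: d g_ij / d g_kl = (δ_ik δ_jl + δ_il δ_jk) / 2
def delta(a, b):
    return 1.0 if a == b else 0.0

def deriv(i, j, k, l):
    return 0.5 * (delta(i, k) * delta(j, l) + delta(i, l) * delta(j, k))

print(deriv(0, 1, 0, 1))  # off-diagonal: 0.5
print(deriv(0, 1, 1, 0))  # its transpose partner: 0.5
print(deriv(0, 0, 0, 0))  # diagonal: 1.0
```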

Is there any way to resolve this?

Edit: Going off of the comments so far, I tried a different form which looks wrong, but it's the only form I can think of which respects the way these indices transform under, e.g., an orthogonal transformation:
$$\frac{\partial g_{ij}}{\partial g_{kl}}=\frac{1}{2}\Big(\delta^{k}_{i}\delta^{l}_j+\delta^{l}_{i}\delta^{k}_j-\frac{3}{n}g_{ij}g^{kl}\Big)$$
where $g^{kl}$ are the elements of $g^{-1}$ (assuming it exists) and $n$ is the dimension of the matrix. This form is obtained by requiring that the right-hand side be a projection operator.

Best Answer

This is actually a generic issue with partial derivatives with respect to a constrained object, and parameterizations of that object that would allow violating the constraints. This also crops up with e.g. unit vectors, or orthogonal matrices.

A symmetric matrix has only $n(n+1)/2$ degrees of freedom -- you cannot vary the entries separately, and it's not clear what it would mean to hold $g_{i,j}$ fixed while varying $g_{j,i}$ -- the usual interpretation of a partial derivative. If you have a function defined along a curve in 3-space, there really is no partial derivative with respect to $x$, especially at places along the curve where $x$ doesn't change locally.

What people generally want in cases like this is a way to use the normal formulas and have them work, at least when the "infinitesimal changes" keep the new object obeying the constraint. For example, the change in a symmetric matrix is a symmetric matrix, and the change in a unit vector must be orthogonal to the unit vector. These treatments usually involve some hideous and non-obvious abuses of notation, and are "really" moving to a parameterized, but constrained total derivative. Doing this for a symmetric matrix has $\mathrm{d}G = S\,\mathrm{d}t$, with $S$ any symmetric matrix.

The derivative of a unit vector with respect to itself is an example of this, though going through method 2 below. The end result of the "partial" derivative with respect to itself is a projection onto the portion of a change that maintains the constraint.
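For the unit-vector case, that projection is $P = I - u\,u^{T}$ (stated here explicitly as an assumption; the answer only alludes to it). A minimal NumPy sketch:

```python
import numpy as np

# For a unit vector u, an admissible change du must satisfy u·du = 0.
# The projection onto such changes is P = I - u u^T.
u = np.array([1.0, 2.0, 2.0])
u /= np.linalg.norm(u)

P = np.eye(3) - np.outer(u, u)

dv = np.array([0.3, -0.1, 0.4])   # arbitrary change
du = P @ dv                        # constraint-preserving part
print(abs(u @ du) < 1e-12)         # tangent to the unit sphere
print(np.allclose(P @ P, P))       # P is idempotent, i.e. a projection
```

Note that $P$ depends on $u$: unlike the symmetric-matrix case below, the projection changes with the state being varied.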

This generalizes. For a symmetric matrix it's particularly simple: unlike the unit-vector case, the projection is always the same, not depending on the state being varied. We now want to describe this fully for the symmetric-matrix case. We'll call the output matrix $H$ to clarify the notation slightly. Obviously, the output change in $h_{i,j}$ will have to be some linear combination of the input changes to both $g_{i,j}$ and $g_{j,i}$. In particular, consider an actual symmetric change: both entries change by the same amount, and that common amount is the change we want in the output. For a symmetric change, $\mathrm{d} h_{i,j} = \alpha\,\mathrm{d}g_{i,j} + (1 - \alpha)\,\mathrm{d}g_{j,i}$ works for any value of $\alpha$. The only symmetric choice is a contribution of $1/2$ from each.

I.e. we want $$\frac {\partial h_{i,j}}{\partial g_{k,l}} = \frac{1}{2} (\delta_{i,k}\delta_{j,l} + \delta_{i,l} \delta_{j,k} )$$

> But now we have a problem because the nonzero off-diagonal derivatives equal $1/2$

Why is this a problem? For an actual symmetric change it recovers the right output, because both inputs have changed the same amount.
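A minimal NumPy check of this claim: build the derivative tensor from the formula above and contract it with a symmetric change; the change passes through unchanged.

```python
import numpy as np

n = 3
rng = np.random.default_rng(0)

# Derivative tensor D[i,j,k,l] = (δ_ik δ_jl + δ_il δ_jk) / 2
I = np.eye(n)
D = 0.5 * (np.einsum('ik,jl->ijkl', I, I) + np.einsum('il,jk->ijkl', I, I))

# A symmetric infinitesimal change dG
A = rng.standard_normal((n, n))
dG = A + A.T

# Contract: dH[i,j] = sum_kl D[i,j,k,l] dG[k,l]
dH = np.einsum('ijkl,kl->ij', D, dG)
print(np.allclose(dH, dG))  # True: symmetric changes are recovered exactly
```

The off-diagonal $1/2$'s are harmless precisely because a symmetric change feeds each output entry from two input entries that moved by the same amount.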

Alternate ways that truly keep things in the realm of partial derivatives include:

  1. If the symmetric matrix is just being considered on its own, it's perfectly possible to take $g_{i,j}$ with $i \le j$ as "basis" elements. You no longer have the structure of a matrix for the derivative, but that may be okay -- if you're just trying to solve for a stationary point, it works fine. This will end up with less symmetric expressions, though.

  2. Sometimes you can treat a constrained system as a transform of an unconstrained system. Here, in the specific case of a symmetric matrix, we can treat derivatives of the symmetrized form of a matrix: $H = \operatorname{Sym}(G) = (G + G^T) / 2$. Then $\partial h_{i,j} /\partial g_{k,l} = (\delta_{i,k}\delta_{j,l} + \delta_{i,l} \delta_{j,k} ) / 2$, with nothing confusing at all, recovering what we did above. The point of view is slightly different, as above we worked directly with the changes, while here we just powered through the algebra of symmetrization.
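Method 2 can be checked numerically: differentiate $H = \operatorname{Sym}(G)$ entry by entry with respect to the unconstrained entries of $G$ (since $\operatorname{Sym}$ is linear, a unit-step finite difference is exact), and compare against the symmetrized-delta tensor. A NumPy sketch:

```python
import numpy as np

n = 3
rng = np.random.default_rng(0)
G = rng.standard_normal((n, n))   # unconstrained matrix

def sym(M):
    """H = Sym(G) = (G + G^T) / 2"""
    return (M + M.T) / 2

# sym is linear, so a unit-step difference gives the exact derivative:
# D[i,j,k,l] = ∂h_ij / ∂g_kl
D = np.zeros((n, n, n, n))
for k in range(n):
    for l in range(n):
        Gp = G.copy()
        Gp[k, l] += 1.0
        D[:, :, k, l] = sym(Gp) - sym(G)

I = np.eye(n)
expected = 0.5 * (np.einsum('ik,jl->ijkl', I, I) + np.einsum('il,jk->ijkl', I, I))
print(np.allclose(D, expected))  # True
```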