[Math] Gradient of holomorphic function equals conjugate of complex derivative

complex-numbers, derivatives, holomorphic-functions

I'm using TensorFlow for some computations with complex variables (and derivatives of these computations). When I compute the derivative of (simple) holomorphic functions, the results obtained with TensorFlow are the conjugate of what I would expect. A simple example:

Given $z = x + iy$ and $f(z) = z \cdot z = x^2 + 2ixy - y^2$.

We have
$\frac{df}{dz} = \frac{1}{2} \left(\frac{\partial f}{\partial x} - i\frac{\partial f}{\partial y}\right) = 2x + 2iy$.

Hence, for $z = \frac{1}{5}i$, $\frac{df}{dz} = \frac{2}{5}i$. However, using TensorFlow, I obtain $\frac{df}{dz} = -\frac{2}{5}i$. While searching for an explanation, I found the statement "The gradient of a holomorphic function is the conjugate of its complex derivative" in a GitHub issue, but I don't understand why it holds. In other words: why is the gradient of a holomorphic function equal to the conjugate of its complex derivative, and where is the mistake in this simple example?
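
For concreteness, here is a minimal sketch that reproduces this, assuming TensorFlow 2.x with eager execution:

```python
import tensorflow as tf

# z = (1/5)i as a complex variable
z = tf.Variable(0.2j)

with tf.GradientTape() as tape:
    f = z * z  # f(z) = z^2

# I expect df/dz = 2z = 0.4j, but this prints the conjugate, -0.4j
print(tape.gradient(f, z))
```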

Best Answer

A couple of years late, but I came across this issue too and did some digging.

The key point is that TensorFlow defines the "gradient" of a complex-valued function $f(z)$ of a complex variable as "the gradient of the real map $F: (x,y) \mapsto \operatorname{Re}(f(x+iy))$, expressed as a complex number" (the gradient of that real map is a vector in $\mathbb{R}^2$, so we can express it as a complex number in the obvious way).
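
To make that definition concrete, here is a small sketch (assuming TensorFlow 2.x) that computes the gradient of the real map $F$ directly for the question's $f(z) = z^2$ at $z = \frac{1}{5}i$, recovering the same $-\frac{2}{5}i$:

```python
import tensorflow as tf

# Treat F(x, y) = Re(f(x + iy)) as an ordinary real map of two real variables.
x = tf.Variable(0.0)
y = tf.Variable(0.2)

with tf.GradientTape() as tape:
    F = tf.math.real(tf.complex(x, y) ** 2)  # Re((x + iy)^2) = x^2 - y^2

dFdx, dFdy = tape.gradient(F, [x, y])  # (2x, -2y) = (0.0, -0.4)

# Express the gradient vector (dF/dx, dF/dy) as a complex number:
print(complex(dFdx.numpy(), dFdy.numpy()))  # -0.4j, matching TF's complex gradient
```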

Presumably the reason for this definition is that in TF one is usually concerned with gradients for the purpose of running gradient descent on a loss function, and in particular with identifying the direction of steepest increase/decrease of that loss. With the above definition, a complex-valued function of complex variables can be used as a loss in a standard gradient descent algorithm, and the result is that the real part of the function gets minimised (which seems to me a somewhat reasonable interpretation of "optimise this complex-valued function"); the sketch below illustrates this.
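
Here is a sketch (assuming TensorFlow 2.x, with a hand-rolled update step rather than a Keras optimizer): running plain gradient descent with TF's complex gradient on the complex-typed loss $z\bar{z}$ minimises its real part $x^2 + y^2$, driving $z$ to $0$:

```python
import tensorflow as tf

z = tf.Variable(1.0 + 2.0j)
lr = 0.1

for _ in range(100):
    with tf.GradientTape() as tape:
        loss = z * tf.math.conj(z)  # complex dtype; Re(loss) = x^2 + y^2
    # TF's gradient here is 2x + 2iy = 2z, so this is steepest descent on Re(loss)
    z.assign_sub(lr * tape.gradient(loss, z))

print(z.numpy())  # ~0: the real part of the loss has been minimised
```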

Note that an equivalent way to write that definition of the gradient is $$\operatorname{gradient}(f) := \frac{\partial F}{\partial x} + i\frac{\partial F}{\partial y} = \overline{\frac{\partial f}{\partial z} + \frac{\partial \bar{f}}{\partial z}},$$ which for a holomorphic function simply reduces to $\overline{\frac{df}{dz}}$.
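
Applying this identity to the question's example: for $f(z) = z^2$ we have $\frac{df}{dz} = 2z$ and $\frac{\partial \bar{f}}{\partial z} = 0$, so $$\operatorname{gradient}(f) = \overline{2z} = 2\bar{z},$$ and at $z = \frac{1}{5}i$ this gives $-\frac{2}{5}i$, exactly the value TensorFlow returns.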

I wrote up a longer explanation on the GitHub issue you linked: https://github.com/tensorflow/tensorflow/issues/3348#issuecomment-512101921