Gradient Definition for Non-Cartesian Coordinates – Multivariable Calculus

multivariable-calculus

The gradient of a function $f: \mathbb{R}^n \to \mathbb{R}$ is defined as the vector of the partial derivatives:

$$ \nabla f = \left(\frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n}\right)$$

Recently, I have become somewhat confused over this definition since I realized that if, for example, $f$ is defined in spherical coordinates $(r, \theta, \phi)$, the gradient is given as

$$ \nabla f = \left(\frac{\partial f}{\partial r}, \frac{1}{r} \frac{\partial f}{\partial \theta}, \frac{1}{r \sin \theta} \frac{\partial f}{\partial \phi} \right)$$

rather than

$$ \nabla f = \left(\frac{\partial f}{\partial r}, \frac{\partial f}{\partial \theta}, \frac{\partial f}{\partial \phi} \right)$$

I have two questions regarding this:

  1. Is a scalar-valued function in spherical coordinates still considered to be $f: \mathbb{R}^3 \to \mathbb{R}$, or is $\mathbb{R}^3$ reserved for Cartesian coordinates?
  2. Does the "partial derivative" definition of the gradient in fact require Cartesian coordinates?

Best Answer

It turns out that there are two different but related notions of differentiation for a function $f:\mathbb R^n\to\mathbb R$: the total derivative $df$ and the gradient $\nabla f$.

  • The total derivative is a covector ("dual vector", "linear form") and does not depend on the choice of a metric ("measure of length").
  • The gradient is an ordinary vector and is derived from the total derivative, but it depends on a metric. That is why it looks a bit funny in different coordinate systems.


The definition of the total derivative answers the following question: given a vector $\vec v$, what is the slope of the function $f$ in the direction of $\vec v$? The answer is, of course

$$ df_{x}(\vec v) = \lim_{t\to0} \frac{f(x+t\vec v)-f(x)}{t}$$

I.e. you start at the point $x$ and walk a teensy bit in the direction of $\vec v$ and take note of the ratio $\Delta f/\Delta t$.
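
To make the limit concrete, here is a small numerical sketch in Python. The function `f`, the point `x0`, and the helper name `total_derivative` are made up for illustration, and the limit is only approximated by plugging in a small but finite $t$.

```python
def f(p):
    # Hypothetical example function f(x, y, z) = x*y + z^2.
    x, y, z = p
    return x * y + z ** 2

def total_derivative(f, x, v, t=1e-6):
    # Approximates df_x(v) by the difference quotient (f(x + t v) - f(x)) / t
    # with a small finite step t instead of the limit t -> 0.
    shifted = [xi + t * vi for xi, vi in zip(x, v)]
    return (f(shifted) - f(x)) / t

x0 = (1.0, 2.0, 3.0)
v = (1.0, 0.0, 0.0)
w = (0.0, 1.0, 1.0)
v_plus_w = tuple(a + b for a, b in zip(v, w))

slope_v = total_derivative(f, x0, v)           # exact value: y = 2
slope_w = total_derivative(f, x0, w)           # exact value: x + 2z = 7
slope_sum = total_derivative(f, x0, v_plus_w)  # exact value: 2 + 7 = 9
```

Up to the finite-step error, `slope_sum` equals `slope_v + slope_w`: the slope depends linearly on the direction vector.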

Note that the total derivative at a point, $df_x$, is a linear map $\mathbb R^n \to \mathbb R$, not a vector in $\mathbb R^n$. Given a vector, it tells you some number. In coordinates, this is usually written as

$$ df = \frac{\partial f}{\partial x}dx + \frac{\partial f}{\partial y}dy + \frac{\partial f}{\partial z}dz $$

where $dx,dy,dz$ are the total derivatives of the coordinate functions, for instance $dx(v_x,v_y,v_z) := v_x$. This formula looks the same in any coordinate system.
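
For instance, in the spherical coordinates $(r, \theta, \phi)$ from the question, the same recipe gives

$$ df = \frac{\partial f}{\partial r}dr + \frac{\partial f}{\partial \theta}d\theta + \frac{\partial f}{\partial \phi}d\phi $$

where $dr, d\theta, d\phi$ are the total derivatives of the coordinate functions $r, \theta, \phi$. Note that no factors of $1/r$ or $1/(r\sin\theta)$ appear here.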


In contrast, the gradient answers the following question: what is the direction of the steepest ascent of the function? Which vector $\vec v$ of unit length maximizes $df(\vec v)$? As you can see, this definition crucially depends on the fact that you can measure the length of a vector. The gradient is then defined as

$$ \nabla f = df(\vec v_{\max})\cdot\vec v_{\max} $$

i.e. it gives both the direction and the magnitude of the steepest change.

This can also be expressed as

$$ \langle \nabla f, \vec v \rangle = df(\vec v) \quad\forall \vec v\in\mathbb R^n.$$

In other words, the scalar product $\langle,\rangle$ is used to convert a covector $df$ into a vector $\nabla f$. This also means that the formula for the gradient looks very different in coordinate systems other than Cartesian. If the scalar product is changed (say, to $\langle\vec a,\vec b\rangle := a_xb_x + a_yb_y + 4a_zb_z$), then the direction of steepest ascent also changes. (Exercise: Why?)
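
This is also where the factors in the spherical formula from the question come from. With the usual Euclidean scalar product, the coordinate directions $\partial_r, \partial_\theta, \partial_\phi$ have lengths $1$, $r$, $r\sin\theta$, so the unit vectors are $\hat e_r = \partial_r$, $\hat e_\theta = \frac{1}{r}\partial_\theta$, $\hat e_\phi = \frac{1}{r\sin\theta}\partial_\phi$. Plugging these into $\langle \nabla f, \vec v\rangle = df(\vec v)$ gives the components $\frac{\partial f}{\partial r}$, $\frac{1}{r}\frac{\partial f}{\partial\theta}$, $\frac{1}{r\sin\theta}\frac{\partial f}{\partial\phi}$. Below is a rough numerical sanity check in Python, not a proof. The test function, the sample point, and all helper names are made up for illustration, and $\theta$ is assumed to be the polar angle (physics convention).

```python
import math

def f_cart(x, y, z):
    # Hypothetical test function, chosen only for illustration.
    return x * y + z ** 2

def grad_cart(x, y, z):
    # Exact Cartesian gradient of f_cart: the vector of partial derivatives.
    return (y, x, 2 * z)

def to_cart(r, theta, phi):
    # Assumed convention: theta is the polar angle, phi the azimuth.
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))

def f_sph(r, theta, phi):
    # The same function, expressed in spherical coordinates.
    return f_cart(*to_cart(r, theta, phi))

def partial(g, args, i, h=1e-6):
    # Central difference approximation of the i-th partial derivative of g.
    up = list(args)
    dn = list(args)
    up[i] += h
    dn[i] -= h
    return (g(*up) - g(*dn)) / (2 * h)

def unit_vectors(theta, phi):
    # Orthonormal basis (e_r, e_theta, e_phi), written in Cartesian components.
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    return ((st * cp, st * sp, ct),
            (ct * cp, ct * sp, -st),
            (-sp, cp, 0.0))

def dot(a, b):
    # Standard Euclidean scalar product.
    return sum(ai * bi for ai, bi in zip(a, b))

r, theta, phi = 2.0, 0.7, 1.1
point = (r, theta, phi)

# Spherical formula: partials rescaled by 1, 1/r and 1/(r sin(theta)).
scale = (1.0, 1.0 / r, 1.0 / (r * math.sin(theta)))
grad_sph = tuple(s * partial(f_sph, point, i) for i, s in enumerate(scale))

# Cartesian gradient at the same point, projected onto the spherical unit vectors.
g = grad_cart(*to_cart(r, theta, phi))
projected = tuple(dot(g, e) for e in unit_vectors(theta, phi))

# Naive "vector of partials" in spherical coordinates, without the scale factors.
naive = tuple(partial(f_sph, point, i) for i in range(3))
```

Under these assumptions, `grad_sph` and `projected` agree up to the finite-difference error, whereas `naive` does not: its second and third entries are off by the factors $r$ and $r\sin\theta$.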