The gradient of a function $f: \mathbb{R}^n \to \mathbb{R}$ is defined as the vector of the partial derivatives:
$$ \nabla f = \left(\frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n}\right)$$
Recently, I have become somewhat confused over this definition since I realized that if, for example, $f$ is defined in spherical coordinates $(r, \theta, \phi)$, the gradient is given as
$$ \nabla f = \left(\frac{\partial f}{\partial r}, \frac{1}{r} \frac{\partial f}{\partial \theta}, \frac{1}{r \sin \theta} \frac{\partial f}{\partial \phi} \right)$$
rather than
$$ \nabla f = \left(\frac{\partial f}{\partial r}, \frac{\partial f}{\partial \theta}, \frac{\partial f}{\partial \phi} \right)$$
I have two questions regarding this:
- Is a scalar-valued function in spherical coordinates still considered to be $f: \mathbb{R}^3 \to \mathbb{R}$, or is $\mathbb{R}^3$ reserved for Cartesian coordinates?
- Does the "partial derivative" definition of the gradient in fact require Cartesian coordinates?
Best Answer
It turns out that there are two different but related notions of differentiation for a function $f:\mathbb R^n\to\mathbb R$: the total derivative $df$ and the gradient $\nabla f$.
The definition of the total derivative answers the following question: given a vector $\vec v$, what is the slope of the function $f$ in the direction of $\vec v$? The answer is, of course,
$$ df_{x}(\vec v) = \lim_{t\to0} \frac{f(x+t\vec v)-f(x)}{t}$$
That is, you start at the point $x$, walk a teensy bit in the direction of $\vec v$, and take note of the ratio $\Delta f/\Delta t$.
Note that at each point $x$ the total derivative $df_x$ is a linear map $\mathbb R^n \to \mathbb R$, not a vector in $\mathbb R^n$. Given a vector, it returns a number. In coordinates, this is usually written as
$$ df = \frac{\partial f}{\partial x}dx + \frac{\partial f}{\partial y}dy + \frac{\partial f}{\partial z}dz $$
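where $dx,dy,dz$ are the total derivatives of the coordinate functions, for instance $dx(v_x,v_y,v_z) := v_x$. This formula has the same form in any coordinate system: in spherical coordinates, for instance, $df = \frac{\partial f}{\partial r}dr + \frac{\partial f}{\partial \theta}d\theta + \frac{\partial f}{\partial \phi}d\phi$, with no extra factors.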
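As a quick sanity check of the coordinate formula, here is a minimal Python sketch that estimates $df_x(\vec v)$ from the limit definition and compares it with $\frac{\partial f}{\partial x}v_x + \frac{\partial f}{\partial y}v_y + \frac{\partial f}{\partial z}v_z$. The test function `f`, the point, the vector and the step size `h` are arbitrary choices made for illustration, and central differences stand in for the exact limit.

```python
import math

def f(x, y, z):
    # Hypothetical test function, chosen only for illustration.
    return x * x * y + math.sin(z)

def directional_derivative(func, point, v, h=1e-6):
    """Central-difference estimate of df_point(v)."""
    forward = [p + h * vi for p, vi in zip(point, v)]
    backward = [p - h * vi for p, vi in zip(point, v)]
    return (func(*forward) - func(*backward)) / (2 * h)

def partials(func, point, h=1e-6):
    """Partial derivatives, i.e. df applied to each coordinate direction."""
    n = len(point)
    basis = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return [directional_derivative(func, point, e, h) for e in basis]

point = (1.0, 2.0, 0.5)
v = (0.3, -1.0, 2.0)

# Left-hand side: slope of f along v, straight from the limit definition.
lhs = directional_derivative(f, point, v)
# Right-hand side: (df/dx) v_x + (df/dy) v_y + (df/dz) v_z.
rhs = sum(d * vi for d, vi in zip(partials(f, point), v))

print(lhs, rhs)
```

The two printed numbers should agree up to finite-difference error, which reflects that $df_x$ is linear in $\vec v$.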
In contrast, the gradient answers the following question: what is the direction of steepest ascent of the function? Which vector $\vec v$ of unit length maximizes $df(\vec v)$? Call this maximizer $\vec v_{max}$. As you can see, this definition crucially depends on the fact that you can measure the length of a vector. The gradient is then defined as
$$ \nabla f = df(\vec v_{max})\cdot\vec v_{max} $$
i.e. it gives both the direction and the magnitude of the steepest change.
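To see this definition at work, the sketch below takes a made-up covector $df = 4\,dx + dy + 2\,dz$ (the numbers are hypothetical), samples many random unit vectors, and checks that none of them gives a larger slope than the normalized vector of partial derivatives. It assumes the standard Euclidean length; the sample count and seed are arbitrary.

```python
import math
import random

def df(v):
    # Hypothetical covector at some fixed point: df = 4 dx + 1 dy + 2 dz.
    return 4.0 * v[0] + 1.0 * v[1] + 2.0 * v[2]

def norm(v):
    # Standard Euclidean length.
    return math.sqrt(sum(c * c for c in v))

# Candidate for the steepest direction: the normalized vector of partials.
coefficients = (4.0, 1.0, 2.0)
v_max = tuple(c / norm(coefficients) for c in coefficients)

# Brute-force comparison against random unit vectors.
random.seed(0)
best = -math.inf
for _ in range(10000):
    w = [random.gauss(0.0, 1.0) for _ in range(3)]
    u = [c / norm(w) for c in w]
    best = max(best, df(u))

# The gradient as defined in the text: df(v_max) * v_max.
gradient = tuple(df(v_max) * c for c in v_max)

print(best, df(v_max))
print(gradient)
```

With the Euclidean length, `gradient` comes out as the vector of partial derivatives, and `best` stays below `df(v_max)`.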
Equivalently, the gradient is the unique vector satisfying
$$ \langle \nabla f, \vec v \rangle = df(\vec v) \quad\forall \vec v\in\mathbb R^n.$$
In other words, the scalar product $\langle\cdot,\cdot\rangle$ is used to convert a covector $df$ into a vector $\nabla f$. This also means that the formula for the gradient looks very different in coordinate systems other than Cartesian. If the scalar product is changed (say, to $\langle\vec a,\vec b\rangle := a_xb_x + a_yb_y + 4a_zb_z$), then the direction of steepest ascent also changes. (Exercise: Why?)
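For the spherical formula from the question, this can be checked numerically. The sketch below computes the ordinary Cartesian gradient of a hypothetical test function and takes its components along the orthonormal directions $\hat e_r, \hat e_\theta, \hat e_\phi$. It then compares them with $\left(\frac{\partial f}{\partial r}, \frac1r\frac{\partial f}{\partial\theta}, \frac{1}{r\sin\theta}\frac{\partial f}{\partial\phi}\right)$. It assumes the convention $x = r\sin\theta\cos\phi$, $y = r\sin\theta\sin\phi$, $z = r\cos\theta$; the function, the point and the step size are arbitrary choices.

```python
import math

def f_cart(x, y, z):
    # Hypothetical test function in Cartesian coordinates.
    return x * y + z * z

def to_cart(r, theta, phi):
    # Assumed convention: theta is the polar angle, phi the azimuth.
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))

def f_sph(r, theta, phi):
    # The same function, expressed in spherical coordinates.
    return f_cart(*to_cart(r, theta, phi))

def partials(func, point, h=1e-6):
    """Central-difference partial derivatives of func at point."""
    result = []
    for i in range(len(point)):
        up, down = list(point), list(point)
        up[i] += h
        down[i] -= h
        result.append((func(*up) - func(*down)) / (2 * h))
    return result

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

r, theta, phi = 2.0, 0.7, 1.1
grad_cart = partials(f_cart, to_cart(r, theta, phi))
f_r, f_theta, f_phi = partials(f_sph, (r, theta, phi))

# Orthonormal spherical basis vectors, written in Cartesian components.
e_r = (math.sin(theta) * math.cos(phi), math.sin(theta) * math.sin(phi), math.cos(theta))
e_theta = (math.cos(theta) * math.cos(phi), math.cos(theta) * math.sin(phi), -math.sin(theta))
e_phi = (-math.sin(phi), math.cos(phi), 0.0)

# Components of the Cartesian gradient along the orthonormal basis.
components = [dot(grad_cart, e) for e in (e_r, e_theta, e_phi)]
# Spherical formula with the 1/r and 1/(r sin(theta)) factors.
scaled = [f_r, f_theta / r, f_phi / (r * math.sin(theta))]
# Bare partial derivatives, without those factors.
naive = [f_r, f_theta, f_phi]

print(components)
print(scaled)
print(naive)
```

The first two lists should agree up to finite-difference error, while the bare partials generally do not. The coordinate directions $\partial/\partial\theta$ and $\partial/\partial\phi$ have lengths $r$ and $r\sin\theta$ rather than $1$, and the extra factors compensate for that once the scalar product is used to turn $df$ into a vector.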