Multivariable Calculus – Gradient and Jacobian Row and Column Conventions

multivariable-calculus

Say $f$ is a scalar-valued function from $\mathbb{R}^n \to \mathbb{R}$. When I learnt about the gradient $\nabla f(\mathbf{x})$, I always thought of it as a column vector living in the same space as $\mathbf{x}$. That way, the dot product $\nabla f \cdot \mathbf{v}$ gives the directional derivative in the direction $\mathbf{v}$.

All the definitions I can find of the Jacobian of $\mathbf{y} = \psi(\mathbf{x})$ however define it as:

$$
\begin{bmatrix}
\frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n}\\
\frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_n}
\end{bmatrix}
$$

But taking $m = 1$ in this convention makes $\nabla f$ a row vector, which means the directional derivative can no longer be written as the dot product $\nabla f \cdot \mathbf{v}$.

Which way is correct? What are the consequences if I accidentally write the Jacobian the opposite way? I have found some similar questions here, but none that answer my question directly. I'm still learning this stuff, so please explain in simple terms 🙂

Best Answer

In general, the derivative of a function $f : \mathbb{R}^n \to \mathbb{R}^m$ at a point $p \in \mathbb{R}^n$, if it exists, is the unique linear transformation $Df(p) \in L(\mathbb{R}^n,\mathbb{R}^m)$ such that $$ \lim_{h \to 0} \frac{\|f(p+h)-f(p)-Df(p)h\|}{\|h\|} = 0; $$ the matrix of $Df(p)$ with respect to the standard orthonormal bases of $\mathbb{R}^n$ and $\mathbb{R}^m$, called the Jacobian matrix of $f$ at $p$, therefore lies in $M_{m \times n}(\mathbb{R})$.
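This $m \times n$ shape is easy to confirm numerically. Below is a small sketch (the map $f : \mathbb{R}^2 \to \mathbb{R}^3$, the point $p$, and the step size are all illustrative choices, not anything from the question) that approximates the Jacobian by central differences, checks it is $3 \times 2$, and checks that the ratio in the defining limit shrinks as $h \to 0$:

```python
import numpy as np

def f(x):
    # Illustrative map f : R^2 -> R^3
    return np.array([x[0]**2, x[0] * x[1], np.sin(x[1])])

def jacobian_fd(f, p, eps=1e-6):
    # Central-difference approximation of the Jacobian at p:
    # column j holds the partial derivatives with respect to x_j.
    p = np.asarray(p, dtype=float)
    m, n = f(p).shape[0], p.shape[0]
    J = np.zeros((m, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        J[:, j] = (f(p + e) - f(p - e)) / (2 * eps)
    return J

p = np.array([1.0, 2.0])
J = jacobian_fd(f, p)
print(J.shape)  # (3, 2): an m x n matrix, m = 3 outputs, n = 2 inputs

# The defining limit: ||f(p+h) - f(p) - Df(p)h|| / ||h|| -> 0 as h -> 0,
# so the ratio should shrink roughly linearly with the size of h.
for t in [1e-1, 1e-2, 1e-3]:
    h = t * np.array([1.0, -1.0])
    ratio = np.linalg.norm(f(p + h) - f(p) - J @ h) / np.linalg.norm(h)
    print(ratio)
```

Note that $Df(p)h$ is computed as an ordinary matrix-vector product `J @ h`, which only type-checks because $J$ has $n$ columns, matching the dimension of $h$.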

Now, suppose that $m=1$, so that $f : \mathbb{R}^n \to \mathbb{R}$. Then if $f$ is differentiable at $p$, $Df(p) \in L(\mathbb{R}^n,\mathbb{R}) = (\mathbb{R}^n)^\ast$ is a functional, and hence the Jacobian matrix, as you point out, lies in $M_{1 \times n}(\mathbb{R})$, i.e., is a row vector. However, by the Riesz representation theorem, $\mathbb{R}^n \cong (\mathbb{R}^n)^\ast$ via the map that sends a vector $x \in \mathbb{R}^n$ to the functional $y \mapsto \left\langle y,x \right\rangle$. Hence, if $f$ is differentiable at $p$, then the gradient of $f$ at $p$ is the unique (column!) vector $\nabla f(p) \in \mathbb{R}^n$ such that $$ \forall h \in \mathbb{R}^n, \quad Df(p)h = \left\langle \nabla f(p),h\right\rangle; $$ in particular, if you unpack definitions, you'll find that the Jacobian matrix of $f$ at $p$ is precisely $\nabla f(p)^T$.
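The two facts at the end — that the $1 \times n$ Jacobian row is exactly $\nabla f(p)^T$, and that $Df(p)h = \langle \nabla f(p), h \rangle$ — can also be checked numerically. A minimal sketch, with an illustrative scalar function $f : \mathbb{R}^3 \to \mathbb{R}$ and hand-computed gradient:

```python
import numpy as np

def f(x):
    # Illustrative scalar function f : R^3 -> R
    return x[0]**2 + 3.0 * x[1] * x[2]

def grad_f(x):
    # Hand-computed gradient, stored as a plain vector in R^3
    return np.array([2.0 * x[0], 3.0 * x[2], 3.0 * x[1]])

def jacobian_row_fd(f, p, eps=1e-6):
    # Central-difference 1 x n Jacobian (a row vector) of a scalar function
    n = len(p)
    row = np.zeros((1, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        row[0, j] = (f(p + e) - f(p - e)) / (2 * eps)
    return row

p = np.array([1.0, 2.0, 0.5])
J = jacobian_row_fd(f, p)

# The Jacobian row is the transposed gradient: J = grad f(p)^T
print(np.allclose(J, grad_f(p).reshape(1, -1)))  # True

# Df(p)h as row-times-column agrees with the inner product <grad f(p), h>
h = np.array([0.3, -0.1, 0.7])
print(np.isclose((J @ h)[0], np.dot(grad_f(p), h)))  # True
```

Either convention carries the same information; what matters is that the $1 \times n$ row acts on vectors by matrix multiplication, while the column $\nabla f(p)$ acts through the inner product, and the two actions agree.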