According to Wikipedia, given a differentiable mapping $F: \mathbb{R}^n \to \mathbb{R}^m$ with components $y_i = F_i(x)$, its Jacobian matrix is an $m \times n$ matrix defined as:
$$
J_F=\begin{bmatrix} \dfrac{\partial y_1}{\partial x_1} & \cdots & \dfrac{\partial y_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial y_m}{\partial x_1} & \cdots & \dfrac{\partial y_m}{\partial x_n} \end{bmatrix}.
$$
In particular, when $m=1$, the Jacobian matrix is also called the gradient $\nabla F$.
So the differential is computed as $J_F \Delta x$, or as $\nabla F \Delta x$ when $m=1$.
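To make the shape convention concrete, here is a minimal sketch in plain Python. The map `F`, the helper names `jacobian` and `matvec`, and the step size `h` are all hypothetical choices made for illustration; the Jacobian is approximated by central differences rather than computed symbolically. It builds $J_F$ with rows indexed by outputs and columns by inputs (so $m \times n$) and compares $J_F \Delta x$ with the actual change in $F$.

```python
def F(x):
    """Hypothetical example map F: R^2 -> R^3 (chosen only for illustration)."""
    x1, x2 = x
    return [x1 * x2, x1 + x2, x1 ** 2]


def jacobian(f, x, h=1e-6):
    """Approximate the Jacobian of f at x by central differences.

    Returns a list of m rows with n entries each (Wikipedia's m x n layout).
    """
    n = len(x)
    columns = []
    for j in range(n):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        # Column j holds the partial derivatives with respect to x_j.
        columns.append([(a - b) / (2 * h) for a, b in zip(f_plus, f_minus)])
    m = len(columns[0])
    # Transpose so that row i corresponds to output y_i.
    return [[columns[j][i] for j in range(n)] for i in range(m)]


def matvec(J, v):
    """Multiply an m x n matrix (list of rows) by a length-n vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in J]


x = [1.0, 2.0]
dx = [1e-3, -2e-3]

J = jacobian(F, x)          # 3 rows, 2 columns
linear = matvec(J, dx)      # the differential J_F * dx
shifted = [xi + di for xi, di in zip(x, dx)]
actual = [a - b for a, b in zip(F(shifted), F(x))]

print("shape:", len(J), "x", len(J[0]))
print("J_F dx:", linear)
print("F(x+dx) - F(x):", actual)
```

With the transposed convention, the same product would be written $\Delta x^{T} J_F^{T}$; the numbers are identical and only the row/column bookkeeping changes.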
Some texts in real analysis, optimization, … agree with Wikipedia's definitions. Others, however, define the Jacobian matrix or the gradient of a differentiable mapping as the transpose of Wikipedia's.
Moreover, in baby Rudin, $J_F$ is an $m \times n$ matrix, while for $m=1$, $\nabla F$ is an $n \times 1$ column vector.
When writing my own formulas, which convention is most widely adopted?
Thanks and regards!
Best Answer
$\newcommand{\R}{\mathbf{R}}$Rudin's conventions are certainly the ones I use. He isn't contradicting himself, but it takes some work to see that. You've asked about bilinear forms and duality before, I think, so I hope the following makes sense.
If $F\colon \R^n \to \R$ is a smooth function and $x \in \R^n$, then $J_F$ is a linear map that takes a vector in the tangent space $T_x\R^n$, which is canonically identified with $\R^n$, to a vector in $T_{F(x)}\R = \R$. With Rudin's conventions, this corresponds to a $1 \times n$ matrix. We can therefore view $J_F$ as an element of the cotangent space $T_x^\vee\R^n$ at $x$.
$\R^n$ comes with an inner product $\langle\phantom{x}, \phantom{x}\rangle$, the "dot product". This allows us to identify $T_x\R^n$ with $T_x^\vee\R^n$. See the page on nondegenerate forms for more details. The upshot is that there has to be a tangent vector $\nabla F \in T_x\R^n$ such that $\langle \nabla F, v\rangle = J_F \cdot v$ for all $v \in T_x\R^n$. Since $\nabla F$ is a tangent vector, we express it as an $n \times 1$ matrix. You can check that the entries work out to be what Rudin says they are.
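As a quick check of that last claim, here is a small worked example. The function is a hypothetical choice for illustration, not one taken from Rudin. Take $F(x_1, x_2) = x_1^2 x_2$ on $\R^2$. Then
$$
J_F = \begin{bmatrix} 2x_1x_2 & x_1^2 \end{bmatrix}, \qquad \nabla F = \begin{bmatrix} 2x_1x_2 \\ x_1^2 \end{bmatrix} = J_F^{T},
$$
and for any $v = (v_1, v_2)^{T} \in T_x\R^2$,
$$
\langle \nabla F, v\rangle = 2x_1x_2\, v_1 + x_1^2\, v_2 = J_F \cdot v.
$$
So the $1 \times 2$ row $J_F$ and the $2 \times 1$ column $\nabla F$ carry the same entries. The transpose is exactly the identification of $T_x^\vee\R^2$ with $T_x\R^2$ given by the dot product.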
This causes more confusion later on in life, because a manifold may not come with a metric and then these identifications that one uses so freely in calculus cannot be made. Manifolds that do have an analogue of the dot product are called Riemannian.