Vector Analysis – Product Rule for Scalar-Vector Product


Let $\mathbf F : \mathbb R^p \to \mathbb R^s$ and $\phi : \mathbb R^p \to \mathbb R$ be differentiable functions. Let the function $\mathbf G$ be defined as follows:
$$\mathbf G : \mathbb R^p \to \mathbb R^s \qquad \mathbf G(\mathbf y) = \phi(\mathbf y)\mathbf F(\mathbf y)$$

Furthermore, let $y_0$ be a point in $\mathbb R^p$. Then the Jacobians of $\mathbf G$ and of $\mathbf F$ at $y_0$, denoted respectively $D\mathbf G(y_0)$ and $D\mathbf F(y_0)$, are $s \times p$ matrices, whereas the Jacobian of $\phi$ at $y_0$, denoted $D\phi(y_0)$, is a row vector with $p$ entries and may thus be transposed into a gradient:
$$\nabla \phi(y_0) \doteq D\phi(y_0)^\top$$

Now the question is, how can I express $D\mathbf G(y_0)$ in terms of the other two Jacobians? I tried recklessly applying the product rule,
$$D\mathbf G(y_0) \stackrel{?}{=} \phi(y_0) D\mathbf F(y_0) + D\phi(y_0) \mathbf F(y_0) $$
but the dimensions do not match: the second term multiplies the $1 \times p$ row vector $D\phi(y_0)$ by the $s \times 1$ column vector $\mathbf F(y_0)$, which is not even defined. What am I doing wrong?

Best Answer

There are two ways of looking at this that sort it out: we can use indices, or we can treat everything as linear maps and work it out explicitly.

  • In index notation, we have functions $G_i = \phi(y) F_i(y)$. All of these are scalars, so the usual product rule for scalar functions applies: $$ \frac{\partial G_i}{\partial y_j} = \phi \frac{\partial F_i}{\partial y_j} + \frac{\partial \phi}{\partial y_j} F_i. $$ Since $\phi$ is a scalar, to write down the matrix corresponding to this, we can reverse the order to put the $j$ terms on the right, i.e. $$ (DG)_{ij} = \phi (DF)_{ij} + F_i (\nabla\phi)_j = (\phi DF + F \otimes \nabla \phi)_{ij}, $$ $\otimes$ being the dyadic product $(A \otimes B)_{ij} = A_i B_j$.
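As a numerical sanity check of the formula $DG = \phi\, DF + F \otimes \nabla\phi$, here is a small NumPy sketch (the particular $\phi$ and $\mathbf F$ below, with $p = 3$ and $s = 2$, are arbitrary choices of mine) comparing the analytic Jacobian against a finite-difference one:

```python
import numpy as np

# Arbitrary test functions with p = 3, s = 2.
def phi(y):
    return np.sin(y[0]) + y[1] * y[2]

def F(y):
    return np.array([y[0] * y[1], np.exp(y[2])])

def G(y):
    return phi(y) * F(y)

y0 = np.array([0.5, -1.0, 2.0])

# Analytic pieces at y0.
grad_phi = np.array([np.cos(y0[0]), y0[2], y0[1]])          # gradient of phi, shape (3,)
DF = np.array([[y0[1], y0[0], 0.0],
               [0.0,   0.0,   np.exp(y0[2])]])              # DF(y0), shape (2, 3)

# Product rule: DG = phi * DF + F (x) grad_phi; np.outer builds the s x p dyad.
DG = phi(y0) * DF + np.outer(F(y0), grad_phi)

# Central-difference Jacobian of G for comparison.
eps = 1e-6
DG_fd = np.column_stack([(G(y0 + eps * e) - G(y0 - eps * e)) / (2 * eps)
                         for e in np.eye(3)])

print(np.max(np.abs(DG - DG_fd)))  # tiny, up to finite-difference error
```

Note that `np.outer(F(y0), grad_phi)` is exactly the dyadic product $(F \otimes \nabla\phi)_{ij} = F_i (\nabla\phi)_j$, which is what makes the dimensions come out as $s \times p$.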

Approaching the problem in a coordinate-free way, the derivative $DG(y)$ is a linear map $\mathbb{R}^p \to \mathbb{R}^s$ given uniquely by $$ G(y+h) = G(y) + DG(y)(h) + o(\lVert h \rVert). $$ Written this way, it doesn't matter how $h$ is incorporated, provided that the expression ends up in $\mathbb{R}^s$. One can show that the product rule (or a clever use of the chain rule) in this formalism gives you $$ DG(y)(h) = [\phi(y)] DF(y)(h) + [D\phi(y)(h)] F(y), $$ where the terms in brackets are both scalars (and hence we can push them about to end up with all the $h$s on the right if we wish).
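The linear-map formula can likewise be checked against the defining difference quotient $(G(y + th) - G(y))/t$. Again the specific $\phi$, $\mathbf F$, point $y$, and direction $h$ below are arbitrary choices for illustration:

```python
import numpy as np

# Arbitrary test functions with p = 3, s = 2.
def phi(y):
    return y[0] ** 2 + np.cos(y[1] * y[2])

def F(y):
    return np.array([y[0] + y[1], y[1] * y[2]])

def G(y):
    return phi(y) * F(y)

y = np.array([1.0, 0.3, -0.7])
h = np.array([0.2, -0.5, 1.0])      # an arbitrary direction in R^3

# Analytic pieces at y.
grad_phi = np.array([2 * y[0],
                     -y[2] * np.sin(y[1] * y[2]),
                     -y[1] * np.sin(y[1] * y[2])])
DF = np.array([[1.0, 1.0,  0.0],
               [0.0, y[2], y[1]]])

# DG(y)(h) = [phi(y)] DF(y)(h) + [Dphi(y)(h)] F(y); both bracketed factors are scalars.
DG_h = phi(y) * (DF @ h) + (grad_phi @ h) * F(y)

# Compare with the difference quotient (G(y + t h) - G(y)) / t for small t.
t = 1e-6
quotient = (G(y + t * h) - G(y)) / t
print(np.max(np.abs(DG_h - quotient)))  # small, of order t
```

Here `DF @ h` feeds $h$ into the derivative of $\mathbf F$, while `grad_phi @ h` is the scalar $D\phi(y)(h)$, matching the bracketed terms in the formula above.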

The important thing to understand is that since the derivative is a linear map, it has to have an argument fed into it somewhere. If it divides up into terms that are some form of products of derivatives and parts of the original function, the argument must be fed into the derivatives, not into the other parts of the function. (This is a good reason to think about functions $\mathbb{R}^n \supset U \to \mathbb{R}^m$: the argument of the function is restricted to a subset, but the tangent vectors that you feed into the derivative are not.)