[Math] Product rule for Hessian matrix

analysis, derivatives, partial derivative, products, real-analysis

Let $f: \mathbb{R}^n \to \mathbb{R}$ and $g: \mathbb{R}^n \to \mathbb{R}$. Is there a general formula for the Hessian matrix of their product?

That is, what is $H(f(x) g(x))$, where $H(f(x)) = \left(\frac{\partial^2 f}{\partial x_i \partial x_j}\right)_{i,j = 1 \dots n}$?

Best Answer

It follows from theorem 40.1(b) on p. 360 (the product rule) and from corollary 39.7 on p. 351 (the differential in terms of the partial differentials) of Robert G. Bartle's "The Elements of Real Analysis", 2nd edition (John Wiley & Sons, 1976), that if $n \in \{1, 2, \dots\}$, if $f, g : \mathbb{R}^n \rightarrow \mathbb{R}$ and if $c \in \mathbb{R}^n$ is such that both $f$ and $g$ are differentiable at $c$, then $fg$ is likewise differentiable at $c$, all the partial derivatives of $f$, $g$, and $fg$ are well defined at $c$, and the following holds: $$ \nabla_c(fg) = (\nabla_cf)g(c) + f(c)\nabla_cg \tag{*} $$ where $\nabla_cf$ is $f$'s gradient at $c$ (and analogously $\nabla_cg$ and $\nabla_c(fg)$). In other words $\nabla_cf$ (and analogously $\nabla_cg$, $\nabla_c(fg)$) is the $n$-dimensional row vector, whose components are $f$'s partial derivatives at $c$: $$ \nabla_cf = [D_1f(c), \dots, D_nf(c)] $$ (This gradient is defined in exercise 39.J(a) on p. 357 of Bartle's text.)

Therefore, if $f, g$ are differentiable in a neighborhood of $c$ and if $D_1f, D_1g, \dots, D_nf, D_ng$ are all differentiable at $c$, the Hessian matrix of $fg$ is well defined at $c$ (and likewise the Hessian matrices at $c$ of $f$ and of $g$) and its transpose is given by $$\begin{split} H_c(fg)^T & = \left[\begin{array}{c} \nabla_cD_1(fg) \\ \vdots \\ \nabla_cD_n(fg) \end{array}\right] \\ & = \left[\begin{array}{c} \nabla_c\left((D_1f)g + fD_1g\right) \\ \vdots \\ \nabla_c\left((D_nf)g + fD_ng\right) \end{array}\right] \\ & = \left[\begin{array}{c} \nabla_c\left((D_1f)g\right) \\ \vdots \\ \nabla_c\left((D_nf)g\right) \end{array}\right] + \left[\begin{array}{c} \nabla_c\left(fD_1g\right) \\ \vdots \\ \nabla_c\left(fD_ng\right) \end{array}\right] \\ & = \left[\begin{array}{c} \left(\nabla_cD_1f\right)g(c) \\ \vdots \\ \left(\nabla_cD_nf\right)g(c) \end{array}\right] + \left[\begin{array}{c} D_1f(c)\nabla_cg \\ \vdots \\ D_nf(c)\nabla_cg \end{array}\right] + \left[\begin{array}{c} (\nabla_cf)D_1g(c) \\ \vdots \\ (\nabla_cf)D_ng(c) \end{array}\right] + \left[\begin{array}{c} f(c)\nabla_cD_1g \\ \vdots \\ f(c)\nabla_cD_ng \end{array}\right] \\ & = H_cf^Tg(c) + \nabla_cf^T\nabla_cg + \nabla_cg^T\nabla_cf + f(c)H_cg^T \end{split} $$ That is, $H_c(fg)$'s transpose is the vertical concatenation of the $n$ gradients at $c$ of the $n$ partial derivatives $D_1(fg), \dots, D_n(fg)$, respectively. The second and fourth equations are a result of $(*)$ and the third is due to the linearity of differentiation.

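Transposing both sides and rearranging the terms, we obtain $$ H_c(fg) = (H_cf)g(c) + \nabla_cf^T\nabla_cg + \nabla_cg^T\nabla_cf + f(c)H_cg $$ Here $\nabla_cf^T\nabla_cg$ is the $n \times n$ outer product whose $(i,j)$ entry is $D_if(c)D_jg(c)$, and $\nabla_cg^T\nabla_cf$ is its transpose. For $n = 1$ the formula reduces to the familiar $(fg)'' = f''g + 2f'g' + fg''$.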
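As a quick numerical sanity check of the final formula, here is a minimal Python sketch. The functions $f(x, y) = x^2 y$ and $g(x, y) = \sin x + y^2$ are illustrative choices that do not come from the answer, and all helper names are hypothetical. The sketch evaluates the right-hand side of the formula from hand-computed gradients and Hessians, then compares it with a central-difference approximation of the Hessian of $fg$.

```python
import math

# Illustrative functions on R^2 (assumed for this sketch, not from the answer).
def f(p):
    x, y = p
    return x * x * y

def g(p):
    x, y = p
    return math.sin(x) + y * y

# Hand-computed gradients and Hessians of f and g.
def grad_f(p):
    x, y = p
    return [2 * x * y, x * x]

def grad_g(p):
    x, y = p
    return [math.cos(x), 2 * y]

def hess_f(p):
    x, y = p
    return [[2 * y, 2 * x], [2 * x, 0.0]]

def hess_g(p):
    x, y = p
    return [[-math.sin(x), 0.0], [0.0, 2.0]]

def hessian_by_formula(p):
    """H(fg) = H(f) g + grad(f)^T grad(g) + grad(g)^T grad(f) + f H(g), at p."""
    fv, gv = f(p), g(p)
    df, dg = grad_f(p), grad_g(p)
    Hf, Hg = hess_f(p), hess_g(p)
    n = len(p)
    return [[Hf[i][j] * gv + df[i] * dg[j] + dg[i] * df[j] + fv * Hg[i][j]
             for j in range(n)] for i in range(n)]

def hessian_by_differences(p, h=1e-4):
    """Central-difference approximation of the Hessian of the product fg at p."""
    def prod(q):
        return f(q) * g(q)

    def shifted(i, si, j, sj):
        # Evaluate fg at p shifted by si*h along axis i and sj*h along axis j.
        q = list(p)
        q[i] += si * h
        q[j] += sj * h
        return prod(q)

    n = len(p)
    return [[(shifted(i, 1, j, 1) - shifted(i, 1, j, -1)
              - shifted(i, -1, j, 1) + shifted(i, -1, j, -1)) / (4 * h * h)
             for j in range(n)] for i in range(n)]

c = [1.0, 2.0]
A = hessian_by_formula(c)
B = hessian_by_differences(c)
max_gap = max(abs(A[i][j] - B[i][j]) for i in range(2) for j in range(2))
print("largest entrywise gap:", max_gap)
```

The finite-difference step `h` is a rough choice, so the two matrices should agree only up to a small discretization and rounding error, not exactly. Agreement at one point is a plausibility check, not a proof.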