[Math] Second directional derivative and Hessian matrix

eigenvalues-eigenvectors, hessian-matrix, linear-algebra, multivariable-calculus

I am reading the following from the book Deep Learning, and I have the following questions.

[Excerpt from *Deep Learning* on the second directional derivative and the eigendecomposition of the Hessian]

  1. I don't quite understand second directional derivatives. The first directional derivative of a function $f:\mathbb{R}^m\to\mathbb{R}$ in the direction $u$ represents the slope of $f$ in the direction $u$. So what does the second directional derivative along the direction $u$ represent?
  2. In the above paragraph, I understood that $d^THd$, the second directional derivative of $f$ in the direction $d$ ($\|d\|_2=1$), is given by the corresponding eigenvalue when $d$ is an eigenvector of $H$, because if $d$ is an eigenvector of $H$ then $d^THd=d^T\lambda_d d=\lambda_d d^Td=\lambda_d$. However, I don't understand the statement "For other directions of $d$, the directional second derivative is a weighted average of all the eigenvalues, with weights between $0$ and $1$." My reasoning: since $H$ is real symmetric, $H$ has $m$ linearly independent, mutually orthogonal eigenvectors, which form a basis for $\mathbb{R}^m$. Thus, if $d$ is not an eigenvector, then $d=c_1x_1+\cdots +c_mx_m$ for some scalars $c_i$ and eigenvectors $x_i$. Thus, $$d^THd=d^TH(c_1x_1+\cdots +c_mx_m)\\=d^T(c_1\lambda_1x_1+\cdots +c_m\lambda_mx_m)\\=c_1^2\|x_1\|^2\lambda_1 +\cdots +c_m^2\|x_m\|^2\lambda_m$$ (substituting $d=c_1x_1+\cdots+c_mx_m$ again and using the orthogonality of the $x_i$), which is of course a weighted combination of all the eigenvalues of $H$. But I don't understand why the weights lie between $0$ and $1$ as stated. In fact, there is no reason to believe that the weights $c_i^2\|x_i\|^2$ lie in the range $(0,1)$.
  3. Also, I don't understand the statement "The maximum eigenvalue determines the maximum second derivative, and the minimum eigenvalue determines the minimum second derivative". Can you explain this?

Best Answer

  1. By direct computation: the first directional derivative of $f:\mathbb{R}^m\rightarrow \mathbb{R}$ in the direction $u$ at $x$ is given by \begin{equation} \partial_u f(x):=\lim_{t\rightarrow 0}\frac{f(x+tu)-f(x)}{t}=\nabla f(x) \cdot u = \sum_{i=1}^{m} u_i\,\partial_{x_i}f(x). \end{equation} The second directional derivative along the direction $u$ is defined in a similar fashion (the last two lines use the summation convention): \begin{align*} \partial^2_{uu}f(x)&=\partial_u(\partial_u f)(x)\\ &=\lim_{t\rightarrow 0}\frac{\partial_u f(x+tu)-\partial_u f(x)}{t}\\ &=\lim_{t\rightarrow 0}\frac{\nabla f(x+tu)\cdot u-\nabla f(x)\cdot u}{t}\\ &=\lim_{t\rightarrow 0}\frac{u_i \partial_{x_i}f(x+tu)-u_i \partial_{x_i}f(x)}{t}\\ &=u_i\, \partial_{x_i x_j} f(x)\, u_j\\ &=u^THu, \end{align*} where $H=D^2 f(x)$ is the Hessian matrix of $f$ at $x$. In words: just as the first directional derivative is the slope of $f$ along $u$, the second directional derivative is the curvature along $u$, i.e., how fast that slope changes as you move in the direction $u$.
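To make item 1 concrete, here is a small numerical sanity check (a sketch only; the test function `f`, the point `x0`, and the direction `u` are arbitrary choices, not taken from the book): the second-order central difference of $t\mapsto f(x+tu)$ at $t=0$ should match $u^THu$.

```python
import numpy as np

# Sanity check: the second directional derivative along u equals u^T H u.
# The test function, point, and direction below are arbitrary illustrative choices.
def f(x):
    return x[0]**2 + 3.0 * x[0] * x[1] + np.sin(x[1])

def hessian_f(x):
    # Hessian of f computed by hand: f_xx = 2, f_xy = f_yx = 3, f_yy = -sin(y)
    return np.array([[2.0, 3.0],
                     [3.0, -np.sin(x[1])]])

x0 = np.array([0.5, -1.0])
u = np.array([1.0, 2.0])
u = u / np.linalg.norm(u)      # unit direction

# Central difference of t -> f(x0 + t*u) at t = 0 approximates the second directional derivative.
h = 1e-4
second_dir_deriv = (f(x0 + h * u) - 2.0 * f(x0) + f(x0 - h * u)) / h**2

print(second_dir_deriv, u @ hessian_f(x0) @ u)   # the two values agree up to O(h^2) error
```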

  2. $d$ is a direction means $\|d\|=1$, where the norm is the usual norm in $\mathbb{R}^m$, i.e., $\|d\|=\sqrt{d_1^2+\cdots+d_m^2}$. Therefore, if $d=\sum_{i=1}^{m}c_i e_i$, where $\left\{ e_i \right\}$ is an orthonormal basis given by the eigenvectors of $H$, then by the Pythagorean theorem, \begin{equation} 1=\left\|d\right\|^2=\sum_{i=1}^{m}c_i^2, \end{equation} from which we can conclude that each weight $c_i^2$ lies between $0$ and $1$ (and the weights sum to $1$, which is exactly what makes $d^THd=\sum_i c_i^2\lambda_i$ a weighted average of the eigenvalues). In your computation the weights came out as $c_i^2\|x_i\|^2$ only because the eigenvectors $x_i$ were not assumed to be normalized; with unit eigenvectors, $\|x_i\|=1$ and the weights are exactly $c_i^2$.
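A quick numerical illustration of this point (a sketch; the random symmetric matrix and the random direction are arbitrary, not from the book): the squared coefficients of a unit direction in the orthonormal eigenbasis sum to $1$, and $d^THd$ equals the corresponding weighted average of the eigenvalues.

```python
import numpy as np

# Expand a random unit direction d in the orthonormal eigenbasis of a random
# symmetric H and check the "weighted average of eigenvalues" claim.
rng = np.random.default_rng(0)
m = 4
A = rng.standard_normal((m, m))
H = (A + A.T) / 2.0                        # random real symmetric matrix

eigvals, eigvecs = np.linalg.eigh(H)       # columns of eigvecs form an orthonormal basis

d = rng.standard_normal(m)
d = d / np.linalg.norm(d)                  # unit direction, generically not an eigenvector

c = eigvecs.T @ d                          # coefficients c_i of d in the eigenbasis
print(np.sum(c**2))                        # ~1.0: the weights c_i^2 sum to one
print(d @ H @ d, np.sum(c**2 * eigvals))   # the two values coincide
```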

  3. For any direction $d$, from 1 we know that \begin{equation} \partial_{dd}^2 f(x)=d^T H d. \end{equation} Write $d=\sum_{i=1}^{m}c_i e_i$; then we have \begin{align*} d^THd&=\left( \sum_{i=1}^{m}c_i e_i \right)^T H\left( \sum_{i=1}^{m}c_i e_i \right)\\ &=\left( \sum_{i=1}^{m}c_i e_i \right)^{T}\left( \sum_{i=1}^{m} c_i\lambda_i e_i\right)\\ &=\sum_{i=1}^{m}c_i^2 \lambda_i \leq \lambda_{\max}\sum_{i=1}^{m}c_i^2\\ &=\lambda_{\max}, \end{align*} where we use the Pythagorean theorem again for $\sum_{i=1}^{m}c_i^2=1$.

On the other hand, if we let $e_1$ be the unit eigenvector associated with $\lambda_{\max}$, then we have \begin{equation} \partial_{e_1 e_1}^2 f(x)=e_1^T He_1=e_1^T \lambda_{\max} e_1=\lambda_{\max}. \end{equation} In conclusion, \begin{equation} \partial_{dd}^2 f(x)\leq \lambda_{\max}=\partial_{e_1 e_1}^2 f(x), \end{equation} so the maximum eigenvalue is the maximum second directional derivative. The same argument with the inequality reversed, $\sum_i c_i^2\lambda_i \geq \lambda_{\min}\sum_i c_i^2 = \lambda_{\min}$, with equality at the eigenvector for $\lambda_{\min}$, shows that the minimum eigenvalue is the minimum second directional derivative.
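As a final illustration (again a sketch with arbitrary random data, not from the book), one can check numerically that $d^THd$ always lies between $\lambda_{\min}$ and $\lambda_{\max}$ over unit directions $d$, and that the bounds are attained at the corresponding eigenvectors.

```python
import numpy as np

# Check that d^T H d over unit directions is bounded by the extreme eigenvalues
# and attains them at the corresponding eigenvectors.
rng = np.random.default_rng(1)
m = 5
A = rng.standard_normal((m, m))
H = (A + A.T) / 2.0                                  # random real symmetric matrix

eigvals, eigvecs = np.linalg.eigh(H)                 # eigenvalues in ascending order

ds = rng.standard_normal((1000, m))
ds /= np.linalg.norm(ds, axis=1, keepdims=True)      # 1000 random unit directions
vals = np.einsum('ij,jk,ik->i', ds, H, ds)           # d^T H d for each direction

print(eigvals[0], vals.min(), vals.max(), eigvals[-1])    # lambda_min <= all values <= lambda_max
print(eigvecs[:, -1] @ H @ eigvecs[:, -1], eigvals[-1])   # equality at the top eigenvector
print(eigvecs[:, 0] @ H @ eigvecs[:, 0], eigvals[0])      # equality at the bottom eigenvector
```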