The “second derivative test” for $f(x,y)$

determinant, multivariable-calculus, partial-derivative

I'm currently taking multivariable calculus, and I'm familiar with the second partial derivative test: the formula $D(a, b) = f_{xx}(a,b)f_{yy}(a, b) - (f_{xy}(a, b))^2$ used to determine the behavior of $f(x,y)$ at the point $(a, b, f(a,b))$.

However, my professor simply "spat" this formula at us and provided almost no explanation of where it comes from or how it is derived. After researching a bit on my own, I now know that it's the determinant of the Hessian matrix of $f(x,y)$, and I see how the formula follows easily from that matrix. Wikipedia just says: "The following test can be applied at a non-degenerate critical point $x$. If the Hessian is positive definite at $x$, then $f$ attains a local minimum at $x$. If the Hessian is negative definite at $x$, then $f$ attains a local maximum at $x$. If the Hessian has both positive and negative eigenvalues, then $x$ is a saddle point for $f$ (this is true even if $x$ is degenerate). Otherwise the test is inconclusive."
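To spell out what I found: the matrix in question is
$$H(a,b)=\begin{pmatrix} f_{xx}(a,b) & f_{xy}(a,b)\\ f_{yx}(a,b) & f_{yy}(a,b)\end{pmatrix},$$
and since $f_{xy}=f_{yx}$ when the second partials are continuous, its determinant is exactly $f_{xx}(a,b)f_{yy}(a,b)-(f_{xy}(a,b))^2$, i.e. the $D(a,b)$ above.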

I understand that, but I still don't understand why the determinant of this matrix happens to capture the behavior of $f$ in this way. Why does it? And if the test fails (is inconclusive), what steps should be taken to determine the nature of $f(x,y)$ at $(a, b, f(a,b))$?

Best Answer

That matrix is symmetric. It is a consequence of linear algebra that a symmetric matrix is orthogonally diagonalizable. That means there are two perpendicular directions along which the matrix acts simply as scaling: by $\lambda_1$ in one direction and by $\lambda_2$ in the other.
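To make "orthogonally diagonalizable" concrete: it means we can write
$$H=Q\begin{pmatrix}\lambda_1&0\\0&\lambda_2\end{pmatrix}Q^T,$$
where $Q$ is an orthogonal matrix whose columns are unit eigenvectors of $H$; those columns are the two perpendicular directions, and the eigenvalues $\lambda_1,\lambda_2$ are the scaling factors.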

These $\lambda_i$ are the quadratic coefficients of the parabolic approximations to $f$ at $(x_0,y_0)$ as you move through that point in the direction of each eigenspace. Since you are already looking at a critical point, each quadratic approximation reaches its tip at $(x_0,y_0)$. If the two $\lambda_i$ are opposite in sign, you have two parabolas, orthogonal to each other, opening in different directions, which clearly creates a saddle. If the two $\lambda_i$ have the same sign, then depending on that sign you have either a max or a min.
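A quick example of my own (not from the question): for $f(x,y)=x^2-y^2$ at the critical point $(0,0)$,
$$H(0,0)=\begin{pmatrix}2&0\\0&-2\end{pmatrix},\qquad \lambda_1=2,\quad \lambda_2=-2,$$
so the cross-section along the $x$-axis is a parabola opening up and the one along the $y$-axis is a parabola opening down: a saddle, and indeed $D=(2)(-2)-0^2=-4<0$.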

But the determinant of a $2\times2$ matrix is exactly the product of its two eigenvalues. So you can see how a negative determinant implies $\lambda_i$ of opposite sign, which implies a saddle point, and a positive determinant similarly implies either a max or a min.
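One way to see that (and to see where the rest of the test comes from) is the characteristic polynomial of the $2\times2$ Hessian:
$$\det(H-\lambda I)=\lambda^2-(\operatorname{tr}H)\lambda+\det H=(\lambda-\lambda_1)(\lambda-\lambda_2),$$
so comparing constant terms gives $\det H=\lambda_1\lambda_2$, and comparing the $\lambda$ coefficients gives $f_{xx}+f_{yy}=\lambda_1+\lambda_2$. When the determinant is positive, $f_{xx}f_{yy}>f_{xy}^2\ge 0$ forces $f_{xx}$ and $f_{yy}$ to share a sign, so the sign of $f_{xx}$ is the sign of $\lambda_1+\lambda_2$, i.e. of both eigenvalues; that is why the usual test checks $f_{xx}$ to distinguish a min from a max.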


Locally at any $(x_0,y_0)$, there is a higher dimensional version of the Taylor series, grouped here by increasing order of derivative: $$\begin{align*} f(x,y)&=f(x_0,y_0)+\Big[f_x(x_0,y_0)\cdot(x-x_0)+f_y(x_0,y_0)\cdot(y-y_0)\Big]\\ &\phantom{{}={}}+\frac12\Big[f_{xx}(x_0,y_0)\cdot(x-x_0)^2+f_{xy}(x_0,y_0)\cdot(x-x_0)(y-y_0)\\ &\phantom{{}={}}+f_{yx}(x_0,y_0)\cdot(y-y_0)(x-x_0)+f_{yy}(x_0,y_0)\cdot(y-y_0)^2\Big]+\cdots\\ &=f(x_0,y_0)+\nabla f(x_0,y_0)\cdot\left((x,y)-(x_0,y_0)\right)^T\\ &\phantom{{}={}}+\frac12\left((x,y)-(x_0,y_0)\right)\cdot H(x_0,y_0)\cdot\left((x,y)-(x_0,y_0)\right)^T+\cdots \end{align*}$$

When you are at a critical point, this simplifies to $$\begin{align*} f(x,y)&=f(x_0,y_0)+\frac12\left((x,y)-(x_0,y_0)\right)\cdot H(x_0,y_0)\cdot\left((x,y)-(x_0,y_0)\right)^T+\cdots \end{align*}$$
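Written out in two variables (all second partials evaluated at $(x_0,y_0)$, and using $f_{xy}=f_{yx}$), that says
$$f(x,y)\approx f(x_0,y_0)+\frac12\Big[f_{xx}(x-x_0)^2+2f_{xy}(x-x_0)(y-y_0)+f_{yy}(y-y_0)^2\Big],$$
so near a critical point the graph is, up to the higher-order terms hidden in the $\cdots$, just the quadratic surface determined by $H(x_0,y_0)$.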

And if we change coordinates to variables $s$ and $t$ that run in the directions of $H$'s eigenspaces, based at the critical point, we simply get

$$f(s,t)=f(0,0)+\frac12\lambda_1s^2+\frac12\lambda_2t^2+\cdots$$ which I hope helps you see the parabolas and the role of the eigenvalues of $H$.
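A worked example of my own that ties this together: for $f(x,y)=x^2+3xy+y^2$, the origin is a critical point with
$$H(0,0)=\begin{pmatrix}2&3\\3&2\end{pmatrix},\qquad \lambda_1=5,\quad \lambda_2=-1,\qquad D=2\cdot 2-3^2=-5<0,$$
and in the rotated coordinates $s=\frac{x+y}{\sqrt2}$, $t=\frac{x-y}{\sqrt2}$ (the eigenvector directions) the function is exactly $f=\frac52 s^2-\frac12 t^2$: one parabola opens up, the other opens down, so it's a saddle, even though $f_{xx}$ and $f_{yy}$ are both positive.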