When classifying critical points of $f(x,y)$, what does a “degenerate point” (when $f_{xx}f_{yy}=f_{xy}^2$) mean, exactly?

calculus, derivatives, multivariable-calculus, terminology

In my textbook, I have the following way of classifying critical points:

$$\begin{array} {|c|c|c|c|}
\hline
f_x=f_y=0 & f_{xx} < 0 & W > 0 & \text{local maximum} \\
f_x=f_y=0 & f_{xx} > 0 & W > 0 & \text{local minimum} \\
f_x=f_y=0 & f_{xx} \text{ anything} & W < 0 & \text{saddle point} \\
f_x=f_y=0 & f_{xx} \text{ anything} & W = 0 & \text{degenerate} \\
\hline
\end{array}$$

Notation: $f_x$ and $f_y$ are the partial derivatives with respect to $x$ and $y$, and $W=f_{xx}f_{yy}-f^2_{xy}$ is the determinant of the Hessian, the matrix of second partial derivatives.
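
To make the table concrete, here is a minimal sketch (using sympy; the example function $f(x,y)=x^3+y^3-3xy$ is my own illustration, not from the textbook) that finds the critical points and applies exactly this classification:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**3 + y**3 - 3*x*y          # illustrative example, not from the question

fx, fy = sp.diff(f, x), sp.diff(f, y)
fxx, fyy, fxy = sp.diff(f, x, 2), sp.diff(f, y, 2), sp.diff(f, x, y)
W = fxx*fyy - fxy**2             # W = f_xx * f_yy - f_xy^2

# Keep only the real solutions of f_x = f_y = 0
critical_points = [s for s in sp.solve([fx, fy], [x, y], dict=True)
                   if all(v.is_real for v in s.values())]

for pt in critical_points:
    w, a = W.subs(pt), fxx.subs(pt)
    if w > 0 and a < 0:
        kind = "local maximum"
    elif w > 0 and a > 0:
        kind = "local minimum"
    elif w < 0:
        kind = "saddle point"
    else:
        kind = "degenerate (test is inconclusive)"
    print(pt, kind)
```

For this particular $f$, the test reports a saddle point at $(0,0)$ and a local minimum at $(1,1)$.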

I understand what minimum/maximum points are and what saddle points are.

But what does a degenerate point mean, exactly?

Best Answer

I'm a bit surprised to see the word "degenerate" here - but what it means in context is that if $f_x=0$ and $f_y=0$ and $\det J_f = 0$ (i.e. $W=0$ in your notation), then the test fails to establish anything - you could be at a maximum or a minimum, or you might be at neither. Basically, it's a catch-all case that says "this test doesn't tell you what's going on".

The method here works because, at a critical point (where $f_x=f_y=0$, so the first-order terms vanish), we can write $$f(x+\Delta x,y + \Delta y) - f(x,y)\approx \frac{1}2f_{xx}(x,y)\cdot(\Delta x)^2 + f_{xy}(x,y)\cdot(\Delta x)(\Delta y) + \frac{1}2f_{yy}(x,y)\cdot(\Delta y)^2$$ where the error in the approximation is smaller than quadratic near $(x,y)$. If the determinant of $J_f$ is positive, we know what the approximation on the right looks like: it's some sort of "cup" (or "cap") shape forming a local minimum or maximum - perhaps something like $(\Delta x)^2 + (\Delta y)^2$. Similarly, a negative determinant indicates a saddle shape like $(\Delta x)^2 - (\Delta y)^2$. These shapes are robust in the sense that adding a smaller-than-quadratic term does not change the overall picture.
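
It may help to restate the right-hand side in matrix form (my own restatement, not part of the original answer): it is the quadratic form of the matrix of second derivatives, so its shape is controlled by the signs of that matrix's eigenvalues, and hence by $\det J_f$ together with $f_{xx}$:

$$f(x+\Delta x,\,y+\Delta y)-f(x,y)\approx\frac{1}{2}\begin{pmatrix}\Delta x & \Delta y\end{pmatrix}\begin{pmatrix}f_{xx} & f_{xy}\\ f_{xy} & f_{yy}\end{pmatrix}\begin{pmatrix}\Delta x\\ \Delta y\end{pmatrix},\qquad \det J_f=f_{xx}f_{yy}-f_{xy}^2=W.$$

A positive determinant means the two eigenvalues share a sign (the cup or cap shape), a negative determinant means they have opposite signs (the saddle), and a zero determinant means at least one eigenvalue is zero, which is exactly the flat-direction situation described next.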

However, if the determinant of $J_f$ vanishes, then the approximation on the right has some direction along which it is flat: both the approximation and its derivatives vanish along that direction. It might look like $(\Delta x)^2$, which is curved upwards along the $x$-axis but flat along the $y$-axis. The higher-order terms then decide the behavior along the $y$-axis: maybe you're at a local minimum, as with the function $x^2+y^4$ near $(0,0)$. Maybe it's a sort of saddle point, as in $x^2 - y^4$. Maybe it's something weird like $x^2 + y^3$, which is not a minimum or a maximum, nor is it really a saddle point. Even worse, the quadratic approximation could just be $0$, in which case the first- and second-order terms tell us essentially nothing about the function. This happens for functions like $x^3+y^3$ or $x^4+y^4$, which behave quite differently from each other at the origin even though both have vanishing quadratic approximations there.
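
As a quick sanity check (a sympy sketch of my own, using the same examples mentioned above), all of these functions have $f_x=f_y=0$ and $W=0$ at the origin even though they behave differently there:

```python
import sympy as sp

x, y = sp.symbols('x y')

# Each example has f_x = f_y = 0 and W = 0 at the origin,
# yet the local behavior there differs.
examples = [x**2 + y**4,   # local minimum
            x**2 - y**4,   # saddle-like
            x**2 + y**3,   # neither min, max, nor saddle
            x**3 + y**3,
            x**4 + y**4]   # local minimum

for f in examples:
    W = sp.diff(f, x, 2)*sp.diff(f, y, 2) - sp.diff(f, x, y)**2
    origin = {x: 0, y: 0}
    print(f, "  W(0,0) =", W.subs(origin),
          "  grad(0,0) =", (sp.diff(f, x).subs(origin), sp.diff(f, y).subs(origin)))
```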

Basically: whenever the determinant of $J_f$ is non-zero, the local behavior of the function is well described by its quadratic approximation, which classifies the point as a minimum, a maximum, or a saddle point. When this determinant is zero, we need to look at better approximations. It's the same issue that arises in one dimension: if we know $f'(x)=0$, we can decide between minimum and maximum when $f''(x)\neq 0$, but we need more information when $f''(x)=0$.
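
For a concrete one-dimensional illustration of that last point (my own addition): the functions $$f(x)=x^4,\qquad f(x)=-x^4,\qquad f(x)=x^3$$ all satisfy $f'(0)=f''(0)=0$, yet the first has a minimum at $0$, the second a maximum, and the third neither - just as in the degenerate two-variable case, the second derivative alone cannot tell them apart.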