That matrix is symmetric, and by the spectral theorem a real symmetric matrix is orthogonally diagonalizable. That means there are two perpendicular directions along which the matrix acts as scaling by $\lambda_1$ and by $\lambda_2$.
These $\lambda_i$ represent the quadratic coefficients of parabolic approximations to the function $f$ at $(x_0,y_0)$ as you move through that point in the direction of each eigenspace. Since you are already looking at a critical point, each quadratic approximation attains its vertex at $(x_0,y_0)$. If the two $\lambda_i$ have opposite signs, you have two mutually orthogonal parabolas opening in opposite directions, which clearly creates a saddle. If the two $\lambda_i$ have the same sign, then depending on that sign you have either a max or a min.
But the determinant of a $2\times2$ matrix equals the product of its two eigenvalues. So you can see how a negative determinant forces the $\lambda_i$ to have opposite signs, which implies a saddle point, while a positive determinant similarly implies either a max or a min.
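To make this concrete, here is a minimal numerical sketch in Python (numpy is assumed, and $f(x,y)=x^2+3xy+y^2$ is a made-up example whose Hessian is constant):

```python
import numpy as np

# Example: f(x, y) = x^2 + 3xy + y^2 has a critical point at the origin.
# Its Hessian there is
H = np.array([[2.0, 3.0],
              [3.0, 2.0]])    # [[f_xx, f_xy], [f_yx, f_yy]]

eigenvalues = np.linalg.eigvalsh(H)  # symmetric matrix, so use eigvalsh
print(eigenvalues)           # [-1.  5.] -> opposite signs
print(np.linalg.det(H))      # -5.0     -> negative, consistent with a saddle
print(np.prod(eigenvalues))  # -5.0     -> det equals the product of eigenvalues
```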
Locally at any $(x_0,y_0)$, there is a higher-dimensional version of the Taylor series, grouped here by increasing order of derivative:
$$\begin{align*}
f(x,y)&=f(x_0,y_0)+\Big[f_x(x_0,y_0)\cdot(x-x_0)+f_y(x_0,y_0)\cdot(y-y_0)\Big]\\
&\phantom{{}={}}+\frac12\Big[f_{xx}(x_0,y_0)\cdot(x-x_0)^2+f_{xy}(x_0,y_0)\cdot(x-x_0)(y-y_0)\\
&\phantom{{}={}}+f_{yx}(x_0,y_0)\cdot(y-y_0)(x-x_0)+f_{yy}(x_0,y_0)\cdot(y-y_0)^2\Big]+\cdots\\
&=f(x_0,y_0)+\nabla f(x_0,y_0)\cdot\left((x,y)-(x_0,y_0)\right)^T\\
&\phantom{{}={}}+\frac12\left((x,y)-(x_0,y_0)\right)\cdot H(x_0,y_0)\cdot\left((x,y)-(x_0,y_0)\right)^T+\cdots
\end{align*}$$
When you are at a critical point, this simplifies to
$$\begin{align*}
f(x,y)&=f(x_0,y_0)+\frac12\left((x,y)-(x_0,y_0)\right)\cdot H(x_0,y_0)\cdot\left((x,y)-(x_0,y_0)\right)^T+\cdots
\end{align*}$$
And if we could change coordinates to variables $s$ and $t$ that run in the directions of $H$'s eigenspaces, based at the critical point, we'd have simply
$$f(s,t)=f(0,0)+\frac12\lambda_1s^2+\frac12\lambda_2t^2+\cdots$$ which I hope makes the parabolas, and the role of the eigenvalues of $H$, easier to see.
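As a sanity check on this picture, the sketch below (numpy assumed, same made-up example as above) measures the second difference quotient of $f$ along each eigenvector of $H$ and recovers the corresponding $\lambda_i$:

```python
import numpy as np

# Made-up example: f(x, y) = x^2 + 3xy + y^2, critical point at (0, 0),
# Hessian H = [[2, 3], [3, 2]] with eigenvalues -1 and 5.
f = lambda p: p[0]**2 + 3*p[0]*p[1] + p[1]**2
H = np.array([[2.0, 3.0], [3.0, 2.0]])
lams, vecs = np.linalg.eigh(H)

h = 1e-4
for lam, v in zip(lams, vecs.T):
    # Along each unit eigenvector, f(t*v) ~ f(0,0) + (lam/2) t^2, so the
    # second difference quotient should recover lam:
    second_diff = (f(h*v) - 2*f([0, 0]) + f(-h*v)) / h**2
    print(lam, second_diff)   # the two columns agree
```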
One way to understand how the test works is by looking at the Taylor series of the function $f(x)$ centered around the critical point, $x = c$:
$$
f(x) = f(c) + f'(c)(x-c) + \frac{f''(c)}{2}(x-c)^2 + \cdots
$$
Note: In your question you said that the $n$-th derivative is non-zero. Here I'm assuming the $(n+1)$-st derivative is the first to be non-zero at $x = c$. It doesn't make a difference; it's just the way I learned it.
If $f'(c) = \cdots = f^{(n)}(c) = 0$ and $f^{(n+1)}(c) \ne 0$, then the Taylor series ends up looking like this:
$$
f(x) = f(c) + \frac{f^{(n+1)}(c)}{(n+1)!}(x-c)^{n+1} + \frac{f^{(n+2)}(c)}{(n+2)!}(x-c)^{n+2} + \cdots
$$
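You can watch this happen with a computer algebra system. A small sketch, assuming sympy and the made-up example $f(x)=\sin x - x$, where $f'(0)=f''(0)=0$ and $f'''(0)=-1$, so $n=2$:

```python
import sympy as sp

x = sp.symbols('x')
# f(x) = sin(x) - x at c = 0: the 3rd derivative is the first non-zero one.
f = sp.sin(x) - x
print(sp.series(f, x, 0, 8))
# -x**3/6 + x**5/120 - x**7/5040 + O(x**8)
# The first surviving term is f'''(0)/3! * x^3, matching the formula above.
```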
Consider what happens when you move $f(c)$ to the other side of the equation:
$$
f(x) - f(c) = \frac{f^{(n+1)}(c)}{(n+1)!}(x-c)^{n+1} + \frac{f^{(n+2)}(c)}{(n+2)!}(x-c)^{n+2} + \cdots
$$
What does $f(x) - f(c)$ mean?
- If $f(x) - f(c) = 0$, then $f(x)$ has the same value as it does at $x = c$.
- If $f(x) - f(c) < 0$, then $f(x)$ has a value less than it has at $x = c$.
- If $f(x) - f(c) > 0$, then $f(x)$ has a value greater than it has at $x = c$.
We expect $f(x) - f(c) = 0$ at $x = c$ (the equation reflects this), but we're more interested in what it does on either side of $x = c$. When $x$ is really close to $c$, i.e. $(x-c)$ is a really small number, we can say:
$$
f(x) - f(c) \approx \frac{f^{(n+1)}(c)}{(n+1)!}(x-c)^{n+1}
$$
because the higher powers of a small number "don't matter" as much.
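A quick numeric illustration of that last point, using the made-up example $f(x) = x^4 + 10x^5$ near $c = 0$:

```python
# The x^4 term dominates once |x - c| is small, even though
# the x^5 coefficient is ten times larger:
for xv in [0.1, 0.01, 0.001]:
    exact = xv**4 + 10*xv**5
    leading = xv**4
    print(xv, exact, leading, exact / leading)
# the ratio exact/leading tends to 1 as x -> 0
```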
Concerning local extrema
If $n$ is odd, then our approximation of $f(x) - f(c)$ is an even power of $(x-c)$. That means $f(x)$ has the same behavior, either less than or greater than $f(c)$, on both sides of $x = c$; therefore $x = c$ is a local extremum. If $f^{(n+1)}(c) > 0$, then $f(x)$ is greater than $f(c)$ on both sides of $x = c$ (a local minimum). Otherwise, if $f^{(n+1)}(c) < 0$, then $f(x)$ is less than $f(c)$ on both sides of $x = c$ (a local maximum).
If, on the other hand, $n$ is even, then our approximation of $f(x) - f(c)$ is an odd power of $(x-c)$. Therefore $f(x)$ will be greater than $f(c)$ on one side of $x = c$ and less on the other. That means $x = c$ isn't a local extremum.
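The two cases above can be packaged into a small test. Here is a sketch, assuming sympy; `higher_order_test` is a hypothetical helper written for this answer, not a library function:

```python
import sympy as sp

def higher_order_test(f, x, c):
    """Classify the critical point x = c of a smooth f by the first
    non-vanishing derivative (assumes some derivative at c is non-zero)."""
    k = 1
    while sp.diff(f, x, k).subs(x, c) == 0:
        k += 1
    val = sp.diff(f, x, k).subs(x, c)
    if k % 2 == 1:
        return "not an extremum (odd first non-zero derivative)"
    return "local minimum" if val > 0 else "local maximum"

x = sp.symbols('x')
print(higher_order_test(x**4, x, 0))    # local minimum (k = 4)
print(higher_order_test(-x**6, x, 0))   # local maximum (k = 6)
print(higher_order_test(x**3, x, 0))    # not an extremum (k = 3)
```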
Concerning saddle points
Note that if you differentiate both sides of our approximation twice, you get:
$$
f''(x) \approx \frac{f^{(n+1)}(c)}{(n-1)!}(x-c)^{n-1}
$$
If $n$ is even, this is another odd power of $(x-c)$. It therefore has opposite signs on each side of $x = c$, so $f''$ changes sign there and you get a saddle point (an inflection point with a horizontal tangent).
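For instance, with the made-up example $f(x) = x^3$ at $c = 0$ (so $n = 2$), sympy confirms both the sign change of $f''$ and the absence of an extremum:

```python
import sympy as sp

x = sp.symbols('x')
f = x**3   # first non-zero derivative at 0 is the 3rd
print(sp.diff(f, x, 2))                                 # 6*x: f'' changes sign at 0
print(f.subs(x, -sp.Rational(1, 10)),
      f.subs(x, sp.Rational(1, 10)))                    # -1/1000  1/1000: no extremum
```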
This webpage states and proves a version of the higher-order derivative test that applies not only to functions defined on $\mathbb{R}^2$ or $\mathbb{R}^N$, but to functions defined on arbitrary Banach spaces. It first gives a general theorem, and then a corollary for the finite-dimensional case, which is what we’re interested in here. In the corollary, $f^{(p)}(x)$ denotes a tensor containing all the pure and mixed partial derivatives of $f$ of order $p$, evaluated at $x$.