The name "min-max theorem" given in the comments led me to the following very elegant proof.
To repeat in our context: given a symmetric matrix $A$, we construct an orthonormal basis $\{ v_1,b_2,...,b_n \}$ of $\mathbb{R}^n$ out of eigenvectors of $A$ (which is possible because $A$ is symmetric), where $v_1$ is the eigenvector from the theorem.
Since the minimization ranges only over the space orthogonal to $v_1$, $\{ b_2,...,b_n \}$ is an orthonormal basis of that space.
We note that the dot product can be computed coordinate-wise in any orthonormal basis, and hence for every unit vector $x=(x_2,...,x_n)$ in this orthogonal space: $${x^TAx}=(\sum_{j=2}^n x_j b_j^T)A(\sum_{i=2}^n x_ib_i)=\sum _{i,j}x_i x_j b_j^T A b_i=\sum _{i,j}x_i x_j b_j^T \lambda _i b_i$$
as $Ab_i=\lambda_i b_i$. Now we use $b_j^T b_i=\delta_{ij}$, since the $b_i$ form an orthonormal basis, to get:
$$=\sum _{i}\lambda_i x_i^2 \geq \sum _{i}\lambda_2 x_i^2=\lambda_2$$
since $x$ is of unit length. We have proved that ${x^TAx} \geq \lambda_2$ for all $x$ over which we minimize, and hence $\min{x^TAx} \geq \lambda_2$. The other direction is easy and is written in the question itself. $\square$
The proof does indeed assume that we count the eigenvalues according to their algebraic multiplicity, $\lambda_1 \leq ... \leq \lambda _n$.
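As a numerical sketch of the claim (my own illustration, not part of the proof): for a random symmetric $A$, the quadratic form $x^TAx$ over unit vectors $x$ orthogonal to $v_1$ never drops below $\lambda_2$, and the bound is attained at the second eigenvector $v_2$.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                       # a symmetric matrix

vals, vecs = np.linalg.eigh(A)          # eigenvalues ascending, orthonormal eigenvectors
v1, v2 = vecs[:, 0], vecs[:, 1]

# Sample random unit vectors orthogonal to v_1 and evaluate x^T A x.
min_quad = np.inf
for _ in range(2000):
    x = rng.standard_normal(5)
    x -= (x @ v1) * v1                  # project out the v_1 component
    x /= np.linalg.norm(x)
    min_quad = min(min_quad, x @ A @ x)

print(min_quad >= vals[1] - 1e-9)       # True: never below lambda_2
print(np.isclose(v2 @ A @ v2, vals[1])) # True: attained at v_2
```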
As I think you are asking for intuition regarding "The second derivative in a specific direction represented by a unit vector d is given by $d’Hd$”, let me connect it in two ways to the usual way we think about derivatives. I’ll use two dimensions to illustrate in both cases. Let the unit vector $\bar{d}$ be $(n_1,n_2)$ in the standard basis and let $\bar{x}$ represent the point $(x,y)$.
For the shorter explanation, consider the function value $f(\bar{x}+ds \bar{d})$ at a small distance $ds$ from $\bar{x}$ along $\bar{d}$ as a Taylor expansion. Let $h=n_1ds$ and $k=n_2ds$ denote the corresponding increments along the x and y directions.
$$f(\bar{x}+ds \bar{d})=f(x,y) + hf_x+kf_y + \frac{1}{2}(h^2f_{xx}+ 2hkf_{xy}+ k^2f_{yy}) + \mbox{h.o.t.}$$
$$=f(x,y) + ds(n_1f_x+n_2f_y) + \frac{1}{2}ds^2(n_1^2f_{xx}+ 2n_1n_2f_{xy}+ n_2^2f_{yy}) + \mbox{h.o.t.}$$
$$=f(x,y) + ds (\nabla f \cdot \bar{d} )+ \frac{1}{2}ds^2 (\bar{d}’H \bar{d} )+ \mbox{h.o.t.}$$
That is, $\nabla f \cdot \bar{d}$ plays the role of the first derivative and $\bar{d}’H \bar{d}$ plays the role of the second derivative along the direction $\bar{d}$.
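These roles can be checked concretely. In the sketch below, the function $f(x,y)=x^3+xy^2$, the point, and the direction are my own choices for illustration; truncating the expansion after the $\nabla f \cdot \bar{d}$ and $\bar{d}’H \bar{d}$ terms leaves a remainder that shrinks like $ds^3$.

```python
# Check f(p + ds*d) = f(p) + ds*(grad . d) + 0.5*ds^2*(d'Hd) + h.o.t.
# for f(x, y) = x^3 + x*y^2, with grad f = (3x^2 + y^2, 2xy) and
# H = [[6x, 2y], [2y, 2x]].
import numpy as np

def f(p):
    x, y = p
    return x**3 + x * y**2

p = np.array([1.0, 2.0])
d = np.array([3.0, 4.0]) / 5.0                      # unit direction vector
grad = np.array([3*p[0]**2 + p[1]**2, 2*p[0]*p[1]])
H = np.array([[6*p[0], 2*p[1]],
              [2*p[1], 2*p[0]]])

for ds in (1e-1, 1e-2, 1e-3):
    taylor = f(p) + ds * (grad @ d) + 0.5 * ds**2 * (d @ H @ d)
    print(ds, abs(f(p + ds*d) - taylor))            # remainder shrinks like ds^3
```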
The second explanation uses the same idea but, depending on your bent of mind, might be more intuitive. Proceed as in finite differences, where $f_x$ is approximated by $\frac{f(x+\Delta x)-f(x)}{\Delta x}$, with the approximation becoming exact as $\Delta x \rightarrow 0$. The second derivative $f_{xx}$ is likewise approximated by $$\frac{ f_x(x+\frac{\Delta x}{2}) - f_x(x -\frac{\Delta x}{2}) }{\Delta x}$$
$$\approx \frac{ f( x + \Delta x) -2f(x) + f( x - \Delta x) }{\Delta x^2}$$
Now, apply that one dimensional second derivative idea along the direction $\bar{d}$ to see that, ignoring higher order terms for now, the second derivative is
$$ \frac{ f( x + h, y+ k) -2f(x,y) + f( x - h, y-k) }{ h^2 + k^2}$$
Using two-dimensional Taylor expansions for $f( x + h, y+ k)$ and $f( x - h, y-k )$ (write them out),
and using $h=n_1ds$ and $k=n_2ds$, we see that the second derivative approximation is given by
$$\frac{ ds^2 \left( n_1^2f_{xx}+ 2n_1n_2f_{xy}+ n_2^2f_{yy} \right) }{ ds^2} =
\frac{ ds^2 \, \bar{d}’H \bar{d} }{ ds^2} = \bar{d}’H \bar{d} $$ with the higher order terms vanishing as you take $ds$ to zero.
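This finite-difference picture can also be checked numerically. In the sketch below, the sample function $f(x,y)=x^3+xy^2$ and the point are my own choices; the central second difference along $\bar{d}$ matches $\bar{d}’H \bar{d}$.

```python
# Central second difference along direction d versus d'Hd, for the sample
# function f(x, y) = x^3 + x*y^2, whose Hessian is [[6x, 2y], [2y, 2x]].
import numpy as np

def f(p):
    x, y = p
    return x**3 + x * y**2

p = np.array([1.0, 2.0])
d = np.array([3.0, 4.0]) / 5.0           # unit direction, so h^2 + k^2 = ds^2
H = np.array([[6*p[0], 2*p[1]],
              [2*p[1], 2*p[0]]])

ds = 1e-4
step = ds * d                            # increments (h, k) = ds*(n1, n2)
second_diff = (f(p + step) - 2*f(p) + f(p - step)) / ds**2
print(second_diff, d @ H @ d)            # both approximately 7.28
```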
I would have liked to expand some of the steps more, but MathJax on a phone is rather painful. I hope one of these explanations felt intuitive to you. Please leave a comment if more clarification is needed.
Best Answer
By direct computation: the first directional derivative of $f:\mathbf{R}^m\rightarrow \mathbf{R}$ in the direction of $u$ at $x$ is given by \begin{equation} \partial_u f(x):=\lim_{t\rightarrow 0}\frac{f(x+tu)-f(x)}{t}=\nabla f(x) \cdot u = \sum_{i=1}^{m} u_i\partial_{x_i}f(x). \end{equation} The second directional derivative along the direction $u$ is given in the same fashion (with summation over repeated indices understood): \begin{align*} \partial^2_{uu}f(x)&=\partial_u(\partial_u f)\\ &=\lim_{t\rightarrow 0}\frac{\partial_u f(x+tu)-\partial_u f(x)}{t}\\ &=\lim_{t\rightarrow 0}\frac{\nabla f(x+tu)\cdot u-\nabla f(x)\cdot u}{t}\\ &=\lim_{t\rightarrow 0}\frac{u_i \partial_{x_i}f(x+tu)-u_i \partial_{x_i}f(x)}{t}\\ &=u_i \partial_{x_i x_j} f(x)u_j\\ &=u^THu \end{align*} where $H=D^2 f(x)$ is the Hessian matrix of $f$ at $x$.
For any direction $d$, from the computation above we know that \begin{equation} \partial_{dd}^2 f(x)=d^T H d. \end{equation} Write $d=\sum_{i=1}^{m}c_i e_i$ in an orthonormal basis of eigenvectors $e_i$ of $H$, with $He_i=\lambda_i e_i$; then we have \begin{align*} d^THd&=\left( \sum_{i=1}^{m}c_i e_i \right)^T H\left( \sum_{i=1}^{m}c_i e_i \right)\\ &=\left( \sum_{i=1}^{m}c_i e_i \right)^{T}\left( \sum_{i=1}^{m} c_i\lambda_i e_i\right)\\ &=\sum_{i=1}^{m}c_i^2 \lambda_i \leq \lambda_{\max}\sum_{i=1}^{m}c_i^2\\ &=\lambda_{\max} \end{align*} where we use the Pythagorean theorem again for $\sum_{i=1}^{m}c_i^2=1$.
On the other hand, if we let $e_1$ be the eigenvector associated with $\lambda_{\max}$, then we have \begin{equation} \partial^2_{e_1 e_1}f(x)=e_1^T He_1=e_1^T \lambda_{\max} e_1=\lambda_{\max}. \end{equation} In conclusion, \begin{equation} \partial^2_{dd}f(x)\leq \lambda_{\max}=\partial^2_{e_1 e_1}f(x). \end{equation}
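As a sanity check on this conclusion (my own numerical sketch, not part of the answer): for a random symmetric matrix standing in for $H$, $d^THd \leq \lambda_{\max}$ over unit vectors $d$, with equality at the top eigenvector.

```python
# Verify d'Hd <= lambda_max for unit d, with equality at the top eigenvector.
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
H = (B + B.T) / 2                        # stand-in symmetric Hessian

vals, vecs = np.linalg.eigh(H)           # eigenvalues in ascending order
lam_max = vals[-1]
e1 = vecs[:, -1]                         # eigenvector for lambda_max

bounded = True
for g in rng.standard_normal((1000, 4)):
    d = g / np.linalg.norm(g)            # random unit direction
    bounded = bounded and (d @ H @ d <= lam_max + 1e-9)

print(bounded)                           # True: the form never exceeds lambda_max
print(np.isclose(e1 @ H @ e1, lam_max))  # True: equality at e_1
```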