How should I interpret the constraint $x^Tu_1=0$ in a quadratic function optimization?

Tags: linear-algebra, linear-programming, machine-learning, optimization, quadratic-forms

In a linear algebra book, the following is stated with proof shown, so I get it.

Let $A$ be a symmetric matrix, so that $x^TAx$ is the quadratic form of a quadratic function, and let $x$ range over unit vectors, i.e. $x^Tx=1$. Arrange the eigenvalues so that $\lambda_1 > \lambda_2 > \cdots > \lambda_n$.
Then the maximum of $x^TAx$ is the largest eigenvalue, $\lambda_1$, of $A$, and the minimum is the smallest eigenvalue, $\lambda_n$, of $A$.
The maximum is attained when $x$ is a unit eigenvector $u_1$ corresponding to $\lambda_1$.
The minimum is attained when $x$ is a unit eigenvector $u_n$ corresponding to $\lambda_n$.
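
(To convince myself I'm reading this right, I ran a quick numerical check — a minimal numpy sketch, where the random symmetric matrix $A$ below is just an arbitrary example, not one from the book:)

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary symmetric matrix: (B + B^T)/2 is symmetric for any B.
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2

# np.linalg.eigh handles symmetric matrices; eigenvalues come back ascending.
eigvals, _ = np.linalg.eigh(A)

# Sample many unit vectors x and evaluate the quadratic form x^T A x.
X = rng.standard_normal((100_000, 4))
X /= np.linalg.norm(X, axis=1, keepdims=True)
q = np.einsum("ij,jk,ik->i", X, A, X)  # row-wise x^T A x

print(q.max(), eigvals[-1])  # sampled max approaches lambda_1 (largest)
print(q.min(), eigvals[0])   # sampled min approaches lambda_n (smallest)
```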

Then it states the following without proof or any discussion of interpretation:

The maximum of $x^TAx$ subject to the constraints
$$x^Tx=1, \qquad x^Tu_1=0$$
is the second largest eigenvalue, $\lambda_2$, and this maximum is attained when $x$ is a unit eigenvector, $u_2$, corresponding to $\lambda_2$.

Question: How should I interpret the constraint $x^Tu_i=0$?
I see that when the $A$ in $x^TAx$ is diagonal to begin with, $x^Tu_i=0$ says $x_i=0$; so $x^Tu_1=0$ forces $x_1=0$, which makes $\lambda_1$ unattainable, and we go hunting for the next largest value, $\lambda_2$.

How about when $A$ is a general symmetric matrix, not necessarily diagonal?
The constraint $x^Tu_1=0$ says that $x$ must be orthogonal to $u_1$, so $x\neq u_1$ and hence $x^TAx \neq\lambda_1$. When the eigenvalues are ranked $\lambda_1 > \lambda_2 > \cdots$, the constraint $x^Tu_1=0$ therefore seems to single out the second largest value. So if we want the fourth largest, do we add $x^Tu_i=0$ for $i=1,2,3$, which restricts $x$ to the subspace orthogonal to those $u_i$? Is that how the constraint should be interpreted?

Best Answer

The essential fact is that the matrix $A$ is diagonal w.r.t. the basis $\{u_1,\ldots,u_n\}$. I'll give a sketch of how to see this, and how to use this to prove the claims in your text.

Fact 1: If $A$ is a symmetric matrix with two distinct eigenvalues $\lambda_1$ and $\lambda_2$, then any pair of corresponding eigenvectors $u_1$ and $u_2$ is perpendicular.

If you are not familiar with this fact, then it is a nice exercise to prove it.
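
In case a hint helps, here is a sketch of the usual one-line argument: using $A^{\top}=A$ and $Au_i=\lambda_i u_i$, $$\lambda_1\,u_1^{\top}u_2=(Au_1)^{\top}u_2=u_1^{\top}A^{\top}u_2=u_1^{\top}Au_2=\lambda_2\,u_1^{\top}u_2,$$ so $(\lambda_1-\lambda_2)\,u_1^{\top}u_2=0$, and since $\lambda_1\neq\lambda_2$ this forces $u_1^{\top}u_2=0$.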

Fact 2: If $A$ is a symmetric $n\times n$ matrix with distinct eigenvalues $\lambda_1>\lambda_2>\ldots>\lambda_n$, then there is an orthonormal basis of eigenvectors of $A$.

Proof. For each $i$ let $u_i$ be a unit eigenvector for the eigenvalue $\lambda_i$. By Fact 1 the $u_i$ are pairwise perpendicular, and because we have $n$ distinct eigenvalues, together they form a basis.
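
Numerically this is easy to see (a minimal numpy sketch; the random symmetric matrix $A$ is an arbitrary stand-in, and the last check also previews the diagonal form stated next):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2  # an arbitrary symmetric matrix

# np.linalg.eigh is numpy's eigensolver for symmetric matrices;
# the columns of U are unit eigenvectors of A (ascending eigenvalues).
eigvals, U = np.linalg.eigh(A)

print(np.allclose(A @ U, U * eigvals))             # columns are eigenvectors
print(np.allclose(U.T @ U, np.eye(4)))             # and they are orthonormal
print(np.allclose(U.T @ A @ U, np.diag(eigvals)))  # A is diagonal in this basis
```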

With respect to this basis the matrix $A$ is diagonal, with the eigenvalues on the diagonal. Explicitly: $$A=\begin{bmatrix} \lambda_1&0&\cdots&0\\ 0&\lambda_2&\cdots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&\cdots&\lambda_n \end{bmatrix}.$$

Fact 3: For a unit vector $x$, the maximum of $x^{\top}Ax$ is $\lambda_1$, and it is attained when $x$ is a unit eigenvector of $\lambda_1$.

As you say, this was already shown in your book and you understand it, so I won't repeat it here.

The constraint $x^{\top}u_1=0$ restricts $x$ to the subspace perpendicular to $u_1$, i.e. to the subspace spanned by $\{u_2,\ldots,u_n\}$. On this subspace, with respect to the basis $\{u_2,\ldots,u_n\}$, the restriction of $A$ is $$A'=\begin{bmatrix} \lambda_2&0&\cdots&0\\ 0&\lambda_3&\cdots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&\cdots&\lambda_n \end{bmatrix}.$$ Now we can apply Fact 3 again: the maximum of $x^{\top}Ax$ is $\lambda_2$, and it is attained when $x$ is a unit eigenvector of $\lambda_2$.
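
You can watch this restriction happen numerically (a sketch in the same spirit as above, with an arbitrary random symmetric $A$): project sample vectors onto the subspace orthogonal to $u_1$, renormalize, and the maximum of the quadratic form drops from $\lambda_1$ to $\lambda_2$.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2  # an arbitrary symmetric matrix

eigvals, U = np.linalg.eigh(A)  # eigenvalues in ascending order
u1 = U[:, -1]                   # unit eigenvector of the largest eigenvalue

# Sample vectors, project out the u1 component so that x^T u1 = 0 ...
X = rng.standard_normal((100_000, 4))
X -= np.outer(X @ u1, u1)
# ... then renormalize so that x^T x = 1.
X /= np.linalg.norm(X, axis=1, keepdims=True)

q = np.einsum("ij,jk,ik->i", X, A, X)  # row-wise x^T A x
print(q.max(), eigvals[-2])  # sampled max now approaches lambda_2
```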

And indeed, as you conjecture, repeating this process yields the third largest, fourth largest, fifth largest, etc. eigenvalues: simply add the constraints $x^{\top}u_2=0$, then $x^{\top}u_3=0$, then $x^{\top}u_4=0$, and so on.
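
Putting the whole picture together in code (again a sketch, not anything from your book; the complement basis is built with a QR factorization, and the constrained maximum over each subspace is computed exactly as the top eigenvalue of the restricted matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2  # an arbitrary symmetric matrix
n = A.shape[0]

# Reference: all eigenvalues, descending, straight from the eigensolver.
reference = np.linalg.eigh(A)[0][::-1]

found_vals, found_vecs = [], []
for k in range(n):
    if k == 0:
        Qc = np.eye(n)  # no constraints yet: maximize over all of R^n
    else:
        # Orthonormal basis Qc of {x : x^T u_i = 0 for the u_i found so far}:
        # the trailing columns of a complete QR of the found eigenvectors.
        V = np.column_stack(found_vecs)
        Q, _ = np.linalg.qr(V, mode="complete")
        Qc = Q[:, k:]
    # For x = Qc @ y with ||y|| = 1, x^T A x = y^T (Qc^T A Qc) y, so the
    # constrained maximum is the top eigenvalue of the restricted matrix.
    vals, vecs = np.linalg.eigh(Qc.T @ A @ Qc)
    found_vals.append(vals[-1])
    found_vecs.append(Qc @ vecs[:, -1])  # the maximizer, back in R^n

print(np.allclose(found_vals, reference))  # True: lambda_1, lambda_2, ... in order
```

Each pass peels off one eigenvector and hands you the next eigenvalue down, which is exactly the interpretation you proposed.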
