How should I interpret the constraint $x^Tu_1=0$ in a quadratic function optimization?

Tags: linear-algebra, linear-programming, machine-learning, optimization, quadratic-forms

In a linear algebra book, the following is stated with proof shown, so I get it.

Let $A$ be a symmetric matrix, so that $x^TAx$ is the quadratic form of a quadratic function, and let $x$ range over unit vectors, i.e. $x^Tx=1$. Arrange the eigenvalues so that $\lambda_1 > \lambda_2 > \cdots > \lambda_n$.
Then the maximum of $x^TAx$ is the largest eigenvalue, $\lambda_1$, of $A$, and the minimum is the smallest eigenvalue, $\lambda_n$, of $A$.
The maximum is attained when $x$ is a unit eigenvector $u_1$ corresponding to $\lambda_1$.
The minimum is attained when $x$ is a unit eigenvector $u_n$ corresponding to $\lambda_n$.
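
(To convince myself I'm reading this right, I ran a quick numerical check — a minimal numpy sketch, where the random symmetric matrix $A$ below is just an arbitrary example, not one from the book:)

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary symmetric matrix: (B + B^T)/2 is symmetric for any B.
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2

# np.linalg.eigh handles symmetric matrices; eigenvalues come back ascending.
eigvals, _ = np.linalg.eigh(A)

# Sample many unit vectors x and evaluate the quadratic form x^T A x.
X = rng.standard_normal((100_000, 4))
X /= np.linalg.norm(X, axis=1, keepdims=True)
q = np.einsum("ij,jk,ik->i", X, A, X)  # row-wise x^T A x

print(q.max(), eigvals[-1])  # sampled max approaches lambda_1 (largest)
print(q.min(), eigvals[0])   # sampled min approaches lambda_n (smallest)
```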

Then it states the following without proof or any discussion of interpretation:

The maximum of $x^TAx$ subject to the constraints
$$x^Tx=1, \qquad x^Tu_1=0$$
is the second largest eigenvalue, $\lambda_2$, and this maximum is attained when $x$ is a unit eigenvector, $u_2$, corresponding to $\lambda_2$.

Question: How should I interpret the constraint $x^Tu_i=0$?
I see that when the $A$ in $x^TAx$ is diagonal to begin with, $x^Tu_i=0$ says $x_i=0$; so $x^Tu_1=0$ forces $x_1=0$, which makes $\lambda_1$ unattainable, and we go hunting for the next largest value, $\lambda_2$.

How about when $A$ is a general symmetric matrix, not necessarily diagonal?
The constraint $x^Tu_1=0$ says that $x$ must be orthogonal to $u_1$, so $x\neq u_1$ and hence $x^TAx \neq\lambda_1$. When the eigenvalues are ranked $\lambda_1 > \lambda_2 > \cdots$, the constraint $x^Tu_1=0$ therefore seems to single out the second largest value. So if we want the fourth largest, do we add $x^Tu_i=0$ for $i=1,2,3$, which restricts $x$ to the subspace orthogonal to those $u_i$? Is that how the constraint should be interpreted?

Best Answer

The essential fact is that the matrix $A$ is diagonal w.r.t. the basis $\{u_1,\ldots,u_n\}$. I'll give a sketch of how to see this, and how to use this to prove the claims in your text.

Fact 1: If $A$ is a symmetric matrix with two distinct eigenvalues $\lambda_1$ and $\lambda_2$, then any pair of corresponding eigenvectors $u_1$ and $u_2$ is perpendicular.

If you are not familiar with this fact, then it is a nice exercise to prove it.
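
In case a hint helps, here is a sketch of the usual one-line argument: using $A^{\top}=A$ and $Au_i=\lambda_i u_i$, $$\lambda_1\,u_1^{\top}u_2=(Au_1)^{\top}u_2=u_1^{\top}A^{\top}u_2=u_1^{\top}Au_2=\lambda_2\,u_1^{\top}u_2,$$ so $(\lambda_1-\lambda_2)\,u_1^{\top}u_2=0$, and since $\lambda_1\neq\lambda_2$ this forces $u_1^{\top}u_2=0$.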

Fact 2: If $A$ is a symmetric $n\times n$ matrix with distinct eigenvalues $\lambda_1>\lambda_2>\ldots>\lambda_n$, then there is an orthonormal basis of eigenvectors of $A$.

Proof. For each $i$ let $u_i$ be a unit eigenvector for the eigenvalue $\lambda_i$. By Fact 1 the $u_i$ are pairwise perpendicular, and because we have $n$ distinct eigenvalues, together they form a basis.
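
Numerically this is easy to see (a minimal numpy sketch; the random symmetric matrix $A$ is an arbitrary stand-in, and the last check also previews the diagonal form stated next):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2  # an arbitrary symmetric matrix

# np.linalg.eigh is numpy's eigensolver for symmetric matrices;
# the columns of U are unit eigenvectors of A (ascending eigenvalues).
eigvals, U = np.linalg.eigh(A)

print(np.allclose(A @ U, U * eigvals))             # columns are eigenvectors
print(np.allclose(U.T @ U, np.eye(4)))             # and they are orthonormal
print(np.allclose(U.T @ A @ U, np.diag(eigvals)))  # A is diagonal in this basis
```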

With respect to this basis the matrix $A$ is diagonal, with the eigenvalues on the diagonal. Explicitly: $$A=\begin{bmatrix} \lambda_1&0&\cdots&0\\ 0&\lambda_2&\cdots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&\cdots&\lambda_n \end{bmatrix}.$$

Fact 3: For a unit vector $x$, the maximum of $x^{\top}Ax$ is $\lambda_1$, and it is attained when $x$ is a unit eigenvector of $\lambda_1$.

As you say, this was already shown in your book and you understand it, so I won't repeat it here.

The constraint $x^{\top}u_1=0$ restricts $x$ to the subspace perpendicular to $u_1$, i.e. to the subspace spanned by $\{u_2,\ldots,u_n\}$. On this subspace, with respect to the basis $\{u_2,\ldots,u_n\}$, the restriction of $A$ is $$A'=\begin{bmatrix} \lambda_2&0&\cdots&0\\ 0&\lambda_3&\cdots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&\cdots&\lambda_n \end{bmatrix}.$$ Now we can apply Fact 3 again: the maximum of $x^{\top}Ax$ is $\lambda_2$, and it is attained when $x$ is a unit eigenvector of $\lambda_2$.
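
You can watch this restriction happen numerically (a sketch in the same spirit as above, with an arbitrary random symmetric $A$): project sample vectors onto the subspace orthogonal to $u_1$, renormalize, and the maximum of the quadratic form drops from $\lambda_1$ to $\lambda_2$.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2  # an arbitrary symmetric matrix

eigvals, U = np.linalg.eigh(A)  # eigenvalues in ascending order
u1 = U[:, -1]                   # unit eigenvector of the largest eigenvalue

# Sample vectors, project out the u1 component so that x^T u1 = 0 ...
X = rng.standard_normal((100_000, 4))
X -= np.outer(X @ u1, u1)
# ... then renormalize so that x^T x = 1.
X /= np.linalg.norm(X, axis=1, keepdims=True)

q = np.einsum("ij,jk,ik->i", X, A, X)  # row-wise x^T A x
print(q.max(), eigvals[-2])  # sampled max now approaches lambda_2
```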

And indeed, as you conjecture, repeating this process yields the third largest, fourth largest, fifth largest, etc. eigenvalues: simply add the constraints $x^{\top}u_2=0$, then $x^{\top}u_3=0$, then $x^{\top}u_4=0$, and so on.
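
Putting the whole picture together in code (again a sketch, not anything from your book; the complement basis is built with a QR factorization, and the constrained maximum over each subspace is computed exactly as the top eigenvalue of the restricted matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2  # an arbitrary symmetric matrix
n = A.shape[0]

# Reference: all eigenvalues, descending, straight from the eigensolver.
reference = np.linalg.eigh(A)[0][::-1]

found_vals, found_vecs = [], []
for k in range(n):
    if k == 0:
        Qc = np.eye(n)  # no constraints yet: maximize over all of R^n
    else:
        # Orthonormal basis Qc of {x : x^T u_i = 0 for the u_i found so far}:
        # the trailing columns of a complete QR of the found eigenvectors.
        V = np.column_stack(found_vecs)
        Q, _ = np.linalg.qr(V, mode="complete")
        Qc = Q[:, k:]
    # For x = Qc @ y with ||y|| = 1, x^T A x = y^T (Qc^T A Qc) y, so the
    # constrained maximum is the top eigenvalue of the restricted matrix.
    vals, vecs = np.linalg.eigh(Qc.T @ A @ Qc)
    found_vals.append(vals[-1])
    found_vecs.append(Qc @ vecs[:, -1])  # the maximizer, back in R^n

print(np.allclose(found_vals, reference))  # True: lambda_1, lambda_2, ... in order
```

Each pass peels off one eigenvector and hands you the next eigenvalue down, which is exactly the interpretation you proposed.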
