Proof Verification: Convergence of Eigenvalues/Eigenvectors

convergence-divergence, eigenvalues-eigenvectors, linear-algebra, real-analysis, solution-verification

I have to prove the following proposition and am unsure of my argument about the convergence of eigenvalues. Any feedback is greatly appreciated!

Proposition. Let $\Sigma$ be a $k\times k$ real symmetric matrix with eigendecomposition $Q\Lambda Q'$, where the diagonal elements of $\Lambda$ are ranked in decreasing order. Let $Q_n$ be a sequence of $k\times k$ matrices and $\Lambda_n$ a sequence of $k\times k$ diagonal matrices whose diagonal elements are ranked in decreasing order. Suppose that

$(i)$ $\Sigma Q_n-Q_n\Lambda_n \to 0$ as $n \to \infty$

$(ii)$ $Q_n^{'}Q_n \to I_k$ as $n \to \infty$

where convergence is with respect to the Frobenius norm. Then $\Lambda_n \to \Lambda$ as $n\to \infty$. Moreover, if the eigenvalues of $\Sigma$ are distinct, then given a choice of sign for the columns of $Q$, there exists a sequence of diagonal matrices $D_n$ with diagonal elements $\pm 1$ such that $Q_n D_n \to Q$ as $n\to \infty$.
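(Before the proof, here is a small numerical sanity check of what the proposition asserts, assuming numpy; the $O(1/n)$ perturbation scheme and all variable names below are purely illustrative and not part of the argument. The sign matrix $D_n$ is computed here by aligning columns via inner products with $Q$, which is just a convenient stand-in for the construction used in the proof.)

```python
import numpy as np

rng = np.random.default_rng(0)
k = 4

# A symmetric Sigma with known eigendecomposition Q Lambda Q' (decreasing eigenvalues).
A = rng.standard_normal((k, k))
Sigma = (A + A.T) / 2
lam, Q = np.linalg.eigh(Sigma)          # eigh returns eigenvalues in increasing order
lam, Q = lam[::-1], Q[:, ::-1]          # re-rank in decreasing order
Lambda = np.diag(lam)

for n in [10, 100, 1000, 10000]:
    # O(1/n) perturbations, so (i) and (ii) hold as n -> infinity.
    Qn = Q + rng.standard_normal((k, k)) / n
    Lambdan = np.diag(np.sort(lam + rng.standard_normal(k) / n)[::-1])
    Dn = np.diag(np.sign(np.diag(Q.T @ Qn)))          # sign alignment of the columns
    print(n,
          np.linalg.norm(Sigma @ Qn - Qn @ Lambdan),  # condition (i)
          np.linalg.norm(Qn.T @ Qn - np.eye(k)),      # condition (ii)
          np.linalg.norm(Lambdan - Lambda),           # should -> 0
          np.linalg.norm(Qn @ Dn - Q))                # should -> 0 (distinct eigenvalues)
```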

Proof. I will use the following facts from real analysis.

Fact 1. Let $(x_n)$ be a bounded sequence in $R^k$. Then $(x_n)$ has at least one cluster point $x\in R^k$. By cluster point I mean that each neighborhood of $x$ contains $x_n$ for infinitely many $n$. Equivalently, there exists a subsequence $(x_m)$ of $(x_n)$ that converges to $x$.

Fact 2. Let $(x_n)$ be a bounded sequence in $R^k$ and suppose that $(x_n)$ has exactly one cluster point $x$. Then $(x_n)$ converges to $x$.

Fact 3. Let $(x_n)$ be a bounded sequence in $R^k$ and suppose that $(x_n)$ has exactly two cluster points $x_1$ and $x_2$. Then we can partition $(x_n)$ into two disjoint subsequences $(x_m)$ and $(x_p)$ such that $(x_m)$ converges to $x_1$ and $(x_p)$ converges to $x_2$.

Fact 4. Let $(x_n^1,\dots,x_n^l)$ be a bounded sequence in $R^k$, where $(x_n^j)$ is a sequence in $R^{n_j}$ for each $j=1,\dots,l$ and $n_1+\dots +n_l=k$. Fix some $i$ and let $x^i$ be a cluster point of $(x_n^i)$. Then for each $j\neq i$ there exists a cluster point $x^j$ of $(x_n^j)$ such that $(x^1,\dots,x^l)$ is a cluster point of $(x_n^1,\dots,x_n^l)$.

Let $Q_n^j$ denote the $j$th column of $Q_n$ and $\lambda_n^j$ denote the $j$th diagonal element of $\Lambda_n$. From $(ii)$ we have $\left\lVert Q_n^j \right \rVert\to 1$, so the sequence $Q_n^j$ is bounded for each $j$. From $(i)$ we have $\left\lVert \Sigma Q_n^j-\lambda_n^jQ_n^j \right\rVert \to 0$, so the sequence $\left\lVert\Sigma Q_n^j-\lambda_n^jQ_n^j \right\rVert$ is bounded for each $j$. Also, since $\left\lVert\Sigma Q_n^j \right\rVert \leq \left\lVert \Sigma \right\rVert \left\lVert Q_n^j \right\rVert$ and $\left\lVert Q_n^j \right\rVert$ is bounded, the sequence $\left\lVert\Sigma Q_n^j \right\rVert$ is bounded for each $j$. Next, using the triangle and reverse triangle inequalities we have

\begin{equation}
\left\lVert \lambda_n^j Q_n^j \right \rVert\leq \biggl \lvert \left\lVert \Sigma Q_n^j \right\rVert - \left\lVert \lambda_n^j Q_n^j \right\rVert \biggr \rvert + \left\lVert \Sigma Q_n^j \right\rVert \leq \left\lVert \Sigma Q_n^j - \lambda_n^j Q_n^j \right\rVert +\left\lVert\Sigma Q_n^j \right\rVert
\end{equation}
and so we get that the sequence $\left\lVert \lambda_n^j Q_n^j \right\rVert$ is bounded for each $j$. Since $\left\lVert Q_n^j \right\rVert \to 1$, for all sufficiently large $n$ we have $\left\lVert Q_n^j \right\rVert \geq 1/2$ and hence $\lvert \lambda_n^j\rvert=\left\lVert \lambda_n^j Q_n^j \right\rVert/ \left\lVert Q_n^j \right\rVert \leq 2\left\lVert \lambda_n^j Q_n^j \right\rVert$, so we conclude that the sequence $\lambda_n^j$ is bounded for each $j$.

Consider now the joint sequence $(\lambda_n^j,Q_n^j)$ in $R^{k+1}$. Our previous argument shows that this sequence is bounded for each $j$, and so it has at least one cluster point $(\lambda^j,Q^j)\in R^{k+1}$ (fact 1). We will show that such a cluster point must be an eigenvalue-eigenvector pair with $\left\lVert Q^j \right\rVert=1$. To this end, note that the functions

\begin{align*}
R\times R^{k} &\rightarrow R^{k} \\
(c,x) &\mapsto \Sigma x-cx
\end{align*}

\begin{align*}
R^{k} &\rightarrow R \\
x &\mapsto \left\lVert x \right\rVert
\end{align*}
are both continuous. Therefore, let $(\lambda_m^j,Q_m^j)$ be any subsequence of $(\lambda_n^j,Q_n^j)$ converging to $(\lambda^j,Q^j)$ to get
\begin{equation}
\left\lVert Q^j \right\rVert=\lim \left\lVert Q_{m}^j \right\rVert = \lim \left\lVert Q_n^j \right\rVert =1
\end{equation}

\begin{equation}
\Sigma Q^j-\lambda^j Q^j=\lim \left(\Sigma Q_m^j-\lambda_m^j Q_m^j\right) = \lim \left(\Sigma Q_n^j-\lambda_n^j Q_n^j\right) = 0
\end{equation}
and so $\lambda^j$ is an eigenvalue of $\Sigma$ with corresponding unit length eigenvector $Q^j$.

Let's now show convergence of the elements of $\Lambda_n$, starting with $\lambda_n^1$. Let $\lambda^1$ be any cluster point of the sequence $\lambda_n^1$. Pick a cluster point $Q^1$ of the sequence $Q_n^1$ such that $(\lambda^1,Q^1)$ is a cluster point of the joint sequence $(\lambda_n^1,Q_n^1)$ (fact 4). Next, for each $j>1$, pick a cluster point $(\lambda^j,Q^j)$ of the sequence $(\lambda_n^j,Q_n^j)$ such that $(\lambda^1,Q^1,\lambda^2,Q^2,\dots,\lambda^k,Q^k)$ is a cluster point of the joint sequence $(\lambda_n^1,Q_n^1,\lambda_n^2,Q_n^2,\dots,\lambda_n^k,Q_n^k)$ (fact 4). Let $(\lambda_m^1,Q_m^1,\dots,\lambda_m^k,Q_m^k)$ be a subsequence of $(\lambda_n^1,Q_n^1,\dots,\lambda_n^k,Q_n^k)$ converging to $(\lambda^1,Q^1,\lambda^2,Q^2,\dots,\lambda^k,Q^k)$.

From our previous argument we know that $\lambda^1,\dots,\lambda^k$ are eigenvalues of $\Sigma$ with corresponding unit length eigenvectors $Q^{1},\dots,Q^{k}$. We now show that $\lambda^1,\dots,\lambda^k$ are exactly the $k$ eigenvalues of $\Sigma$ (counted with multiplicity). Suppose, to the contrary, that some eigenvalue $\lambda$ of $\Sigma$ is repeated more times than its multiplicity among $\lambda^1,\dots,\lambda^k$, say $l$ times. Then there exist indices $j_1,\dots,j_l$ such that $\lambda=\lambda^{j_1}=\dots=\lambda^{j_l}$ and $Q^{j_1},\dots,Q^{j_l}$ are $l$ unit length eigenvectors associated to $\lambda$. But, by $(ii)$ and the continuity of the inner product, for any two indices $j_s\neq j_t$ we have

\begin{equation}
\langle Q^{j_s},Q^{j_t}\rangle = \lim \langle Q_m^{j_s},Q_m^{j_t}\rangle = \lim \langle Q_n^{j_s},Q_n^{j_t}\rangle = 0
\end{equation}
and so the eigenvectors $Q^{j_1},\dots,Q^{j_l}$ are orthonormal, hence linearly independent. This implies that the dimension of the eigenspace of $\lambda$ is at least $l$, a contradiction. We conclude that $\lambda^1,\dots,\lambda^k$ are exactly the $k$ eigenvalues of $\Sigma$.

We can now show that $\lambda^1$ must equal the largest eigenvalue of $\Sigma$. Otherwise there would exist an eigenvalue $\lambda$ of $\Sigma$ such that $\lambda^1<\lambda$. But since the diagonal elements of $\Lambda_n$ are ranked in decreasing order we get, for any $j$,

\begin{equation}
\lambda^j=\lim \lambda_m^j \leq \lim \lambda_m^1 = \lambda^1 < \lambda
\end{equation}
implying that all eigenvalues of $\Sigma$ are strictly less than $\lambda$, a contradiction.

We have shown that the only cluster point of $\lambda_n^1$ is the largest eigenvalue of $\Sigma$. Thus $\lambda_n^1$ converges to the largest eigenvalue of $\Sigma$ (fact 2).

The same argument can be repeated successively for $j=2,\dots, k$ to show that the only cluster point of $\lambda_n^j$ is the $j$th largest eigenvalue of $\Sigma$, implying that $\lambda_n^j$ converges to the $j$th largest eigenvalue of $\Sigma$. We have proven that $\Lambda_n \to \Lambda$.

Now suppose the eigenvalues of $\Sigma$ are distinct. Then the columns of $Q$ are unique up to a sign. Choose a sign for each column and denote the resulting $j$th column by $Q_j^{*}$. Let $Q^j$ be a cluster point of the sequence $Q_n^j$. Pick a cluster point $\lambda^j$ of the sequence $\lambda_n^j$ such that $(Q^j,\lambda^j)$ is a cluster point of the joint sequence $(Q_n^j,\lambda_n^j)$ (fact 4). Then $\lambda^j$ is the $j$th largest eigenvalue of $\Sigma$ with associated unit length eigenvector $Q^j$. Because unit eigenvectors of a simple eigenvalue are unique up to a sign, we have $Q^j=Q_j^{*}$ or $Q^j=-Q_j^{*}$. Thus the sequence $Q_n^j$ has at most two cluster points $\pm Q_j^{*}$. We now define the $j$th diagonal element of $D_n$, denoted $d_n^j$, as follows. If $Q_n^j$ has only one cluster point, let $d_n^j=1$ for all $n$ or $d_n^j=-1$ for all $n$ depending on whether this cluster point is $Q_j^{*}$ or $-Q_j^{*}$ respectively. If $Q_n^j$ has two cluster points, then we can partition $Q_n^j$ into two disjoint subsequences $Q_m^j$ and $Q_p^j$ such that $Q_m^j$ converges to $Q_j^{*}$ and $Q_p^j$ converges to $-Q_j^{*}$ (fact 3). We then define $d_n^j=1$ if $n$ is in the index set of the subsequence $Q_m^j$ and $d_n^j=-1$ if $n$ is in the index set of the subsequence $Q_p^j$. With this choice the sequence $Q_n^j d_n^j$ converges to $Q_j^{*}$ for each $j$, and so $Q_n D_n \to Q$.
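(As an aside, not needed for the proof: in a computation one does not have access to cluster points, but the same sign choice can be obtained directly from the inner product of each column of $Q_n$ with the chosen column of $Q$; whenever $Q_n^j$ has cluster points only at $\pm Q_j^{*}$, this rule agrees with the construction above for all sufficiently large $n$. A minimal sketch assuming numpy; the helper name `align_signs` is made up.)

```python
import numpy as np

def align_signs(Qn, Q_star):
    """Return D_n = diag(d_n^1, ..., d_n^k) with d_n^j = sign of <Q_star column j, Q_n column j>."""
    d = np.sign(np.sum(Q_star * Qn, axis=0))   # column-wise inner products
    d[d == 0] = 1.0                            # arbitrary choice in the (degenerate) tie case
    return np.diag(d)
```

Then `Qn @ align_signs(Qn, Q_star)` converges to `Q_star` under the hypotheses of the proposition with distinct eigenvalues.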

Edit. There is an alternative proof for the convergence of $Q_n$ (assuming convergence of $\Lambda_n$ and distinct eigenvalues), which is perhaps more elegant. It goes as follows.

Since $\Lambda_n-\Lambda\to 0$ and $Q_n$ is bounded, we have $Q_n (\Lambda_n-\Lambda)\to 0$. Therefore $(i)$ implies

$$\Sigma Q_n-Q_n\Lambda = \Sigma Q_n-Q_n\Lambda_n + Q_n (\Lambda_n-\Lambda) \to 0$$

Now, for each $n$, let $R_n=Q^{-1}Q_n$. Then $R_n$ is a sequence of $k\times k$ matrices. Using the eigendecomposition of $\Sigma$, continuity of matrix multiplication and the previous result we get

$$ \Lambda R_n -R_n \Lambda = \Lambda Q^{-1}Q_n - Q^{-1}Q_n\Lambda = Q^{-1} \Sigma Q_n - Q^{-1}Q_n\Lambda = Q^{-1} ( \Sigma Q_n - Q_n\Lambda ) \to 0 $$

Let $R_n^{ij}$ with $i\neq j$ be any off-diagonal element of the matrix $R_n$ in position $(i,j)$. Then the previous calculation shows that $$(\lambda_i-\lambda_j)R_n^{ij} \to 0$$ where $\lambda_i$ denotes the $i$th diagonal element of $\Lambda$, and, because the eigenvalues are assumed to be distinct, we conclude $R_n^{ij} \to 0$. Also, by $(ii)$ we have that

$$R_n^{'}R_n=Q_n^{'}(Q^{-1})^{'}Q^{-1}Q_n=Q_n^{'}Q_n \to I_k$$ where we used $Q^{-1}=Q'$. In particular, for each $i$ the $i$th diagonal element satisfies $\sum_{l}(R_n^{li})^2 \to 1$; combined with $R_n^{li}\to 0$ for $l\neq i$, this gives

$$(R^{ii}_n)^2 \to 1$$ Now define the $i$th diagonal element of $D_n$, denoted $d_n^{i}$, as follows

$$d_n^{i}=1_{\{R^{ii}_n \geq 0\}}-1_{\{R^{ii}_n < 0\}}$$ Then $d_n^{i}=\pm 1$ for each $i$ and continuity of the square root function implies $$d_n^{i}R^{ii}_n=\lvert R^{ii}_n \rvert = \sqrt{(R^{ii}_n)^2} \to 1$$ for each $i$. Since the off-diagonal elements of $R_nD_n$ are $\pm R_n^{ij} \to 0$, we get $R_n D_n \to I_k$. Finally, continuity of matrix multiplication allows us to conclude

$$Q_n D_n = Q R_n D_n \to Q$$
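(Again only as a sanity check, not part of the argument: a small numerical sketch of this construction, assuming numpy; the sequence $Q_n$ below, built from random sign flips plus an $O(1/n)$ perturbation, is purely illustrative.)

```python
import numpy as np

rng = np.random.default_rng(1)
k = 4
A = rng.standard_normal((k, k))
Sigma = (A + A.T) / 2                    # symmetric; eigenvalues are generically distinct
lam, Q = np.linalg.eigh(Sigma)
Q = Q[:, ::-1]                           # columns ordered by decreasing eigenvalue

for n in [10, 100, 1000, 10000]:
    S = np.diag(rng.choice([-1.0, 1.0], size=k))          # arbitrary column sign flips
    Qn = Q @ S + rng.standard_normal((k, k)) / n          # illustrative sequence satisfying (i), (ii)
    Rn = Q.T @ Qn                                         # R_n = Q^{-1} Q_n, since Q is orthogonal
    Dn = np.diag(np.where(np.diag(Rn) >= 0, 1.0, -1.0))   # d_n^i = 1_{R_n^ii >= 0} - 1_{R_n^ii < 0}
    print(n,
          np.linalg.norm(Rn @ Dn - np.eye(k)),            # R_n D_n -> I_k
          np.linalg.norm(Qn @ Dn - Q))                    # Q_n D_n = Q R_n D_n -> Q
```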

Best Answer

Your proof that $\Lambda_n \to \Lambda$ (as far as I can tell) is correct, and I can't think of a way to make it more efficient.

Once you have gotten to that point, however, we can make the following argument. Note that $$ \lim_{n \to \infty}(\Sigma Q_n - Q_n \Lambda) = \\ \lim_{n \to \infty} \left[(\Sigma Q_n - Q_n \Lambda_n) - Q_n(\Lambda - \Lambda_n)\right] = \\ \lim_{n \to \infty} (\Sigma Q_n - Q_n \Lambda_n) - \lim_{n \to \infty} Q_n(\Lambda - \Lambda_n) =\\ 0-0 = 0. $$ With that, it now suffices to apply the argument from your previous post (which applies to general symmetric matrices).


An observation that might lead to another approach: multiplying (i) on the left by $Q_n'$ (which is bounded, by (ii)) yields $$ Q_n'\Sigma Q_n - Q_n'Q_n \Lambda_n \to 0. $$ With (ii) and the boundedness of $\Lambda_n$, we have $\lim_{n \to \infty}(Q_n'Q_n - I_k) \Lambda_n = 0$, so that $$ Q_n' \Sigma Q_n - \Lambda_n \to 0. $$
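For what it's worth, this observation is easy to check numerically; here is a minimal sketch assuming numpy, with an illustrative $O(1/n)$ perturbation scheme standing in for a sequence satisfying (i) and (ii):

```python
import numpy as np

rng = np.random.default_rng(2)
k = 4
A = rng.standard_normal((k, k))
Sigma = (A + A.T) / 2
lam, Q = np.linalg.eigh(Sigma)
lam, Q = lam[::-1], Q[:, ::-1]           # decreasing order

for n in [10, 100, 1000, 10000]:
    Qn = Q + rng.standard_normal((k, k)) / n
    Lambdan = np.diag(np.sort(lam + rng.standard_normal(k) / n)[::-1])
    print(n, np.linalg.norm(Qn.T @ Sigma @ Qn - Lambdan))   # Q_n' Sigma Q_n - Lambda_n -> 0
```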


Another observation: if we define $R_n = Q'Q_n$ from the beginning, then we have $$ \Sigma Q_n - Q_n \Lambda_n \to 0 \iff\\ \Sigma QR_n - QR_n \Lambda_n \to 0 \iff \\ (Q'\Sigma Q) R_n - Q'QR_n \Lambda_n \to 0 \iff\\ \Lambda R_n - R_n \Lambda_n \to 0. $$ In other words, it suffices to consider the case in which $\Sigma$ is diagonal. A calculation similar to yours shows that this is equivalent to saying that $$ \lim_{n \to \infty} (\lambda^i - \lambda_n^j)R_n^{ij} = 0 \quad \text{for each } i,j, $$ where $\lambda^i$ denotes the $i$th diagonal element of $\Lambda$.