Get accurate eigenvectors when eigenvalues are minuscule

Tags: algorithms, eigenvalues, numerics, python, r

I have a symmetric matrix A. I'm not able to compute all of its eigenvectors accurately, and I believe this is because the last few eigenvalues of A are extremely small.

Is there a method or library (in R or Python) that will give me really accurate eigenvalues and eigenvectors, even if it's extremely slow?

(One suggestion was to iteratively eliminate the biggest eigenvalue/vector using "projectors", but I'm not sure how to go about this / if it would work)

EDIT:

Some context: The matrices I'm considering are basically Fisher Information matrices produced by taking the Hessian of the outputs of many-parameter models, each with significant internal parameter "redundancies", which you can uncover from an eigendecomposition. (It's not unusual to get really high condition numbers in these situations, as described here: https://www.lassp.cornell.edu/sethna/Sloppy/WhatAreSloppyModels.html.)

Here's some R code with an example of a matrix I'm interested in.

#A matrix we're interested in:

m <- matrix(c(
0.002307414, -1.318435973, 8.63e-11, -3.33e-11, 0.486238053, 0.41182779, -0.169525454, 0.624275558,
-1.318435973, 753.3425186, -4.93e-08, 1.90e-08, -277.8320732, -235.3147146, 96.86532753, -356.7054684,
8.63e-11, -4.93e-08, 3.23e-18, -1.24e-18, 1.82e-08, 1.54e-08, -6.34e-09, 2.34e-08,
-3.33e-11, 1.90e-08, -1.24e-18, 4.79e-19, -7.01e-09, -5.94e-09, 2.44e-09, -9.00e-09,
0.486238053, -277.8320732, 1.82e-08, -7.01e-09, 102.4642297, 86.78386443, -35.72384951, 131.5526701,
0.41182779, -235.3147146, 1.54e-08, -5.94e-09, 86.78386443, 73.50310589, -30.25693671, 111.4208258,
-0.169525454, 96.86532753, -6.34e-09, 2.44e-09, -35.72384951, -30.25693671, 12.45501408, -45.8654479,
0.624275558, -356.7054684, 2.34e-08, -9.00e-09, 131.5526701, 111.4208258, -45.8654479, 168.8989909
), nrow=8)

print('matrix m:')
print(m)

#Get eigenvalues/vectors
eig <- eigen(m)
print('Eigenvalues/vectors')
print(eig)


#Check - does matrix %*% eigenvector == eigenvalue * eigenvector? (expected result is c(1,1,1,1,1,1,1,1))
#
print('Eigenvector 1')
print((m %*% eig$vectors[,1]) / (eig$values[1] * eig$vectors[,1]))   #Fine - yields c(1,1,1,1,1,1,1,1)
print('Eigenvector 2')
print((m %*% eig$vectors[,2]) / (eig$values[2] * eig$vectors[,2]))   #Mostly ok - all approximately 1
print('Eigenvector 3 (bad!)')
print((m %*% eig$vectors[,3]) / (eig$values[3] * eig$vectors[,3]))   #---Bad (elements vary a lot)!
print('Eigenvector 4 (really bad!)')
print((m %*% eig$vectors[,4]) / (eig$values[4] * eig$vectors[,4]))   #---Really bad!
print('Eigenvector 5')
print((m %*% eig$vectors[,5]) / (eig$values[5] * eig$vectors[,5]))   #Mostly ok
print('Eigenvector 6')
print((m %*% eig$vectors[,6]) / (eig$values[6] * eig$vectors[,6]))   #Mostly ok
print('Eigenvector 7')
print((m %*% eig$vectors[,7]) / (eig$values[7] * eig$vectors[,7]))   #Mostly ok
print('Eigenvector 8')
print((m %*% eig$vectors[,8]) / (eig$values[8] * eig$vectors[,8]))   #Mostly ok

Best Answer

The problem is due to "leakage" from the large eigenvectors. I will present a brief analysis and then offer a solution, with code, followed by some remarks about the nature and limitations of the solution.


Analysis

Let the eigenvalues of a real symmetric matrix $\mathbb M$ be $\lambda_1,\lambda_2,\ldots,\lambda_d$ ordered by decreasing absolute value so that $|\lambda_1|\ge |\lambda_2|\ge \cdots \ge |\lambda_d|.$ The objective, as stated in a comment to the question, is to find vectors $x$ so that $\mathbb{M}x$ is relatively small. Because the mapping $x\to\mathbb{M}x$ is linear, we may focus our study of such $x$ on those of unit length.

The Spectral Theorem guarantees the existence of an orthonormal basis $\{e_1,e_2,\ldots, e_d\}$ of corresponding eigenvectors in which $x$ may be expressed as the linear combination

$$x = \xi_1 e_1 + \xi_2 e_2 + \cdots + \xi_d e_d.$$

The orthonormality assures us that

$$1 = |x|^2 = \xi_1^2 + \xi_2^2 + \cdots + \xi_d^2.$$

Applying $\mathbb M$ yields

$$\mathbb{M}x = \xi_1 \mathbb{M} e_1 + \xi_2 \mathbb{M}e_2 + \cdots + \xi_d \mathbb{M}e_d = \xi_1\lambda_1 e_1 + \xi_2\lambda_2 e_2 + \cdots + \xi_d\lambda_d e_d.$$

Suppose there is some slight error in the computation of the eigenvectors $e_i,$ so that where we think we are working with the eigenvector with eigenvalue $\lambda_i,$ we really are working with a perturbation $e_i^\prime$ where

$$e_i^\prime - e_i = \alpha_{i1}e_1 + \alpha_{i2}e_2 + \cdots + \alpha_{id}e_d.$$

These aren't quite eigenvectors anymore, as we may check by applying $\mathbb M$ to them:

$$\mathbb{M}e_i^\prime = \mathbb{M}e_i + \alpha_{i1}\mathbb{M}e_1 + \cdots + \alpha_{id}\mathbb{M}e_d = \lambda_i e_i + \alpha_{i1}\lambda_1 e_1 + \cdots + \alpha_{id}\lambda_d e_d.$$

Notice, in particular, that when $\lambda_i$ is tiny compared to $\lambda_1,$ the appearance of the multiple $\alpha_{i1}\lambda_1$ of $e_1$ can hugely alter the result, even when $\alpha_{i1}$ is tiny, due to that multiplication by $\lambda_1.$ This is the crux of the matter.

The problem propagates when applying $\mathbb{M}$ to the linear combination

$$x^\prime = \xi_1 e_1^\prime + \xi_2 e_2^\prime + \cdots + \xi_d e_d^\prime,$$

which we think is $x$ (because we think the $e_i^\prime$ are the $e_i$). Applying $\mathbb{M}$ separately to each of the $e_i^\prime$ on the right "leaks" potentially large multiples of $e_1$ into the result.
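
To see this leakage mechanism concretely, here is a small illustrative sketch in R: take the eigendecomposition of the matrix m from the question, contaminate one of the small-eigenvalue eigenvectors with a tiny multiple of the dominant eigenvector, and watch the product with m grow by orders of magnitude. (The coefficient alpha below is an arbitrary value chosen only for illustration.)

# Illustration of leakage: contaminate a "small" eigenvector with a whiff of the dominant one.
eig <- eigen(m)                      # as in the question's code
e1 <- eig$vectors[, 1]               # eigenvector of the dominant eigenvalue (lambda_1 ~ 1111)
ed <- eig$vectors[, 8]               # eigenvector of one of the tiny eigenvalues
alpha <- 1e-6                        # a very small admixture of e1

ed_contaminated <- ed + alpha * e1
ed_contaminated <- ed_contaminated / sqrt(sum(ed_contaminated^2))  # renormalize to unit length

sqrt(sum((m %*% ed)^2))              # roughly |lambda_8|: very small
sqrt(sum((m %*% ed_contaminated)^2)) # roughly alpha * |lambda_1|: orders of magnitude larger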


Solution

Fortunately there's a simple solution: remove the unexpected eigenvectors from the result. When (say) the first $k$ coefficients of $x$ are zero, $\xi_1=\xi_2=\cdots=\xi_k=0,$ then there should not be any multiples of $e_1$ through $e_k$ in $\mathbb{M} x.$ We can remove them by projecting the result $\mathbb{M}x$ onto the space generated by $e_{k+1},\ldots, e_d.$

This doesn't quite get us the correct value of $\mathbb{M}x,$ because of the "leakage" among the smaller eigenvectors. For instance, applying $\mathbb{M}$ to $x=e_d$ yields a linear combination of all the $e_i.$ Projecting out the first $k$ eigenvectors leaves us still with a linear combination

$$\lambda_{k+1}\alpha_{d,k+1} e_{k+1} + \lambda_{k+2}\alpha_{d,k+2} e_{k+2} + \cdots + \lambda_{d}(1+\alpha_{d,d}) e_{d}.$$

However, if (a) $|\lambda_{k+1}| \approx |\lambda_{k+2}| \approx \cdots \approx |\lambda_d|$ are of comparable orders of magnitude and (b) the $\alpha_{d,*}$ coefficients are all relatively small, the result will still be reasonably accurate.
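
Here is a minimal sketch of one way to code that projection (the helper name project_small is purely illustrative, and the 1e-12 cutoff for deciding which eigenvalues count as "large" mirrors the code at the end of this answer):

# Project M %*% x onto the space spanned by the small-eigenvalue eigenvectors.
eig <- eigen(m)                                           # as in the question's code
big <- abs(eig$values) >= max(abs(eig$values)) * 1e-12    # which eigenvalues count as "large"
V_big <- eig$vectors[, big, drop = FALSE]                 # their eigenvectors, as columns

project_small <- function(y, V_big) {
  # Subtract the components of y lying in the span of the large eigenvectors.
  y - V_big %*% (t(V_big) %*% y)
}

x <- eig$vectors[, 8]                        # a direction associated with a tiny eigenvalue
y_direct <- m %*% x                          # may contain leaked multiples of e_1, ..., e_k
y_clean  <- project_small(y_direct, V_big)   # orthogonal to the large eigenvectors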

Example

Let's see how this works with the matrix $\mathbb M$ of the question. Here $d=8,$ and $\lambda_1\approx 1111$ and $\lambda_2 \approx 5\times 10^{-8}$ are much greater in magnitude than any of the other eigenvalues. The appearance of tiny negative eigenvalues in a setting where (one assumes) non-negative values are expected is further evidence that the last six (really seven) eigenvalues might just be noisy versions of zero.

The R code below creates random vectors $x^\prime$ in the space generated by the last $8-2=6$ eigenvectors. It applies $\mathbb M$ to them in two ways: directly, as $y=\mathbb{M}x^\prime$; and with the "projection method," in which $x^\prime$ is first regressed against the first two eigenvectors and the residuals of that regression are used in its place. It returns the norms of these two results. We hope the norms will be really tiny, because $\mathbb M$ shouldn't change the norm by more than $\max\{|\lambda_3|,\ldots, |\lambda_8|\}\approx 6.25\times 10^{-8}.$ To see what really happens, I plotted histograms of the two sets of results (left and middle). To compare the two approaches I also drew their scatterplot (right).

[Figure: histograms of the resultant magnitudes for the direct method (left) and the projection method (middle), and a scatterplot comparing the two (right).]

The direct method (left) is awful: a considerable amount of $e_1$ has leaked into the eigenvectors, causing the norms to be inflated by approximately ten(!) orders of magnitude. After projecting out the first two eigenvectors, the norms are always of the expected magnitude. The scatterplot shows no relationship between the two methods' results. (In some cases, for other matrices $\mathbb M,$ there are some intriguing relationships induced by the magnitudes of the error coefficients $\alpha_{ij}.$)

Remarks

This solution guarantees that the result of computing $\mathbb{M}x$ will be orthogonal to the largest eigenspaces and will, with an accuracy relative to the magnitudes of the largest eigenvalues, reflect how much $\mathbb M$ affects the lengths. That's probably all that's needed in the intended application.

There's no prospect of doing any better, either. Many of the entries in matrices like $\mathbb M$ were already computed using floating point arithmetic and therefore cannot be considered any more precise than the results of those operations. The large condition numbers of matrices like $\mathbb M$ guarantee that even using infinitely precise arithmetic, the effects of either (a) changing any entry in $\mathbb M$ in its least significant bit or (b) changing any coefficient of $x$ in its least significant bit will hugely perturb the result.
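
As a rough illustration of this point, the ratio of the largest to the smallest eigenvalue magnitude of m, which for a symmetric matrix is its 2-norm condition number, can be compared with the relative precision of double-precision arithmetic:

# Rough conditioning check for m.
lambda <- eigen(m, only.values = TRUE)$values
range(abs(lambda))                     # smallest and largest eigenvalue magnitudes
max(abs(lambda)) / min(abs(lambda))    # 2-norm condition number of the symmetric matrix m
.Machine$double.eps                    # ~2.2e-16, relative precision of a double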

The svd (singular value decomposition) function in R does a better job of computing the eigenvalues and eigenvectors. This doesn't appreciably improve the direct calculation of $\mathbb{M}x,$ though. If you would like to compare these methods, replace the line eig <- eigen(m) with eig <- with(svd(m), list(values = d, vectors = u)).
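
For reference, the substitution looks like this (just a sketch; note that for a symmetric but indefinite matrix, svd returns the absolute values of the eigenvalues in d, and the columns of u can differ in sign from eigen's eigenvectors wherever the corresponding eigenvalue is negative):

# SVD-based replacement for eig <- eigen(m); the rest of the code below runs unchanged.
eig <- with(svd(m), list(values = d, vectors = u))
str(eig)   # a list with the same component names ($values, $vectors) that eigen() provides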

Code

This picks up after the definition of m in the question and the calculation of its eigenvectors using eigen.

lambda <- eig$values
i <- lambda >= max(lambda) * 1e-12 # Indexes of large eigenvalues.
#
# Generate unit movements in the small directions.
#
nu <- function(x) { # Standardize to unit length if possible
  a <- sum(x^2)
  if(a==0) x else x/sqrt(a)
}
set.seed(17)
sim <- replicate(500, {
  # Naive (uncorrected) algorithm.
  dx <- eig$vectors %*% rnorm(length(lambda)) * ifelse(i, 0, 1)
  dy <- m %*% nu(dx)
  
  # Corrected algorithm with projection.
  dx <- residuals(lm(dx ~ eig$vectors[,i]))
  dz <- m %*% nu(dx)
  c(sqrt(sum(dy^2)), sqrt(sum(dz^2)))
})
#
# Compare.
#
par(mfrow=c(1,3))
hist(sim[1,], main="Magnitudes of resultants: direct method", cex.main=1)
hist(sim[2,], main="Magnitudes of resultants: projection method", cex.main=1)
plot(t(sim), xlab="Direct method", ylab="Projection method", main="Comparison", cex.main=1)
par(mfrow=c(1,1))
