I will answer each question:

1) The defining equation of an eigenvector, $Ax = \lambda x$, is ambidextrous. If $x$ is an eigenvector, so is $-x$, for then
$$A(-x) = -Ax = -\lambda x = \lambda (-x)$$
So an eigenbasis is only determined up to the sign of each vector.
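A quick numerical check in R makes this concrete (the matrix $A$ below is just a hypothetical example; note that eigen() itself picks one of the two signs arbitrarily):
A <- matrix(c(2, 1, 1, 2), nrow = 2)  # hypothetical symmetric matrix, eigenvalues 3 and 1
e <- eigen(A)
v <- e$vectors[, 1]                   # unit eigenvector; the sign eigen() returns is arbitrary
lam <- e$values[1]
max(abs(A %*% v - lam * v))           # ~0 (up to rounding): v satisfies Ax = lambda x
max(abs(A %*% (-v) - lam * (-v)))     # ~0: and so does -v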
2) It's hard to know for sure, but I have a strong suspicion of what is happening here. Your equation
$$ (A - \lambda)x = 0 $$
is technically incorrect. The correct equation is
$$ (A - \lambda I)x = 0 $$
The first equation is often used as a shorthand for the second. On paper this is unambiguous, because there is no real mathematical way to subtract a scalar from a square matrix, but it is an abuse of notation. In R, though, you have broadcasting. So if you do
> M <- matrix(c(1, 1, 1, 1), nrow=2)
> M - .5
[,1] [,2]
[1,] 0.5 0.5
[2,] 0.5 0.5
it's not really what you want. The proper way would be
> M - diag(.5, 2)
[,1] [,2]
[1,] 0.5 1.0
[2,] 1.0 0.5
The reason you are getting only the zero solution is that the matrix $A$ you are starting with is invertible. More than likely (almost surely), the matrix you get by subtracting the same number from every entry will also be invertible, and for an invertible matrix $M$ the only solution to $Mx = 0$ is the zero vector.
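To see this concretely, here is a small sketch contrasting the two subtractions at a true eigenvalue (the matrix $A$ is again a made-up example):
A <- matrix(c(2, 1, 1, 2), nrow = 2)  # hypothetical invertible matrix, eigenvalues 3 and 1
lam <- eigen(A)$values[1]             # a true eigenvalue of A
det(A - lam)                          # broadcast subtraction: nonzero, so only x = 0 solves (A - lam)x = 0
det(A - diag(lam, 2))                 # correct A - lambda*I: ~0, so a nontrivial null space exists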
I think there are two ellipses that we could consider. First, consider the image of the unit circle under the linear map $x \mapsto Ax$ for positive-definite $A \in \mathbb R^{n \times n}$. It is a standard result that the quadratic form $f(x) = x^T A x$ is maximized over unit vectors $x$ by the unit eigenvector $v_1$ with the largest eigenvalue $\lambda_1$, and correspondingly $\|Av_1\| = \lambda_1$ is the largest that $\|Ax\|$ gets over unit vectors. So the ellipse formed by the image of the unit circle under this map has its largest semi-axis along $v_1$ with length $\lambda_1$, and so on for the other eigenpairs. In this case we clearly have that the eigenvalues give the lengths of the semi-axes, and the biggest semi-axis goes with the biggest eigenvalue.
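Here is a quick R sketch of that first ellipse (the $2 \times 2$ matrix is illustrative): push unit vectors through $A$ and measure the image.
theta <- seq(0, 2 * pi, length.out = 1000)
circle <- rbind(cos(theta), sin(theta))  # columns are points on the unit circle
A <- matrix(c(2, 1, 1, 2), nrow = 2)     # hypothetical PD matrix, eigenvalues 3 and 1
img <- A %*% circle                      # image of the unit circle: an ellipse
max(sqrt(colSums(img^2)))                # ~3 = lambda_1, attained near v_1 = (1, 1)/sqrt(2)
min(sqrt(colSums(img^2)))                # ~1 = lambda_2, attained near v_2 = (1, -1)/sqrt(2)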
But now consider the contour $f(x) = 1$, i.e. the set of $x \in \mathbb R^n$ with $x^T A x = 1$. Since $A$ is positive-definite, the graph of $f$ is an elliptic paraboloid, so its intersection with a horizontal plane (i.e. the contour) is an ellipse. In this case we find that the shortest semi-axis is parallel to $v_1$, which makes sense because that is the direction in which $f$ grows the fastest, so we hit 1 the soonest. The largest semi-axis is along $v_n$, since that is the direction in which $f$ grows the slowest. Plugging $v_1$ into $f$ we get $f(v_1) = v_1^T A v_1 = \lambda_1 v_1^T v_1 = \lambda_1$, not 1 as required, so the actual semi-axis vector parallel to $v_1$ is $\frac{1}{\sqrt{\lambda_1}} v_1$. Does this help?
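Continuing with the same illustrative matrix, the rescaling is easy to check numerically:
A <- matrix(c(2, 1, 1, 2), nrow = 2)  # same hypothetical PD matrix as above
e <- eigen(A)
v1 <- e$vectors[, 1]; l1 <- e$values[1]
drop(t(v1) %*% A %*% v1)              # = lambda_1 (3 here), so v1 itself overshoots the contour
u <- v1 / sqrt(l1)                    # rescale to land on f(x) = 1
drop(t(u) %*% A %*% u)                # = 1: the semi-axis along v1 has length 1/sqrt(lambda_1)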
Bringing this back to PCA, let's say that our data consists of $m$ observations coming iid from $\mathcal N_2(\vec 0, \Sigma)$. Let's draw an ellipse around our data such that every point inside the ellipse has likelihood greater than some cutoff $c$. This corresponds to a contour of the likelihood and can be found by $$
c = \frac{1}{2\pi \vert \Sigma \vert^{1/2}}\exp \left( -\frac12 x^T \Sigma^{-1} x\right) \iff x^T \Sigma^{-1} x = -2\log \left(2\pi c \vert \Sigma \vert^{1/2}\right)
$$
i.e. the ellipse that circles the data is a contour of the quadratic form $g(x) = x^T \Sigma^{-1} x$. Note that $\Sigma^{-1}v = \lambda v \implies \Sigma v = \frac1\lambda v$, so the eigenvectors defining the axes of this likelihood contour ellipse are the same as those of the covariance matrix $\Sigma$, but with reciprocal eigenvalues.
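As a concrete check, here is a sketch with a made-up covariance matrix, showing the covariance matrix and the precision matrix $\Sigma^{-1}$ sharing eigenvectors with reciprocal eigenvalues:
Sigma <- matrix(c(2, 1, 1, 2), nrow = 2)  # hypothetical covariance matrix
eigen(Sigma)$values                       # 3, 1
eigen(solve(Sigma))$values                # 1, 1/3: reciprocals (eigen() sorts decreasingly, so the order flips)
eigen(Sigma)$vectors                      # same columns as eigen(solve(Sigma))$vectors, up to sign and order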