As @ttnphns said, if the eigenvalues are all similar to each other, the covariance matrix of your multivariate vector is close to spherical.
Before getting to the point of the answer: suppose for a moment that all eigenvalues are different.
Then $\lambda_1$ is associated with the one-dimensional subspace onto which the projection of the data shows the highest variance (it is usually said to be the direction that "explains" most of the variance).
Then $\lambda_2$ is associated with the direction, orthogonal to the first one, that "explains" most of the variance among all directions orthogonal to it ("explain" in the same sense as above, that is, projecting the data onto this subspace shows higher variance than any other projection, subject to the restriction of being orthogonal to the direction associated with $\lambda_1$).
And with $\lambda_3, \lambda_4, \ldots, \lambda_d$ you can keep finding directions, orthogonal to the previous ones, that explain variance. It follows from this construction that the first direction explains most of the variance of the data, the second explains less than the first but more than the rest, and so on. So, sometimes, the first $p < d$ principal components (i.e. the projections of the data onto those directions) are used to describe the data, as they retain most of its variability while reducing the dimensionality of the dataset.
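A minimal numerical sketch of this construction (the dataset, covariance matrix, and seed below are made up, and I use numpy's eigendecomposition rather than a dedicated PCA routine):

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical dataset with clearly different variances along its axes, d = 3
X = rng.multivariate_normal(mean=np.zeros(3),
                            cov=np.diag([5.0, 2.0, 0.5]),
                            size=1000)

S = np.cov(X, rowvar=False)              # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)     # eigh returns ascending eigenvalues
eigvals = eigvals[::-1]                  # sort descending: lambda_1 >= ... >= lambda_d
eigvecs = eigvecs[:, ::-1]

explained = eigvals / eigvals.sum()      # fraction of variance each direction explains
print(explained)                         # first entries dominate when eigenvalues differ
```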
And here is the big point: all of this holds only if the eigenvalues differ from each other. If $\lambda_1 = \lambda_2$, then you cannot find a single 1-d subspace that explains most of the variance; you need a 2-d plane to do that (the one spanned by the eigenvectors of $\lambda_1$ and $\lambda_2$). You cannot (essentially) say anything about each of those directions by itself, but you can assert that the plane formed by the two is the plane that explains the highest percentage of the variance among all available 2-d planes. So data reduction will, at most, be possible down to 2-d; no 1-d representation makes sense.
In the extreme case in which all eigenvalues are equal, the data cannot be projected optimally (in the sense of variance explanation as understood above), and no data reduction can be performed. If the variance of the eigenvalues is small, then something like this last case is happening: the eigenvalues are very similar to each other, so successive directions will explain almost the same percentage of the variance (near $\frac{1}{d}$ each in the extreme, almost-equal case). PCA would then not be useful as a dimension reduction technique, because in order to get a representation of the data that retains a significant amount of the original variability, you would need to use nearly $d$ dimensions, so no reduction really takes place. For example, if $d=8$ and the eigenvalues are almost the same (i.e., their variance is near zero), then using 3 PCs explains only around $\frac{3}{8} = 37.5\%$ of the original variance: no low-dimensional PC representation keeps track of the data's variability, so it may hide a lot of its behaviour and should not be used.
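And a sketch of this near-spherical case with $d = 8$ (the covariance matrix below is made up just to have almost-equal eigenvalues):

```python
import numpy as np

d = 8
Sigma = np.eye(d) + 0.01 * np.ones((d, d))   # hypothetical, nearly spherical covariance
eigvals = np.linalg.eigvalsh(Sigma)[::-1]    # descending, all close to 1

cumulative = np.cumsum(eigvals) / eigvals.sum()
print(cumulative[2])   # ~0.38: three PCs explain only about 3/8 of the variance
```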
I think there are two ellipses that we could consider. First, consider the image of the unit circle under the map $x \mapsto A x$ for a symmetric positive-definite $A \in \mathbb R^{n \times n}$. It is a standard result that $f(x) = x^T A x$ is maximized over unit vectors $x$ by the unit eigenvector $v_1$ with largest eigenvalue $\lambda_1$. This means that the ellipse formed by the image of the unit circle under this map has its largest semi-axis along $v_1$ with length $\lambda_1$, and so on for the other eigenpairs. So in this case we clearly have that the eigenvalues give the lengths of the semi-axes, and the biggest semi-axis belongs to the biggest eigenvalue.
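A quick numerical check of this first ellipse (the matrix $A$ below is a made-up symmetric PD example):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])                       # hypothetical symmetric PD matrix
eigvals, eigvecs = np.linalg.eigh(A)             # ascending eigenvalues

theta = np.linspace(0.0, 2.0 * np.pi, 2000)
circle = np.stack([np.cos(theta), np.sin(theta)])  # points on the unit circle
ellipse = A @ circle                               # image of the circle under x -> A x

radii = np.linalg.norm(ellipse, axis=0)
print(radii.max(), eigvals[-1])   # both ~3.62: longest semi-axis = largest eigenvalue
print(radii.min(), eigvals[0])    # both ~1.38: shortest semi-axis = smallest eigenvalue
```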
But now consider the contour $f(x) = 1$ for $x \in \mathbb R^n$. Since $A$ is positive definite, the graph of $f$ is an elliptic paraboloid, so its intersection with a horizontal plane (i.e. the contour) is an ellipse. In this case, we find that the shortest semi-axis is parallel to $v_1$, which makes sense because that is the direction in which $f$ grows fastest, so we hit 1 soonest. The largest semi-axis is along $v_n$, since that is the direction in which $f$ grows slowest. Plugging $v_1$ into $f$ we get $f(v_1) = v_1^T A v_1 = \lambda_1 v_1^T v_1 = \lambda_1$, not 1 as required, so the point of the contour along $v_1$ is actually $\frac{1}{\sqrt{\lambda_1}}v_1$. Does this help?
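And a check of this second ellipse, reusing the same made-up $A$: the contour point along $v_1$ has length $1/\sqrt{\lambda_1}$, so the biggest eigenvalue now gives the shortest semi-axis.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])                 # same hypothetical symmetric PD matrix
eigvals, eigvecs = np.linalg.eigh(A)
v1, lam1 = eigvecs[:, -1], eigvals[-1]     # eigenpair with the largest eigenvalue

x = v1 / np.sqrt(lam1)                     # candidate contour point along v1
print(x @ A @ x)                           # ~1.0, so x lies on the contour f(x) = 1
print(np.linalg.norm(x), 1 / np.sqrt(lam1))  # semi-axis length along v1 is 1/sqrt(lambda_1)
```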
Bringing this back to PCA, let's say that our data consist of $m$ observations drawn iid from $\mathcal N_2(\vec 0, \Sigma)$. Let's draw an ellipse around our data such that every point inside the ellipse has likelihood greater than some cutoff $c$. This corresponds to a contour of the likelihood and can be found via $$
c = \frac{1}{2\pi \sqrt{\vert \Sigma \vert}}\exp \left( -\frac12 x^T \Sigma^{-1} x\right) \iff x^T \Sigma^{-1} x = -2\log \left(2\pi c \sqrt{\vert \Sigma \vert}\right)
$$
i.e. the ellipse enclosing the data is a contour of the quadratic form $g(x) = x^T \Sigma^{-1} x$. Note that $\Sigma^{-1}v = \lambda v \implies \Sigma v = \frac1\lambda v$, so the eigenvectors defining the axes of this likelihood-contour ellipse are the same as those of the covariance matrix $\Sigma$, but with inverted eigenvalues.
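A small check of that last relation (the $\Sigma$ below is hypothetical): $\Sigma$ and $\Sigma^{-1}$ share eigenvectors, and the eigenvalues of $\Sigma^{-1}$ are the reciprocals of those of $\Sigma$.

```python
import numpy as np

Sigma = np.array([[4.0, 1.5],
                  [1.5, 1.0]])             # hypothetical covariance matrix
vals, vecs = np.linalg.eigh(Sigma)
vals_inv, vecs_inv = np.linalg.eigh(np.linalg.inv(Sigma))

# eigenvalues of Sigma^{-1} are the reciprocals of the eigenvalues of Sigma
print(np.allclose(np.sort(1.0 / vals), np.sort(vals_inv)))          # True
# eigenvectors agree up to sign and ordering
print(np.allclose(np.abs(vecs.T @ vecs_inv[:, ::-1]), np.eye(2)))   # True
```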
Best Answer
You are right: the ellipse given by $\mathbf x^T\Sigma\mathbf x = 1$ has its largest axis along the eigenvector with the smallest eigenvalue. So it is in a sense the "opposite" of the curves from contour plots of e.g. the normal density. But note that the normal density uses the inverse of $\Sigma$: $$ p(\mathbf x) = \frac{1}{\sqrt{\det(2\pi\Sigma)}}\exp\left(-\frac{1}{2}\mathbf x^T\Sigma^{-1}\mathbf x\right), $$ so the points of equal density are those with equal $\mathbf x^T\Sigma^{-1}\mathbf x$, i.e. the direction of the largest eigenvalue will be the largest axis of the ellipse.
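To make the contrast concrete, here is a small numerical check with a made-up $\Sigma$: the semi-axis of $\{\mathbf x : \mathbf x^T\Sigma\mathbf x = 1\}$ along an eigenvector with eigenvalue $\lambda$ has length $1/\sqrt{\lambda}$, while for the density-contour ellipse $\{\mathbf x : \mathbf x^T\Sigma^{-1}\mathbf x = 1\}$ it has length $\sqrt{\lambda}$, so the two ellipses are elongated in opposite directions.

```python
import numpy as np

Sigma = np.array([[4.0, 1.5],
                  [1.5, 1.0]])             # hypothetical covariance matrix
vals, vecs = np.linalg.eigh(Sigma)         # ascending: vals[0] smallest, vals[-1] largest

# semi-axis lengths of {x : x^T Sigma x = 1} along each eigenvector
print(1 / np.sqrt(vals[0]), 1 / np.sqrt(vals[-1]))   # longer along the *smallest* eigenvalue

# semi-axis lengths of the density-contour ellipse {x : x^T Sigma^{-1} x = 1}
print(np.sqrt(vals[0]), np.sqrt(vals[-1]))           # longer along the *largest* eigenvalue
```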