The answer to both 1 and 2 is no, but care is needed in interpreting the existence theorem.
Variance of Ridge Estimator
Let $\hat{\beta^*}$ be the ridge estimate under penalty $k$, and let $\beta$ be the true parameter for the model $Y = X \beta + \epsilon$. Let $\lambda_1, \dotsc, \lambda_p$ be the eigenvalues of $X^T X$.
From Hoerl & Kennard, equations 4.2-4.5, the risk (in terms of the expected squared $L^2$ norm of the estimation error) is
$$
\begin{align*}
E \left( \left[ \hat{\beta^*} - \beta \right]^T \left[ \hat{\beta^*} - \beta \right] \right)& = \sigma^2 \sum_{j=1}^p \lambda_j/ \left( \lambda_j +k \right)^2 + k^2 \beta^T \left( X^T X + k \mathbf{I}_p \right)^{-2} \beta \\
& = \gamma_1 (k) + \gamma_2(k) \\
& = R(k)
\end{align*}
$$
where, as far as I can tell, $\left( X^T X + k \mathbf{I}_p \right)^{-2}$ denotes $\left( X^T X + k \mathbf{I}_p \right)^{-1} \left( X^T X + k \mathbf{I}_p \right)^{-1}.$ They remark that $\gamma_1$ is the total variance, $E \left( \left[ \hat{\beta^*} - E\hat{\beta^*} \right]^T \left[ \hat{\beta^*} - E\hat{\beta^*} \right] \right)$, while $\gamma_2$ is the squared length of the bias, $\left[ E\hat{\beta^*} - \beta \right]^T \left[ E\hat{\beta^*} - \beta \right]$.
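For concreteness, here is a small numerical sketch (my own illustration, not from H&K; the design, $\beta$, $\sigma$, and $k$ are arbitrary choices) that evaluates $\gamma_1(k)$ and $\gamma_2(k)$ from the formulas above and checks their sum against a Monte Carlo estimate of the risk:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma, k = 200, 5, 1.0, 0.5   # arbitrary example settings
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)

XtX = X.T @ X
eigvals = np.linalg.eigvalsh(XtX)

# gamma_1(k): sigma^2 * sum_j lambda_j / (lambda_j + k)^2
gamma1 = sigma**2 * np.sum(eigvals / (eigvals + k) ** 2)
# gamma_2(k): k^2 * beta^T (X^T X + k I)^{-2} beta
A_inv = np.linalg.inv(XtX + k * np.eye(p))
gamma2 = k**2 * beta @ A_inv @ A_inv @ beta
risk_formula = gamma1 + gamma2

# Monte Carlo estimate of E[(beta_hat_k - beta)^T (beta_hat_k - beta)]
reps = 5000
errs = np.empty(reps)
for r in range(reps):
    y = X @ beta + sigma * rng.normal(size=n)
    beta_hat_k = np.linalg.solve(XtX + k * np.eye(p), X.T @ y)
    errs[r] = np.sum((beta_hat_k - beta) ** 2)

print(risk_formula, errs.mean())  # the two numbers should agree up to simulation noise
```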
Suppose $X^T X = \mathbf{I}_p$. Then
$$R(k) = \frac{p \sigma^2 + k^2 \beta^T \beta}{(1+k)^2}.$$
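To spell out the specialization (a step the post leaves implicit): with $X^T X = \mathbf{I}_p$ every $\lambda_j = 1$, so
$$\gamma_1(k) = \sigma^2 \sum_{j=1}^p \frac{1}{(1+k)^2} = \frac{p \sigma^2}{(1+k)^2}, \qquad \gamma_2(k) = k^2 \beta^T \left[ (1+k) \mathbf{I}_p \right]^{-2} \beta = \frac{k^2 \beta^T \beta}{(1+k)^2},$$
and the two terms sum to the expression for $R(k)$ above.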
Differentiating the risk with respect to $k$ gives
$$R^\prime (k) = 2\frac{k(1+k)\beta^T \beta - (p\sigma^2 + k^2 \beta^T \beta)}{(1+k)^3}.$$
Since $\lim_{k \rightarrow 0^+} R^\prime (k) = -2p \sigma^2 < 0$, we conclude that there is some $k^*>0$ such that $R(k^*)<R(0)$.
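As an aside (my own computation, not a claim from the post): in this orthonormal case the optimal $k$ is available in closed form. Setting $R^\prime(k) = 0$ gives
$$k(1+k)\beta^T \beta - \left(p\sigma^2 + k^2 \beta^T \beta\right) = 0 \iff k\, \beta^T \beta = p \sigma^2 \iff k^* = \frac{p\sigma^2}{\beta^T \beta},$$
and plugging back in, $R(k^*) = \dfrac{p\sigma^2 \, \beta^T \beta}{\beta^T \beta + p\sigma^2} < p\sigma^2 = R(0)$. Note that $k^*$ depends on the unknown $\beta^T \beta$, which is relevant to the comment below.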
The authors remark that orthogonality is the best you can hope for in terms of the risk at $k=0$, and that as $X^T X$ becomes more ill-conditioned (its smallest eigenvalue approaching zero), $\lim_{k \rightarrow 0^+} R^\prime (k)$ tends to $- \infty$.
Comment
There appears to be a paradox here, in that if $p=1$ and $X$ is a constant column, then we are just estimating the mean of a sequence of Normal$(\beta, \sigma^2)$ variables, and we know the vanilla unbiased estimate is admissible in this case. This is resolved by noticing that the above reasoning merely shows that, for each fixed $\beta^T \beta$, some $k>0$ achieves lower risk than $k=0$; the good value of $k$ depends on $\beta$. For any fixed $k$, we can make the risk of the ridge estimate explode by making $\beta^T \beta$ large, so this argument alone does not show that a ridge estimate with fixed $k$ dominates OLS, and there is no contradiction with admissibility.
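To make the resolution concrete, a quick sketch (my own illustration with arbitrary numbers, $p = 1$, $X^T X = 1$): the OLS risk is $\sigma^2$ for every $\beta$, while the ridge risk at a fixed $k$ grows with $\beta^2$ and overtakes it once $\beta^2 > \sigma^2 (2+k)/k$.

```python
import numpy as np

sigma, k = 1.0, 0.5                      # arbitrary example values
betas = np.array([0.5, 1.0, 2.0, 3.0, 5.0])

ridge_risk = (sigma**2 + k**2 * betas**2) / (1 + k) ** 2   # risk of ridge with this fixed k
ols_risk = sigma**2                                        # risk of OLS, constant in beta

for b, r in zip(betas, ridge_risk):
    print(f"beta = {b}: ridge risk {r:.3f} vs OLS risk {ols_risk:.3f}")

# beta^2 beyond which the fixed-k ridge estimator is worse than OLS
print("crossover at beta^2 =", sigma**2 * (2 + k) / k)
```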
Why is ridge regression usually recommended only in the case of correlated predictors?
H&K's risk derivation shows that if we think $\beta^T \beta$ is small, and if the design matrix $X^T X$ is nearly singular, then we can achieve large reductions in the risk of the estimate. I think ridge regression isn't used ubiquitously because the OLS estimate is a safe default and its invariance and unbiasedness properties are attractive. When OLS fails, it fails honestly: your covariance matrix explodes. There is also perhaps a philosophical/inferential point: if your design is nearly singular and you have observational data, then the interpretation of $\beta$ as giving changes in $E Y$ for unit changes in $X$ is suspect, and the large covariance matrix is a symptom of that.
But if your goal is solely prediction, the inferential concerns no longer hold, and you have a strong argument for using some sort of shrinkage estimator.
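As a rough illustration of the prediction point (my own sketch; the simulation settings, the shared-factor design, and $\lambda = 5$ are arbitrary choices, not anything from the post): on a strongly correlated design, a moderately penalized ridge fit typically predicts the regression function better out of sample than OLS.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, sigma, lam = 50, 10, 1.0, 5.0      # arbitrary example settings

def make_X(m):
    """Strongly correlated design: one shared latent factor plus small noise."""
    z = rng.normal(size=(m, 1))
    return z + 0.1 * rng.normal(size=(m, p))

beta = 0.5 * rng.normal(size=p)
ols_err, ridge_err = [], []
for _ in range(200):
    X, X_new = make_X(n), make_X(n)
    y = X @ beta + sigma * rng.normal(size=n)
    b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    b_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
    mu_new = X_new @ beta                # noiseless test targets
    ols_err.append(np.mean((X_new @ b_ols - mu_new) ** 2))
    ridge_err.append(np.mean((X_new @ b_ridge - mu_new) ** 2))

print("mean out-of-sample MSE  OLS:", np.mean(ols_err), "  ridge:", np.mean(ridge_err))
```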
Best Answer
I do not know if you are still interested in this issue, but I think it is useful for your problem to look at the limiting behaviour of the estimator's mean squared error as the penalty parameter approaches infinity.
Write $\hat{\beta}_{r} = (X^\top X + \lambda I )^{-1} X^\top y$ for the ridge estimator and $\hat{\beta} = (X^\top X)^{-1} X^\top y$ for the OLS estimator (which is unbiased, hence $E(\hat{\beta}) = \beta$). Now, if we define $K = (X^\top X + \lambda I )^{-1} X^\top X$, we can verify that $\hat{\beta}_{r} = K \hat{\beta}$ (so $K$ transforms the OLS estimator into the ridge one).
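A quick numerical check of this identity (my own sketch with arbitrary data):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, lam = 100, 4, 2.0                  # arbitrary example settings
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)

XtX = X.T @ X
beta_ols = np.linalg.solve(XtX, X.T @ y)                      # OLS estimate
beta_ridge = np.linalg.solve(XtX + lam * np.eye(p), X.T @ y)  # ridge estimate
K = np.linalg.solve(XtX + lam * np.eye(p), XtX)               # K = (X'X + lam I)^{-1} X'X

print(np.allclose(beta_ridge, K @ beta_ols))  # True: the ridge estimate is K times the OLS estimate
```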
Then, keeping in mind the definition of $K$, it can be demonstrated that (see e.g. Hoerl and Kennard, 1970):
$$
\begin{align*}
MSE(\hat{\beta}_{r}) &= E\left[(\hat{\beta}_{r} - \beta)^\top (\hat{\beta}_{r} - \beta)\right] = \mbox{Var}(\hat{\beta}_{r}) + [\mbox{Bias}(\hat{\beta}_{r})]^2 \\
&= \sigma^{2}\mbox{tr}\{K (X^{\top} X)^{-1}K^{\top}\} + \beta^{\top}(K - I)^{\top}(K - I)\beta,
\end{align*}
$$
where
$$
\begin{align*}
\mbox{Var}(\hat{\beta}_{r}) &= \sigma^{2}\mbox{tr}\{K (X^{\top} X)^{-1}K^{\top}\}, \\
\left[\mbox{Bias}(\hat{\beta}_{r})\right]^2 &= \beta^{\top}(K - I)^{\top}(K - I)\beta.
\end{align*}
$$
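To see the decomposition at work (my own sketch, with arbitrary data and an arbitrary grid of penalties): the variance term shrinks and the squared-bias term grows toward $\beta^\top \beta$ as $\lambda$ increases.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, sigma = 80, 4, 1.0                 # arbitrary example settings
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)

XtX = X.T @ X
XtX_inv = np.linalg.inv(XtX)
I = np.eye(p)

for lam in [0.0, 1.0, 10.0, 100.0, 1e6]:
    K = np.linalg.solve(XtX + lam * I, XtX)
    var_term = sigma**2 * np.trace(K @ XtX_inv @ K.T)   # total variance of the ridge estimator
    bias_sq = beta @ (K - I).T @ (K - I) @ beta         # squared bias of the ridge estimator
    print(f"lambda = {lam:>9}: Var = {var_term:.4f}  Bias^2 = {bias_sq:.4f}  MSE = {var_term + bias_sq:.4f}")

print("beta^T beta =", beta @ beta)  # the limiting MSE as lambda grows
```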
From the above we can compute
$$ \lim_{\lambda \rightarrow\infty} MSE(\hat{\beta}_{r}) = \beta^\top \beta, $$
which is the squared bias of the degenerate estimator that is identically zero (the variance, as you pointed out, goes to zero as $\lambda \to \infty$). I hope this helps a bit (and I hope the notation is correct and clear enough).
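One extra step, spelled out for completeness (my addition): since $(X^\top X + \lambda I)^{-1} \to 0$ as $\lambda \to \infty$, we have $K \to 0$, so the variance term $\sigma^{2}\mbox{tr}\{K (X^{\top} X)^{-1}K^{\top}\} \to 0$ while the bias term $\beta^{\top}(K - I)^{\top}(K - I)\beta \to \beta^\top \beta$, which gives the limit above.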