We can compute the variance-covariance matrices of $b$ and $\tilde \beta$ and compare their variances in order to tell which one has the smallest variance (i.e., which is the best estimator).
Knowing that $E(u)=0$:
$E[(b-\beta)(b-\beta)']= E\{[\beta+(X'X)^{-1}X'u-\beta][\beta+(X'X)^{-1}X'u-\beta]'\}$
$=E\{[(X'X)^{-1}X'u][(X'X)^{-1}X'u]'\}$
$=E[(X'X)^{-1}X'uu'X(X'X)^{-1}]$; recall that $E(uu')= \sigma^2I$
$=\sigma^2(X'X)^{-1}X'IX(X'X)^{-1}$
$=\sigma^2(X'X)^{-1}$
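As a quick numerical illustration of this result, here is a minimal Python simulation sketch; the design matrix, $\beta$, $\sigma$, and replication count are illustrative assumptions, not taken from the derivation:

```python
import numpy as np

# Minimal simulation check that Cov(b) = sigma^2 (X'X)^{-1} for OLS.
rng = np.random.default_rng(0)
n, k, sigma, reps = 200, 3, 2.0, 5000
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # fixed design
beta = np.array([1.0, -0.5, 0.25])
XtX_inv = np.linalg.inv(X.T @ X)

draws = np.empty((reps, k))
for r in range(reps):
    u = rng.normal(scale=sigma, size=n)   # E(u) = 0, E(uu') = sigma^2 I
    y = X @ beta + u
    draws[r] = XtX_inv @ X.T @ y          # b = (X'X)^{-1} X'y

print(np.cov(draws, rowvar=False))        # empirical Cov(b)
print(sigma**2 * XtX_inv)                 # theoretical sigma^2 (X'X)^{-1}
```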
So it has been shown that $b$ is unbiased, as $E(b_i)=\beta_i$ and therefore $E(b)=\beta$,
but also linear, as $b=(X'X)^{-1}X'y$, i.e. $b=Ay$ with $A=(X'X)^{-1}X'$.
Let $\tilde \beta$ be a general linear estimator, $\tilde \beta\equiv(A+C)y=[(X'X)^{-1}X'+C]y$
$=[(X'X)^{-1}X'+C](X\beta+u)$
$=\beta+(X'X)^{-1}X'u+CX\beta+Cu$; we require $CX=0$ (so that $CX\beta=0$ for every $\beta$) for unbiasedness
$E(\tilde \beta)=E[\beta+(X'X)^{-1}X'u + CX\beta+Cu]$; with $CX=0$ imposed and $E(u)=0$,
$=\beta$
Hence $\tilde \beta$ is also unbiased and $\tilde \beta=b+Cy$ is linear
Variance-covariance matrix of $\tilde \beta$: $E[(\tilde \beta-\beta)(\tilde \beta-\beta)']=E\{[(X'X)^{-1}X'u+Cu][(X'X)^{-1}X'u+Cu]'\}$
$=E[(X'X)^{-1}X'uu'X(X'X)^{-1}+(X'X)^{-1}X'uu'C'+Cuu'X(X'X)^{-1}+Cuu'C']$
Using the fact that $E(uu')=\sigma^2I$, this simplifies to
$E(\tilde \beta-\beta)(\tilde \beta-\beta)'= \sigma^2(X'X)^{-1}+\sigma^2(X'X)^{-1}X'C'+\sigma^2CX(X'X)^{-1}+\sigma^2CC'$
we know that $CX=0$ hence $(CX)'=X'C'=0$ so we have
$E(\tilde \beta-\beta)(\tilde \beta-\beta)'=\sigma^2(X'X)^{-1}+\sigma^2CC'$
$=\sigma^2[(X'X)^{-1}+CC']$. Recall that $E[(b-\beta)(b-\beta)']=\sigma^2(X'X)^{-1}$.
Thus $E[(\tilde \beta-\beta)(\tilde \beta-\beta)']=E[(b-\beta)(b-\beta)']+ \sigma^2CC'$
Since $CC'$ is positive semi-definite, the difference between the two variance-covariance matrices is $\sigma^2CC'\geq 0$, so $Var(b)\leq Var(\tilde \beta)$ in the matrix sense: $b$ is the best (minimum-variance) linear unbiased estimator.
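As a sanity check of this conclusion, here is a minimal Python sketch comparing OLS with another linear unbiased estimator; the half-sample alternative (OLS computed on only the first half of the observations, which is linear in $y$ and unbiased but wasteful) is an illustrative choice, not from the derivation:

```python
import numpy as np

# Compare OLS with another linear unbiased estimator of the same beta.
rng = np.random.default_rng(1)
n, sigma, reps = 100, 1.0, 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([2.0, 3.0])

A_full = np.linalg.inv(X.T @ X) @ X.T                    # b = A_full y
X1 = X[: n // 2]
A_half = np.zeros((2, n))
A_half[:, : n // 2] = np.linalg.inv(X1.T @ X1) @ X1.T    # uses half the data

b_draws, bt_draws = [], []
for _ in range(reps):
    y = X @ beta + rng.normal(scale=sigma, size=n)
    b_draws.append(A_full @ y)
    bt_draws.append(A_half @ y)

print(np.var(b_draws, axis=0))    # OLS variances: smaller
print(np.var(bt_draws, axis=0))   # half-sample estimator variances: larger
```

Both estimators average to $\beta$ across replications, but the alternative's variances are roughly twice those of OLS, as the Gauss-Markov result predicts.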
Regarding the bias in the correlation: when sample sizes are small enough for the bias to have any practical significance (e.g., the n < 30 you suggested), bias is likely to be the least of your worries, because the estimate will be so imprecise anyway.
Regarding the bias of $R^2$ in multiple regression, there are many different adjustments, depending on whether one targets unbiased estimation of the population $R^2$ or unbiased estimation of the $R^2$ expected in an independent sample of equal size. See Yin, P., & Fan, X. (2001). Estimating $R^2$ shrinkage in multiple regression: A comparison of analytical methods. The Journal of Experimental Education, 69, 203-224.
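One widely used analytical adjustment of this kind, shown here purely as an example, is the familiar degrees-of-freedom correction (Ezekiel's formula), where $n$ is the sample size and $k$ the number of predictors:
$\hat R^2_{adj} = 1-(1-R^2)\dfrac{n-1}{n-k-1}$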
Modern-day regression methods also address the shrinkage of the regression coefficients, and consequently of $R^2$ -- e.g., the elastic net with k-fold cross-validation; see http://web.stanford.edu/~hastie/Papers/elasticnet.pdf.
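For concreteness, a brief sketch of an elastic net fit with k-fold cross-validation using scikit-learn's ElasticNetCV; the simulated data and the candidate l1_ratio grid are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

# Elastic net with 5-fold cross-validation on simulated sparse data.
rng = np.random.default_rng(2)
n, p = 100, 20
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 1.5]                 # only 3 of 20 predictors matter
y = X @ beta + rng.normal(size=n)

# l1_ratio mixes the lasso (1.0) and ridge (0.0) penalties; cv=5 folds.
model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9, 1.0], cv=5).fit(X, y)
print(model.alpha_, model.l1_ratio_)        # selected penalty strength / mix
print(model.coef_[:5])                      # coefficients shrunk toward zero
```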
Best Answer
M- and other robust estimators are not linear in the observations. Note that this is a different issue from fitting a linear model; it concerns how the estimator is computed! In fact being linear in the observations means that outliers cannot be down-weighted, which is what robust estimators aim to do.
In fact they do have (sometimes dramatically) lower variance under heavy-tailed error distributions and outlier-generating models (you can simulate this if you want to convince yourself; see the sketch below). However, they are not better under all non-normal distributions: OLS can be better under distributions with lighter tails than the normal.
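Following that suggestion, here is a minimal simulation sketch comparing OLS with a Huber M-estimator (statsmodels' RLM) under heavy-tailed $t_2$ errors; the sample size, replication count, and seed are illustrative assumptions:

```python
import numpy as np
import statsmodels.api as sm

# OLS vs. Huber M-estimator slope variance under t(2) (heavy-tailed) errors.
rng = np.random.default_rng(3)
n, reps, beta = 50, 2000, np.array([1.0, 2.0])
slopes_ols, slopes_m = [], []
for _ in range(reps):
    x = rng.normal(size=n)
    X = sm.add_constant(x)
    y = X @ beta + rng.standard_t(df=2, size=n)   # heavy-tailed errors
    slopes_ols.append(sm.OLS(y, X).fit().params[1])
    slopes_m.append(sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit().params[1])

print(np.var(slopes_ols))   # typically much larger
print(np.var(slopes_m))     # the M-estimator's slope varies far less
```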
Note, by the way, that if you want to be robust against leverage points, you need something more sophisticated than a standard M-estimator, for example an MM- or tau-estimator (see the robustbase package in R).