Unbiased estimates are standard in introductory statistics courses because they are: 1) classic, and 2) easy to analyze mathematically. The Cramér-Rao lower bound is one of the main tools for 2). Once we move away from unbiased estimates, there is room for improvement. The bias-variance trade-off is an important concept in statistics for understanding how biased estimates can be better than unbiased ones.
Unfortunately, biased estimators are typically harder to analyze. In regression, much of the research in the past 40 years has been about biased estimation. This began with ridge regression (Hoerl and Kennard, 1970). See Frank and Friedman (1996) and Burr and Fry (2005) for some review and insights.
The bias-variance tradeoff becomes more important in high-dimensions, where the number of variables is large. Charles Stein surprised everyone when he proved that in the Normal means problem the sample mean is no longer admissible if $p \geq 3$ (see Stein, 1956). The James-Stein estimator (James and Stein 1961) was the first example of an estimator that dominates the sample mean. However, it is also inadmissible.
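A quick Monte Carlo sketch of the dominance result, assuming a single observation $X \sim N(\theta, I_p)$ and using the positive-part James-Stein shrinkage factor $(1 - (p-2)/\|X\|^2)_+$; the values of `p`, `theta`, and `n_reps` below are illustrative choices, not anything taken from the references above:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n_reps = 10, 5000          # dimension of the mean vector and Monte Carlo replications (arbitrary)
theta = rng.normal(size=p)    # a made-up "true" mean vector, purely for illustration

mse_mle, mse_js = 0.0, 0.0
for _ in range(n_reps):
    x = theta + rng.normal(size=p)                     # one observation X ~ N(theta, I_p)
    shrink = max(0.0, 1.0 - (p - 2) / np.sum(x**2))    # positive-part James-Stein factor
    js = shrink * x
    mse_mle += np.sum((x - theta) ** 2)                # squared error of the usual estimator
    mse_js += np.sum((js - theta) ** 2)                # squared error of the shrunken estimator

print("MSE of sample mean (MLE):", mse_mle / n_reps)
print("MSE of James-Stein:      ", mse_js / n_reps)
# For p >= 3 the James-Stein risk comes out strictly smaller, whatever theta happens to be.
```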
An important part of the bias-variance problem is determining how bias should be traded off. There is no single “best” estimator. Sparsity has been an important part of research in the past decade. See Hesterberg et al. (2008) for a partial review.
Most of the estimators referenced above are non-linear in $Y$. Even ridge regression is non-linear once the data is used to determine the ridge parameter.
An estimator is consistent if $\hat{\beta} \rightarrow_{p} \beta$, that is, if $\lim_{n \rightarrow \infty} \mbox{Pr}(|\hat{\beta} - \beta| < \epsilon) = 1$ for every $\epsilon > 0$.
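As a concrete illustration of that definition, here is a minimal simulation for the OLS slope in a no-intercept model $y = \beta x + \epsilon$; the true $\beta$, the tolerance $\epsilon$, the sample sizes, and the replication count are all made-up values for the sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
beta, eps, reps = 2.0, 0.05, 1000        # true slope, tolerance, Monte Carlo replications (all illustrative)

for n in (20, 200, 2000, 20000):
    hits = 0
    for _ in range(reps):                # estimate Pr(|beta_hat - beta| < eps) by simulation
        x = rng.normal(size=n)
        y = beta * x + rng.normal(size=n)          # simple model with mean-zero errors
        beta_hat = np.sum(x * y) / np.sum(x * x)   # OLS slope, no intercept
        hits += abs(beta_hat - beta) < eps
    print(f"n={n:6d}  Pr(|beta_hat - beta| < {eps}) is about {hits / reps:.3f}")
# The estimated probability climbs toward 1 as n grows, which is exactly the definition above.
```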
Consistency in the literal sense means that sampling more of the world eventually gets us what we want: the estimate homes in on the true value. Note that there are minimum variance estimators that are inconsistent (there is a famous counterexample, though I cannot locate it via Google at the moment).
Unbiased minimum variance is a good starting place for thinking about estimators. Still, it helps to recognize that we may have other criteria for "best" estimators: there is the general class of minimax estimators, and there are estimators that minimize MSE instead of variance (a little bit of bias in exchange for a whole lot less variance can be good). These estimators can still be consistent, because their bias and variance both shrink to zero asymptotically.
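A classic small example of trading a little bias for less MSE: for normal data, the variance estimator that divides the sum of squared deviations by $n+1$ is biased but has smaller MSE than the unbiased $n-1$ version. A minimal simulation, where the sample size, true variance, and replication count are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma2, n_reps = 10, 4.0, 20000      # sample size, true variance, replications (all illustrative)

err_unbiased, err_biased = 0.0, 0.0
for _ in range(n_reps):
    x = rng.normal(scale=np.sqrt(sigma2), size=n)
    ss = np.sum((x - x.mean()) ** 2)
    err_unbiased += (ss / (n - 1) - sigma2) ** 2    # classical unbiased estimator
    err_biased   += (ss / (n + 1) - sigma2) ** 2    # biased, but MSE-optimal divisor for normal data

print("MSE, divide by n-1 (unbiased):", err_unbiased / n_reps)
print("MSE, divide by n+1 (biased):  ", err_biased / n_reps)
```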
The interpretation of the slope parameter comes from the context of the data you've collected. For instance, if $Y$ is fasting blood glucose and $X$ is the previous week's caloric intake, then the interpretation of $\beta$ in the linear model $E[Y|X] = \alpha + \beta X$ is the associated difference in fasting blood glucose comparing individuals who differ by 1 kCal in weekly diet (it may make sense to standardize $X$ by a denominator of $2,000$).
That is what OLS estimates consistently as $n$ increases.
WRT #2: Linear regression is a projection. The fitted values we obtain by projecting the observed responses onto the column space of the design matrix necessarily generate an additive, orthogonal error component: the residuals. In the sample data the residuals are orthogonal to the fitted values (their dot product is always exactly zero), and they have mean zero whenever an intercept is included. This holds regardless of homoscedasticity, normality, linearity, or any of the other classical assumptions of regression models.
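A small numerical check of those algebraic identities, with deliberately nonlinear, heteroscedastic, non-normal data (the data-generating process here is invented purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x = rng.uniform(0, 10, size=n)
# Deliberately violate the classical assumptions: nonlinear mean, skewed errors,
# variance growing with x. The sample identities below still hold exactly.
y = np.sin(x) + rng.exponential(scale=1 + x, size=n)

X = np.column_stack([np.ones(n), x])              # design matrix with an intercept
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS fit
fitted = X @ beta_hat
resid = y - fitted

print("sum of residuals:             ", resid.sum())    # ~0, because an intercept column is included
print("dot(residuals, fitted values):", resid @ fitted) # ~0, by orthogonality of the projection
```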
Best Answer
Not necessarily. Consistency is a large-sample property: as we increase the number of observations, the estimate should converge in probability to the true parameter. A sufficient condition is that $\text{var}(\hat\beta) \to 0$ as $n \to \infty$, in addition to $\Bbb E(\hat \beta) = \beta$.
Update:
Please refer to the proofs of unbiasedness and consistency for OLS here.
WRT your edited question: unbiasedness requires that $\Bbb E(\epsilon |X) = 0$, while consistency additionally requires that a law of large numbers applies to the relevant sample averages (and the Central Limit Theorem is what gives asymptotic normality). So in some peculiar cases, e.g. when the error terms follow a Cauchy distribution, the law of large numbers fails and consistency breaks down; unbiasedness does not imply consistency.
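A quick sketch of the kind of failure meant here: the running mean of i.i.d. Cauchy draws never settles down, while the normal running mean does, because the law of large numbers does not apply to the Cauchy (its mean does not exist). Sample sizes and seed are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
normal_draws = rng.normal(size=n)
cauchy_draws = rng.standard_cauchy(size=n)

idx = np.arange(1, n + 1)
running_mean_normal = np.cumsum(normal_draws) / idx   # running mean after each new draw
running_mean_cauchy = np.cumsum(cauchy_draws) / idx

for k in (100, 1_000, 10_000, 100_000):
    print(f"n={k:7d}  normal running mean {running_mean_normal[k-1]: .4f}"
          f"   cauchy running mean {running_mean_cauchy[k-1]: .4f}")
# The normal running mean settles near 0; the Cauchy running mean keeps jumping around.
```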