I am not sure if I understood your question correctly, but if you are looking to prove that the OLS estimator $\hat{\beta}$ is BLUE (best linear unbiased estimator), you have to prove two things: first, that $\hat{\beta}$ is unbiased, and second, that $Var(\hat{\beta})$ is the smallest among all linear unbiased estimators.
A proof that the OLS estimator is unbiased can be found here: http://economictheoryblog.com/2015/02/19/ols_estimator/
and a proof that $Var(\hat{\beta})$ is the smallest among all linear unbiased estimators can be found here: http://economictheoryblog.com/2015/02/26/markov_theorem/
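As a quick sketch of the first part (the links give the full arguments): under the standard assumption $E[\varepsilon \mid X] = 0$, unbiasedness takes two lines:
$$\hat{\beta}=(X'X)^{-1}X'y=(X'X)^{-1}X'(X\beta+\varepsilon)=\beta+(X'X)^{-1}X'\varepsilon,$$
$$E[\hat{\beta}\mid X]=\beta+(X'X)^{-1}X'\,E[\varepsilon\mid X]=\beta.$$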
In mathematics, things are rarely developed the way they're presented in textbooks. That's the real reason. Here's the explanation.
First, someone came up with the problem of fitting $$y=X\beta+\varepsilon,$$ i.e. finding the set of parameters $\beta$ that is "best" in some respect. Whoever did this didn't think that the solution would be a linear combination of $y$'s. He simply thought about what would be a criterion to pick the "best" solution, and came up with minimizing the sum of squared errors $\varepsilon'\varepsilon$. This is a very reasonable criterion for many people. So, he went on and formulated the optimization problem:
$$\min_\beta \varepsilon'\varepsilon=\min_\beta(y-X\beta)'(y-X\beta)$$
When the guy solved the problem, he was amazed that the solution turned out to be a linear combination of $y$'s:
$$(X'X)^{-1}X'y$$
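For the record, here is how the solution drops out (a quick sketch, assuming $X'X$ is invertible): setting the gradient of the objective to zero gives the normal equations,
$$-2X'(y-X\beta)=0 \quad\Longrightarrow\quad X'X\beta=X'y \quad\Longrightarrow\quad \hat{\beta}=(X'X)^{-1}X'y.$$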
He wasn't looking for solutions that are BLUE or linear. He was just looking for a solution to the least squares problem. Then his friends jumped in to study this solution from different angles and came up with the Gauss-Markov theorem, BLUE, etc.
After this was all done, people today look at all kinds of formulations of the "best" solution criterion; they're not simply sums of squared errors anymore. Some people also want a "small" $\beta$, which leads to all kinds of shrinkage methods that are no longer BLUE or even linear, and so on.
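Ridge regression is the standard example of the first kind: penalizing large coefficients,
$$\min_\beta\,(y-X\beta)'(y-X\beta)+\lambda\beta'\beta \quad\Longrightarrow\quad \hat{\beta}_{ridge}=(X'X+\lambda I)^{-1}X'y,$$
gives an estimator that is still linear in $y$ but biased (so not BLUE), while the lasso's $\ell_1$ penalty produces a solution that is not even linear in $y$.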
I like your question a lot because it separates the linear model specification $X\beta$ in the independent variables from the fact that the solution is a linear combination $Cy$ of the dependent variable. To get from the former to the latter one needs a special kind of goodness-of-fit criterion, such as the minimum sum of squares. Other goodness-of-fit criteria may lead to solutions that are non-linear in $y$.
Best Answer
The main intuition is that the restricted OLS estimator is generally biased. So there is a tradeoff between bias and variance: you reduce variance, but you allow bias.
An example: suppose you want to estimate the average height of the people in your state. You may have learned that if you have a random sample (say of 2000 individuals), a "reasonable" estimator is the sample average, which is unbiased. But suppose you have "prophecy" skills and you know for sure that the average of the population is 175cm. Then there is no variance at all; it is zero, which is lower than the variance of any estimator you can come up with. But unless you are a really good prophet (or you cheated with the data), your estimate is likely to be biased.
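A toy simulation makes the decomposition $MSE = bias^2 + variance$ concrete (all numbers below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
true_mean, sd, n = 172.0, 10.0, 2000   # hypothetical population values
prophecy = 175.0                       # zero-variance guess, possibly biased
reps = 10_000

# Unbiased estimator: the sample average, over many repeated samples.
sample_means = rng.normal(true_mean, sd, size=(reps, n)).mean(axis=1)
mse_sample_mean = np.mean((sample_means - true_mean) ** 2)

# "Prophecy" estimator: zero variance, so its MSE is pure squared bias.
mse_prophecy = (prophecy - true_mean) ** 2

print(mse_sample_mean)  # roughly sd**2 / n = 0.05 (all variance, no bias)
print(mse_prophecy)     # 9.0 (all bias, no variance)
```

Here the prophecy loses because its bias is large relative to the variance it saves; the restricted-OLS comparison below is exactly the same tradeoff.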
A more explicit answer to your question is a direct comparison of the variances of the restricted and unrestricted estimators:
$$Var(\hat{\beta}_u) = \sigma^2(X'X)^{-1}$$
$$Var(\hat{\beta}_c) = \sigma^2(X'X)^{-1} - \sigma^2(X'X)^{-1}R'(R(X'X)^{-1}R')^{-1}R(X'X)^{-1}$$
Since the subtracted term is positive semidefinite, the variance of the restricted estimator is always weakly smaller than the variance of the unrestricted estimator, and this holds whether or not the restrictions are true; what the restrictions being true buys you is that the restricted estimator is also unbiased.
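If you want to see this numerically, here is a minimal sketch (the design matrix and the single restriction $R$ are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 50, 3
X = rng.normal(size=(n, k))          # hypothetical design matrix
sigma2 = 1.0
R = np.array([[1.0, -1.0, 0.0]])     # one made-up restriction: beta_0 = beta_1

XtX_inv = np.linalg.inv(X.T @ X)
var_u = sigma2 * XtX_inv
reduction = sigma2 * (XtX_inv @ R.T
                      @ np.linalg.inv(R @ XtX_inv @ R.T)
                      @ R @ XtX_inv)
var_c = var_u - reduction

# var_u - var_c equals `reduction`, which is positive semidefinite:
print(np.linalg.eigvalsh(reduction))  # all eigenvalues >= 0 (up to rounding)
```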
Moreover, the difference $MSE(\hat{\beta}_u) - MSE(\hat{\beta}_c)$ is positive semidefinite [$\hat{\beta}_u$ is the unconstrained, $\hat{\beta}_c$ the constrained estimator] if and only if $\lambda \le 1/2$, where
$$\lambda = \frac{1}{2\sigma^2}(R\beta-r)'(R(X'X)^{-1}R')^{-1}(R\beta-r)$$
and $R\beta = r$ is the set of constraints you impose.
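To see where the $1/2$ comes from in the simplest case of a single restriction ($R$ a row vector), write $w = (R(X'X)^{-1}R')^{-1}$ (a scalar) and $\delta = R\beta - r$. The bias of the constrained estimator is $-(X'X)^{-1}R'w\delta$, so adding the squared bias to the variances above gives
$$MSE(\hat{\beta}_u)-MSE(\hat{\beta}_c)=\left(\sigma^2 w-w^2\delta^2\right)(X'X)^{-1}R'R(X'X)^{-1},$$
which is positive semidefinite exactly when $w\delta^2/\sigma^2 \le 1$, i.e. when $\lambda = w\delta^2/(2\sigma^2) \le 1/2$.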