Strategy
It can be illuminating to move back and forth among three points of view: the statistical one (viewing $x_i$ and $y_i$ as data), a geometrical one (where least squares solutions are just projections in suitable Euclidean spaces), and an algebraic one (manipulating symbols representing matrices or linear transformations). Doing this not only streamlines the ideas, but also exposes the assumptions needed to make the result true, which otherwise might be buried in all the summations.
Notation and assumptions
So: let $y = X\beta + u$ where $y$ is an $n$-vector, $X$ is an $n$ by $p+1$ "design matrix" whose first column is all ones, $\beta$ is a $(p+1)$-vector of true coefficients, and $u$ is an $n$-vector of iid random variables with zero expectation and common variance $\sigma^2$. (Let's continue, as in the question, to begin the coefficient vector's indices with $0$, writing $\beta = (\beta_0, \beta_1, \ldots, \beta_p)$.) This generalizes the question, for which $X$ has just two columns: its second column is the vector $(x_1, x_2, \ldots, x_n)'$.
Basic properties of OLS regression
The regression estimate $\hat{\beta}$ is a $(p+1)$-vector obtained by applying a linear transformation $\mathbb{P}$ to $y$. Because the regression reproduces any response that fits the model exactly, applying $\mathbb{P}$ to the error-free values $X\beta$ recovers the true coefficients: $$\mathbb{P}\left(X\beta\right) = \beta.$$ Finally--and this is the crux of the matter before us--it is obvious statistically (thinking of these values as data) that regressing the constant response $1_n = (1, 1, \ldots, 1)'$ yields the unique solution $\hat{\beta} = (1, 0, 0, \ldots, 0)$: $$\mathbb{P}1_n = (1, 0, 0, \ldots, 0),$$ because when all the responses $y_i$ equal $1$, the fitted intercept is $1$ and all the other fitted coefficients must vanish. That's all we need to know. (Having a formula for $\mathbb{P}$ in terms of $X$ is unimportant and would only be distracting here.)
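For readers who want a concrete check, here is a minimal numerical sketch (in NumPy) of these two properties. It uses the familiar normal-equations formula $\mathbb{P}=(X'X)^{-1}X'$ purely for illustration; nothing in the argument relies on it, and the design matrix below is an arbitrary made-up example.

```python
import numpy as np

# Numerical check of the two properties of P stated above.
# The explicit formula P = (X'X)^{-1} X' is used only for this illustration.
rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # first column all ones
P = np.linalg.solve(X.T @ X, X.T)                           # (p+1) x n matrix

beta = np.array([2.0, -1.0, 0.5, 3.0])                      # arbitrary "true" coefficients
print(np.allclose(P @ (X @ beta), beta))                    # P(X beta) = beta  -> True
print(np.round(P @ np.ones(n), 10))                         # P 1_n = (1, 0, ..., 0)
```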
Easy preliminaries
Begin with some straightforward algebraic manipulation of the original expression:
$$\eqalign{
(\hat{\beta}-\beta)\bar{u} &= (\mathbb{P}y-\beta)\bar{u} \\
&= (\mathbb{P}(X\beta+u)-\beta)\bar{u} \\
&= \mathbb{P}(X\beta+u)\bar{u} - \beta\bar{u} \\
&= (\mathbb{P}X\beta)\bar{u} + \mathbb{P}u\bar{u} - \beta\bar{u} \\
&= \beta\bar{u} + \mathbb{P}u\bar{u} - \beta\bar{u}\\
&= \mathbb{P}(u\bar{u}).
}
$$
This almost mindless sequence of steps--each leads naturally to the next by simple algebraic rules--is motivated by the desire to (a) express the random variation purely in terms of $u$, whence it all derives, and (b) introduce $\mathbb{P}$ so that we can exploit its properties.
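Because every step above is an exact algebraic identity (no expectations have been taken yet), it can be verified on a single simulated draw of the errors. A minimal sketch, again using the explicit formula for $\mathbb{P}$ only as a convenience:

```python
import numpy as np

# Check the identity (beta_hat - beta) * u_bar = P(u * u_bar) on one draw.
rng = np.random.default_rng(1)
n, p = 40, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
P = np.linalg.solve(X.T @ X, X.T)            # P = (X'X)^{-1} X', for illustration only
beta = np.array([1.0, 2.0, -3.0])

u = rng.normal(scale=0.7, size=n)            # one realization of the errors
beta_hat = P @ (X @ beta + u)                # regression estimate from y = X beta + u
u_bar = u.mean()

lhs = (beta_hat - beta) * u_bar
rhs = P @ (u * u_bar)
print(np.allclose(lhs, rhs))                 # True: the identity holds exactly
```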
Computing the expectation
Taking the expectation can no longer be put off, but because $\mathbb{P}$ is a linear operator, it commutes with expectation: we need only apply it to $\mathbb{E}[u\bar{u}]$. We could employ some formal matrix operations to work out this expectation, but there's an easier way. Recalling that the $u_i$ are iid, it is immediate that all the components of $\mathbb{E}[u\bar{u}]$ must be the same. Since they are the same, each one equals their average. That average can be obtained by averaging the components of $u$ (which gives $\bar{u}$), multiplying by $\bar{u}$, and taking the expectation. But that's just a recipe for finding $$\mathbb{E}[\bar{u}\bar{u}] = \text{Var}[\bar{u}] = \sigma^2/n.$$ It follows that $\mathbb{E}[u\bar{u}]$ is a vector of $n$ values, all equal to $\sigma^2/n$; that is, $\mathbb{E}[u\bar{u}] = (\sigma^2/n)\,1_n$. Using our previous vector shorthand we may write $$\mathbb{E}[(\hat{\beta}-\beta)\bar{u} ] = \mathbb{E}[\mathbb{P}(u\bar{u})] = \mathbb{P}\,\mathbb{E}[u\bar{u}]=\mathbb{P}1_n\,\sigma^2/n=(\sigma^2/n, 0, 0, \ldots, 0).$$
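As a sanity check (not a substitute for the argument), a Monte Carlo simulation along the following lines should reproduce the vector $(\sigma^2/n, 0, \ldots, 0)$ up to simulation error; the design, sample size, and error scale below are arbitrary choices.

```python
import numpy as np

# Monte Carlo sketch of E[(beta_hat - beta) * u_bar] = (sigma^2/n, 0, ..., 0).
rng = np.random.default_rng(2)
n, p, sigma = 30, 2, 1.5
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # fixed design
P = np.linalg.solve(X.T @ X, X.T)
beta = np.array([0.5, -1.0, 2.0])

reps = 200_000
acc = np.zeros(p + 1)
for _ in range(reps):
    u = rng.normal(scale=sigma, size=n)
    beta_hat = P @ (X @ beta + u)
    acc += (beta_hat - beta) * u.mean()

print(acc / reps)        # approximately (0.075, 0, 0)
print(sigma**2 / n)      # 0.075
```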
Conclusion
This says that the estimated coefficients $\hat{\beta}_i$ and the mean error $\bar{u}$ are uncorrelated for $i=1, 2, \ldots, p$, but not so for $\hat{\beta}_0$ (the intercept). (Because $\mathbb{E}[\bar{u}]=0$, each expectation $\mathbb{E}[(\hat{\beta}_i-\beta_i)\bar{u}]$ is precisely the covariance of $\hat{\beta}_i$ and $\bar{u}$.)
It is instructive to review the steps and consider which assumptions were essential and which elements of the regression apparatus simply did not appear in the demonstration. We should expect that any proof, no matter how elementary or sophisticated, will need to use the same (or stronger) assumptions and will need, in some guise or another, to include calculations of $\mathbb{P}1_n$ and $\mathbb{E}[u\bar{u}]$.
So we know it is sufficient for that estimator to be unbiased that $\sum c_i=0$ and $\sum c_iX_i=1$.
We want to show these conditions are also necessary.
Clearly, if one of them holds and the other fails, the estimator is not unbiased.
Now suppose both fail: $\sum c_i=\gamma\neq0$ and $\sum c_iX_i=\delta\neq1$. Then
$$E(\hat\beta_1)=E\Big(\sum_i c_iY_i\Big)=\sum_i c_i(\beta_0+\beta_1X_i)=\beta_0\gamma+\beta_1\delta\neq\beta_1$$ in general.
The reason is that the constants $c_i$ cannot depend on the parameters: the estimator has to work for every value the parameters might take.
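A tiny numerical sketch of this (the design points and coefficients below are arbitrary, and the expectation is computed exactly rather than simulated):

```python
import numpy as np

# E(sum c_i Y_i) = beta0 * sum(c) + beta1 * sum(c * x) for any linear estimator.
x = np.array([1.0, 2.0, 4.0, 7.0, 11.0])     # arbitrary design points
beta0, beta1 = 3.0, -2.0                     # arbitrary true parameters

def expected_estimate(c):
    return beta0 * c.sum() + beta1 * (c * x).sum()

# The OLS weights satisfy sum(c) = 0 and sum(c * x) = 1, hence unbiasedness.
c_ols = (x - x.mean()) / ((x - x.mean()) ** 2).sum()
print(c_ols.sum(), (c_ols * x).sum())        # ~0 and ~1
print(expected_estimate(c_ols))              # -2.0, i.e. beta1

# Perturbed weights violate both conditions; the expectation moves away from beta1.
c_bad = c_ols + 0.01
print(expected_estimate(c_bad))              # -2.35 here, not beta1
```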
Best Answer
Here are some tips. First, just as with variances, covariances are unaffected by additive constants. You can use this to show $$\operatorname{cov}(\hat{\beta},\overline{Y})=\operatorname{cov}(\hat{\beta},\overline{\epsilon}).$$
Then, you can show that
$$\hat{\beta}=\sum_iw_iY_i $$
where $w_i=\dfrac{x_i-\overline{x}}{S_{XX}}$ and $S_{XX}=\sum_i(x_i-\overline{x})^2$. You can drop the non-random terms in $Y_i$ and $\overline{Y}$ in the same way to get
$$\operatorname{cov}(\hat{\beta},\overline{Y})=\operatorname{cov}\Big(\sum_i w_i\epsilon_i,\overline{\epsilon}\Big).$$
Then you can apply your result at the start of your question.
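For completeness, here is how the pieces fit together, assuming the result referred to at the start of the question is $\operatorname{cov}(\epsilon_i,\overline{\epsilon})=\sigma^2/n$ for every $i$ (a standard fact for iid errors): $$\operatorname{cov}\Big(\sum_i w_i\epsilon_i,\overline{\epsilon}\Big)=\sum_i w_i\operatorname{cov}(\epsilon_i,\overline{\epsilon})=\frac{\sigma^2}{n}\sum_i w_i=0,$$ because $\sum_i w_i=\sum_i(x_i-\overline{x})/S_{XX}=0$. Hence $\operatorname{cov}(\hat{\beta},\overline{Y})=0$.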