At the start of your derivation you multiply out the brackets $\sum_i (x_i - \bar{x})(y_i - \bar{y})$, in the process expanding both $y_i$ and $\bar{y}$. The former depends on the sum variable $i$, whereas the latter doesn't. If you leave $\bar{y}$ as is, the derivation is a lot simpler, because
\begin{align}
\sum_i (x_i - \bar{x})\bar{y}
&= \bar{y}\sum_i (x_i - \bar{x})\\
&= \bar{y}\left(\left(\sum_i x_i\right) - n\bar{x}\right)\\
&= \bar{y}\left(n\bar{x} - n\bar{x}\right)\\
&= 0
\end{align}
Hence
\begin{align}
\sum_i (x_i - \bar{x})(y_i - \bar{y})
&= \sum_i (x_i - \bar{x})y_i - \sum_i (x_i - \bar{x})\bar{y}\\
&= \sum_i (x_i - \bar{x})y_i\\
&= \sum_i (x_i - \bar{x})(\beta_0 + \beta_1x_i + u_i )\\
\end{align}
and
\begin{align}
\text{Var}(\hat{\beta_1})
& = \text{Var} \left(\frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} \right) \\
&= \text{Var} \left(\frac{\sum_i (x_i - \bar{x})(\beta_0 + \beta_1x_i + u_i )}{\sum_i (x_i - \bar{x})^2} \right), \;\;\;\text{substituting in the above} \\
&= \text{Var} \left(\frac{\sum_i (x_i - \bar{x})u_i}{\sum_i (x_i - \bar{x})^2} \right), \;\;\;\text{noting only $u_i$ is a random variable} \\
&= \frac{\sum_i (x_i - \bar{x})^2\text{Var}(u_i)}{\left(\sum_i (x_i - \bar{x})^2\right)^2} , \;\;\;\text{independence of } u_i \text{ and, Var}(kX)=k^2\text{Var}(X) \\
&= \frac{\sigma^2}{\sum_i (x_i - \bar{x})^2} \\
\end{align}
which is the result you want.
As a side note, I spent a long time trying to find an error in your derivation. In the end I decided that discretion was the better part of valour and it was best to try the simpler approach. However for the record I wasn't sure that this step was justified
$$\begin{align}
& =.
\frac{1}{(\sum_i (x_i - \bar{x})^2)^2}
E\left[\left( \sum_i(x_i - \bar{x})(u_i - \sum_j \frac{u_j}{n})\right)^2 \right] \\
& =
\frac{1}{(\sum_i (x_i - \bar{x})^2)^2} E\left[\sum_i(x_i - \bar{x})^2(u_i - \sum_j \frac{u_j}{n})^2 \right]\;\;\;\;\text{ , since } u_i \text{ 's are iid} \\
\end{align}$$
because it misses out the cross terms due to $\sum_j \frac{u_j}{n}$.
It helps to think carefully about exactly what type of objects $\hat \theta$ and $\hat g$ are.
In the top case, $\hat \theta$ would be what I would call an estimator of a parameter. Let's break it down. There is some true value we would like to gain knowledge about $\theta$, it is a number. To estimate the value of this parameter we use $\hat \theta$, which consumes a sample of data, and produces a number which we take to be an estimate of $\theta$. Said differently, $\hat \theta$ is a function which consumes a set of training data, and produces a number
$$ \hat \theta: \mathcal{T} \rightarrow \mathbb{R} $$
Often, when only one set of training data is around, people use the symbol $\hat \theta$ to mean the numeric estimate instead of the estimator, but in the grand scheme of things, this is a relatively benign abuse of notation.
OK, on to the second thing, what is $\hat g$? In this case, we are doing much the same, but this time we are estimating a function instead of a number. Now we consume a training dataset, and are returned a function from datapoints to real numbers
$$ \hat g: \mathcal{T} \rightarrow (\mathcal{X} \rightarrow \mathbb{R}) $$
This is a little mind bending the first time you think about it, but it's worth digesting.
Now, if we think of our samples as being distributed in some way, then $\hat \theta$ becomes a random variable, and we can take its expectation and variance and whatever we want, with no problem. But what is the variance of a function valued random variable? It's not really obvious.
The way out is to think like a computer programmer, what can functions do? They can be evaluated. This is where your $x_i$ comes in.
In this setup, $x_i$ is just a solitary fixed datapoint. The second equation is saying as long as you have a datapoint $x_i$ fixed, you can think of $\hat g$ as an estimator that returns a function, which you immediately evaluate to get a number. Now we're back in the situation where we consume datasets and get a number in return, so all our statistics of number values random variables comes to bear.
I've discussed this in a slightly different way in this answer.
Is it correct to think of this as each observation/fitted value having its own variance and bias?
Yup.
You can see this in confidence intervals around scatterplot smoothers, they tend to be wider near the boundaries of the data, as there the predicted value is more influenced by the neighborly training points. There are some examples in this tutorial on smoothing splines.
Best Answer
I went through the math and ended up with variant C:
$$Var(X) = \frac{(\sum_i \omega_i)^2}{(\sum_i \omega_i)^2 - \sum_i \omega_i^2}\overline V$$ where $\overline V$ is the non corrected variance estimation. The formula agrees with the unweighted case when all $\omega_i$ are identical. I detail the proof below:
Setting $\lambda_i = \frac{\omega_i}{\sum_i \omega_i}$, we have
$$\overline V = \sum_i \lambda_i (x_i - \sum_j \lambda_j x_j)^2$$
Expanding the inner term gives: $$(x_i - \sum_j \lambda_j x_j)^2 = x_i^2 + \sum_{j, k} \lambda_j \lambda_k x_j x_k - 2 \sum_j \lambda_j x_i x_j $$
If we take the expectation, we have that $E[x_i x_j] = Var(X)1_{i = j} + E[X]^2$, the term $E[X]$ being present in each term, it cancels out and we get:
$$E[\overline V] = Var(X) \sum_i \lambda_i (1 + \sum_j \lambda_j^2- 2 \lambda_i )$$ that is $$E[\overline V] = Var(X) (1 - \sum_j \lambda_j^2)$$ It remains to plug in the expression of $\lambda_i$ with respect to $\omega_i$ to get variant C.