This becomes easy when you reparameterize the problem.
Instead of parameterizing the line by a slope and an intercept, notice that when the $x_i$ take just two distinct values you can describe the fit by giving its value $\eta_0$ at $x=0$ and its value $\eta_1$ at $x=1$.
The figure shows the data as red dots and the OLS fit as a dashed line, and summarizes the two groups with boxplots, group $A$ at the left and group $B$ at the right. The slope of the line is precisely the amount needed to go from the mean of group $A$, with $\eta_0$ near $10$, to the mean of group $B$, with $\eta_1$ near $13$.
Least squares requires you to choose values of these parameters that minimize the sum of squares of residuals. Since the value of $\eta_0$ affects the residuals only for group $A$ (where $x_i=0$) and $\eta_1$ affects the residuals only for group $B$ (where $x_i=1$), each will be estimated as the mean of its associated group. Because these means also happen to be the Maximum Likelihood estimates (as well as the OLS estimates), the ML estimate of the slope (which is also its OLS estimate) must be
$$b_1 = \frac{\hat\eta_1 - \hat\eta_0}{1-0} = \hat\eta_1 -\hat\eta_0,$$
which is just the difference in the group means. The OLS estimate of the error variance (which does differ from the ML estimate, so we cannot exploit ML at this point) is the sum of squared residuals divided by the degrees of freedom, $n-2$. It should be equally clear that this is precisely the pooled variance used in the two-sample t-test. Consequently, $b_1/\operatorname{se}(b_1)$ is exactly the same, and computed in exactly the same way, as the Student t statistic.
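For a concrete check, here is a minimal numerical sketch (assuming NumPy and SciPy are available, with made-up group data): it fits the regression on a $0/1$ group indicator and confirms that the slope equals the difference in group means and that $b_1/\operatorname{se}(b_1)$ matches the pooled two-sample t statistic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Two groups coded as x = 0 (group A) and x = 1 (group B); toy data
y_a = rng.normal(10, 2, size=20)   # group A, mean near 10
y_b = rng.normal(13, 2, size=25)   # group B, mean near 13
y = np.concatenate([y_a, y_b])
x = np.concatenate([np.zeros(20), np.ones(25)])

# OLS via the normal equations
X = np.column_stack([np.ones_like(x), x])
beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta
n, p = X.shape
s2 = resid @ resid / (n - p)                      # SSR / (n - 2): the pooled variance
se_b1 = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])

t_reg = beta[1] / se_b1
t_pooled = stats.ttest_ind(y_b, y_a, equal_var=True).statistic

print(beta[1], y_b.mean() - y_a.mean())   # identical: slope = difference in means
print(t_reg, t_pooled)                    # identical t statistics
```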
I think you would be much better off using matrix algebra throughout.
As for your first question, remember that the fits in a linear model are obtained as:
$$\hat{y} = H y,$$ where $H = X(X^TX)^{-1}X^T$ is the hat matrix.
Intuitively, $h_{ii}=1$ means that observation $y_i$ fully determines $\hat{y}_{i}$, so in a sense it has maximum leverage. If $h_{ii} \approx 0$, that would imply that observation $y_i$ plays very little role in determining $\hat{y}_{i}$, which would be determined mostly by the rest of the observations.
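As a quick illustration (a sketch assuming NumPy, with a made-up $x$ containing one far-out point), you can compute $H$ directly and see that the isolated observation gets leverage close to $1$ while the others stay small:

```python
import numpy as np

# Hypothetical data: one x value far from the rest gets high leverage
x = np.array([1.0, 2.0, 3.0, 4.0, 20.0])
X = np.column_stack([np.ones_like(x), x])     # design matrix with intercept

H = X @ np.linalg.inv(X.T @ X) @ X.T          # hat matrix: y_hat = H y
h = np.diag(H)
print(h)          # the last leverage is close to 1; the others are small
print(h.sum())    # trace of H equals the number of parameters (here 2)
```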
Best Answer
First note that this formula applies just to simple linear regression where you're modeling $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$.
$\newcommand{\1}{\mathbf 1}$We can represent our regression as $y = X\beta + \varepsilon$ with $X = (\1 \mid x)$, where $x \in \mathbb R^n$ is the non-intercept univariate predictor; by assumption $X$ is full rank, and this is equivalent to $x$ not being constant. This means
$$
H = X(X^TX)^{-1}X^T = (\1 \mid x)\left(\begin{array}{cc}n & x^T\1 \\ x^T\1 & x^Tx\end{array}\right)^{-1}{\1^T\choose x^T}.
$$
We can use the formula for the explicit inverse of a $2\times 2$ matrix to find
$$
(X^TX)^{-1} = \frac{1}{nx^Tx - (x^T\1)^2}\left(\begin{array}{cc}x^Tx & -x^T\1 \\ -x^T\1 & n\end{array}\right),
$$
so all together we can do the multiplication to get
$$
H = \frac{1}{n x^Tx - (\1^T x)^2}\left(x^Tx\cdot \1\1^T - x^T\1 \cdot (\1 x^T + x \1^T) + n xx^T\right).
$$
This means
$$
h_i = \frac{x^Tx - 2x^T\1\cdot x_i + nx_i^2}{n x^Tx - (\1^T x)^2}.
$$
For the numerator, I can use the fact that $\1^Tx = n \bar x$ to rewrite it as
$$
\begin{aligned}
x^Tx - 2nx_i\bar x + n x_i^2 &= x^Tx + n(x_i^2 - 2 x_i\bar x + \bar x^2 - \bar x^2) \\
&= x^Tx - n\bar x^2 + n(x_i - \bar x)^2.
\end{aligned}
$$
Can you finish from here?
(later update) For the sake of completeness I'll finish the proof now.
$(\1^T x)^2 = n^2(\1^T x / n)^2 = n^2{\bar x}^2$, so
$$
\begin{aligned}
h_i &= \frac{x^Tx - n\bar x^2 + n(x_i - \bar x)^2}{n x^Tx - (\1^T x)^2} \\
&= \frac{x^Tx - n\bar x^2 + n(x_i - \bar x)^2}{n x^Tx - n^2{\bar x}^2} \\
&= \frac 1n + \frac{(x_i - \bar x)^2}{x^Tx - n{\bar x}^2},
\end{aligned}
$$
and then it's well known that
$$
x^Tx - n{\bar x}^2 = \sum_{i}(x_i - \bar x)^2,
$$
so we're done.
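If you want a numerical sanity check of the result (a sketch assuming NumPy, with an arbitrary simulated $x$), compare the diagonal of $H$ with the closed form $h_i = \frac1n + \frac{(x_i-\bar x)^2}{\sum_j (x_j - \bar x)^2}$:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=8)                        # any non-constant univariate predictor
n = x.size
X = np.column_stack([np.ones(n), x])          # intercept plus single predictor

# Leverages from the hat matrix versus the closed-form expression
h_matrix = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
h_formula = 1 / n + (x - x.mean())**2 / ((x - x.mean())**2).sum()

print(np.allclose(h_matrix, h_formula))       # True
```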