Let $T$ be a local minimum spanning tree that is not a global minimum.
Let $e_1,\ldots,e_m$ the edges of $T$ ordered by non-decreasing weight.
Let $T'$ be a minimum weight spanning tree that coincides with $T$ on the largest possible begin segment,
so we may assume $T'$ has edges $e_1,\ldots,e_{n-1},f_n,\ldots,f_m$ (again ordered by non-decreasing weight)
and $f_n\ne e_n$. Let $w$ be the weight function in our graph.
We claim that $w(f_n)<w(e_n)$: indeed, if $w(e_n)\leq w(f_n)$ then $e_1,\ldots,e_{n-1},e_n$ would
be a valid startup for Kruskal's algorithm and would lead to a minimum weight spanning tree that
coincides with $T$ on a larger initial segment.
$T+f_n$ contains exactly one cycle $C$. The edges of $C$ different from $f_n$ cannot all be in
$\{e_1,\ldots,e_{n-1}\}$ or we would have a cycle in $T'$. So we find at least one edge $f$ on $C$ that
is different from $e_1,\ldots,e_{n-1}$ and $f_n$. But then $w(f)\geq w(e_n)>w(f_n)$,
so $T+f_n-f$ is a spanning tree with a smaller total weight that is a neighbour of $T$ in the spanning tree graph. Contradiction.
(I left some small details for you to prove. Let me know if they cause trouble).
Note that this proves the slightly stronger statement, that each non-minimal tree has
a neighbour of strictly smaller total weight.
Hint: Your idea is good. Using the fact that two leaves of $T$ must be joint by an edge in $G$, try to construct a spanning tree which has less leaves than $T$; this gets you a contradictiotn.
Proof:
Let $G$ be a graph on $n$ vertices with $\alpha(G)\geq 2$. Let $T$ be a spanning tree of $G$ with as few leaves as possible; denote the number of leaves of $T$ as $t$.
Suppose that $t>\alpha(G)$, then there exists two leaves in $T$, say $x,y$, such that $xy$ is an edge in $G$.
Consider the graph $T+xy$ obtained by adding the edge $xy$ to $T$; note that $T+xy$ has $t-2$ vertices of degree $1$. Since $T$ is a tree, ie a maximal acyclic graph, then there exists a cycle in $T+xy$ which contains $x$ and $y$, say $v_0v_1v_2...v_mv_0$, with $v_0=x$ and $v_m=y$.
Consider now the vertices $v_1,v_2,v_3...,v_{m-1}$ in $T$. If all of them have degree 2 then $xv_1v_2...v_{m-1}y$ is a connected component of $T$; since $T$ is connected this imples that $T=xv_1v_2...v_jy$ ie $T$ is a path. But a path has exactly $2$ leaves which implies $2>\alpha(G)$, contradiction.
Thus there exists some $i\in\{1,2,3,...,m-1\}$ such that $v_i$ has degree greater than $2$. Consider the graph $T'=T+xy-v_iv_{i+1}$ obtained by deleting the edge $v_iv_{i+1}$ from $T+xy$.
$T'$ is connected (since we removed an edge in a cycle) and has $n-1$ edges; thus $T'$ is a tree.
Note that by construction the degree of $v_i$ in $T'$ is grater than $1$, hence $T'$ has at most $t-1$ leaves.
We have constructed a spanning tree $T'$ with at most $t-1$ leaves. This contradicts the initial assumption that $T$ is a spanning graph with as few leaves as possible. Thus $t\leq\alpha(G)$.
Best Answer
A counterexample is the complete bipartite graph $K_{3,n}$.
Here, there are $3$ vertices of degree $n$, and $n$ vertices of degree $3$. However, in any spanning tree, at most one of the degree-$3$ vertices of the graph can be a branch vertex, or else a cycle is created. Therefore there is no spanning tree with more than $4$ branch vertices (three from the side with degree $n$, and one from the side with degree $3$).
We can generalize this example to one where $n_{\ge 3}$ and the maximum degree are not as close together. Take $k$ copies of $K_{3,n}$, and let $\{u_i, v_i, w_i\}$ be the degree-$n$ vertices of the $i^{\text{th}}$ copy. By adding edges $w_i u_{i+1}$ for $i=1, \dots, k-1$, we obtain a connected graph with maximum degree $n+1$ and with $n_{\ge 3} = k(n+3)$.
In any spanning tree, there are at most $4k$ branch vertices: from any copy of $K_{3,n}$, we can choose $u_i$, $v_i$, $w_i$, and at most one other vertex, or else we create a cycle. This is only a $\frac{4}{n+3}$ fraction of $n_{\ge 3}$.
Here is an interesting partial result.
Theorem. Every graph with $n_{\ge 3}$ vertices of degree at least $3$ and maximum degree $\Delta$ has a spanning tree with $\Omega(n_{\ge 3}/\Delta)$ branch vertices.
(Note that this is asymptotically matched by the second example above.)
Proof. Begin by iteratively deleting any edges for which this will keep the graph connected and keep the same value of $n_{\ge 3}$. After we've done this, the vertices of degree $3$ or more can be classified into $3$ types:
(Every vertex of degree $4$ or more must be type 2 or type 3, or else it has at most $2$ incident bridges and at most $1$ edge to a vertex of degree exactly $3$, so at least one remaining edge out of the vertex could still be removed.)
Every type-3 vertex has at least $2$ adjacent type-1 vertices, and every type-1 vertex has at most $3$ adjacent type-3 vertices, so there are at most $1.5$ times as many type-3 vertices as type-1. So if we forget about type-3 vertices and only look at types 1 and 2, we still have at least $\frac25 n_{\ge 3}$ vertices to look at.
Among the type 1 vertices, greedily pick a subset $S$ such that no two vertices of $S$ are at distance $1$ or $2$. This constraint eliminates fewer than $3\Delta$ other possibilities every time we pick one, so at least a $\frac1{3\Delta}$ fraction of the type-1 vertices make it into $S$.
Now pick a spanning tree as follows: for every vertex in $S$, pick all its edges (which cannot yet create a cycle) and then complete the resulting forest to an arbitrary spanning tree. In the result, every vertex of $S$ is a branch vertex, and so is every type-2 vertex (because those must keep all their bridges). As a result, we get at least $\frac1{3\Delta} \cdot \frac25 n_{\ge 3}$ branch vertices, which is $\Omega(n_{\ge 3}/\Delta)$.