No, we cannot do this.
I'll use a branching quantifier to express$^1$ "the universe is infinite" - by the compactness theorem, we can't do this in first-order logic. I'll then show how this gives an answer to the question you've asked, and then answer a secondary question implicit in your post. (And then there will be a footnote.)
Expressing infinitude
Consider the sentence $$(*)\quad\exists x\forall y_1\exists_{y_1}z_1\forall y_2\exists_{y_2}z_2[z_1\not=x\wedge z_2\not=x\wedge(y_1\not=y_2\iff z_1\not=z_2)]$$ (as it will turn out, this can be simplified a bit, but I think the way I've written it is clearer). Here "$\exists_\overline{a}b$" means that $b$ depends only on the tuple $\overline{a}$.
I claim that for every structure $\mathcal{A}$, we have $$\mathcal{A}\models(*)\iff\mathcal{A}\mbox{ is infinite}.$$ One direction is easy: if $\mathcal{A}$ is infinite, there is an injection $i$ from $\mathcal{A}$ to itself which is not a surjection. Pick some $a\in\mathcal{A}\setminus im(i)$ to be our $x$ and use $i$ to choose our $z$s ($z_1=i(y_1), z_2=i(y_2)$).
The other direction is the interesting one: why can't $(*)$ have finite models? To establish this, we need to do some analysis of $(*)$.
Moving to second-order logic, $(*)$ is equivalent to the sentence $$(**)\quad\exists x, F,G\forall y_1,y_2[F(y_1)\not=x\wedge G(y_2)\not=x\wedge (y_1\not=y_2\iff F(y_1)\not=G(y_2))].$$ Here $F,G$ are our "strategies" for picking $z_1,z_2$ respectively; the dependence of $z_k$ only on $y_k$ $(k\in\{1,2\})$ is reflected in the inputs each function takes.
Suppose $\mathcal{A}\models(**)$. Pick appropriate witnesses to this: some $a\in\mathcal{A}$ and some specific functions $U,V:\mathcal{A}\rightarrow\mathcal{A}$ such that
$(i)$ for every $c_1,c_2\in\mathcal{A}$ we have $U(c_1)\not=a\not=V(c_2)$, and
$(ii)$ for every $d_1,d_2\in\mathcal{A}$ we have $U(d_1)=V(d_2)\iff d_1=d_2$.
(Of course, such witnesses won't be unique, but so what?)
Our first observation is that from $(ii)$ we get a huge amount of information about $U$ and $V$. Namely, the following two facts:
- $U$ and $V$ are the same function. Suppose otherwise. Then for some $d\in\mathcal{A}$ we have $U(d)\not=V(d)$. Now take $d_1=d_2=d$ and note that this shows that $(ii)$ above fails.
I'll now use "$W$" to refer to this function, in order to avoid apparent asymmetry.
- The function $W$ is injective. Suppose otherwise. Then we have some $d_1,d_2\in\mathcal{A}$ such that $d_1\not=d_2$ but $W(d_1)=W(d_2)$; remembering that $U=W=V$, we get that $d_1\not=d_2$ but $U(d_1)=V(d_2)$, again contradicting point $(ii)$ above.
Our second observation is simpler: clause $(i)$ tells us that the function $W$ must not have $a$ in its image.
So putting this all together, we've concluded:
If $\mathcal{A}\models(*)$, then there is an injection from $\mathcal{A}$ to itself which is not a surjection.
But this can only happen if $\mathcal{A}$ is infinite! So we're done.
From satisfaction to definability
This doesn't really answer your question, which was about definable sets rather than classes of models. But this is just a quick hop away.
Let $\mathcal{N}$ be a nonstandard model of PA with standard part $S$. Then $S$ is clearly undefinable in first-order logic, but it is definable via $(*)$: it's the set of elements whose corresponding initial segments satisfy $\neg(*)$.
- This last bit deserves an explanation. First-order logic allows relativization: given a structure $\mathcal{S}$ and a definable substructure $\mathcal{X}\subseteq\mathcal{S}$, for each sentence $\varphi$ there is a sentence $\varphi^\mathcal{X}$ such that $$\mathcal{S}\models\varphi^\mathcal{X}\iff\mathcal{X}\models\varphi;$$ $\varphi^\mathcal{X}$ is gotten by "relativizing" quantifiers, e.g. replacing "$\exists x(\psi(x))$" with "$\exists x(x\in\mathcal{S}\wedge\psi(x))$." In fact relativization is a bit broader than what I've described, but ignore that for now.
Of course, the above assumes $Con(PA)$; what if we don't want to assume that? Well, we're still in luck. Take as our structure $\mathcal{A}$ a set with an equivalence relation such that $(i)$ there are infinitely many infinite classes and $(ii)$ for every finite $n$ there is a finite class with $>n$ elements. It's a standard exercise to show that the set of elements whose classes are infinite is not definable, but it is definable using $(*)$. And this argument can be appropriately formulated and proved in very weak systems - in particular, much weaker than even PA itself.
More definability: the natural numbers
One point that you've raised, which the above doesn't address, is what happens in the structure $\mathcal{N}=(\mathbb{N}; +,\times)$:
Is there a branching-quantifier definable subset of $\mathcal{N}$ which isn't first-order definable?
The answer is no - $\mathcal{N}$ can "first-orderize" branching quantifiers - but the proof is a bit subtle. The obstacle of course is that we want to quantify over functions (= strategies), but first-order logic doesn't let us do that, and the failure of Konig's lemma (as you've correctly observed) means that we can't just look at "finite pieces" of functions.
Getting around this obstacle takes two observations:
The first observation is that for every specific branching-quantifier-formula $\varphi$, there is some $n$ such that for every tuple $\overline{a}$ such that $\varphi(\overline{a})$ holds we have a witness to $\varphi(\overline{a})$ which is $\Sigma_n$-definable; so to tell whether $\varphi(\overline{a})$ holds we just need to search over the $\Sigma_n$-definable functions.
The second observation lets us do exactly that. While Tarski showed that truth in $\mathcal{N}$ is not definable in $\mathcal{N}$, we nonetheless have that the set of true $\Sigma_n$-formulas is definable for every specific $n$. This lets us search over $\Sigma_n$-definable functions, for each specific $n$. Combined with the previous key point, we get the desired result.
Now, this isn't a proof, since each of the two facts above is definitely nontrivial, but in the interests of length I won't sketch their proofs here; if you're interested, I recommend asking a further MSE question (and I'll answer it myself if I have the time).
$^1$The axiom of choice
Wait, what now?
Well, I've used the following claim:
A set is infinite iff there is a non-surjective injection from the set to itself.
This actually isn't provable in ZF (= set theory without choice) alone; the left-to-right direction can fail, and counterexamples are called "infinite Dedekind-finite sets."
However, this isn't really a problem: we still have that every model of $(*)$ must be Dedekind-infinite, and that's enough - ZF alone proves that there is no sentence which is true in exactly the Dedekind-infinite structures. I chose to assume the axiom of choice since I think the situation is a bit easier to understand if we initially assume that infinite sets "work properly" - in particular, it's easier to think about the compactness theorem when you don't have to worry about choice.
Best Answer
There are a number of examples coming from computability theory, specifically the various "degree structures." For example, the two-quantifier theory of the Turing degrees (as a partial order) is decidable, but the three-quantifier theory of the Turing degrees is undecidable. Interestingly, the same combinatorial fact - that every countable lattice is isomorphic to an initial segment of the Turing degrees (due to Lerman) - is used in both of these facts. See the introduction and bibliography of Shore/Slaman, The $\forall\exists$ theory of $\mathcal{D}(\le,\vee,')$ is undecidable (and note that, per the title, the precise structure we put on the degrees matters a ton in terms of exactly where we switch from decidability to undecidability).