I would like a proof that first order logic is undecidable, but without using arithmetic
But hold on! The proposition "first order logic is undecidable" is, when unpacked, just a proposition of the form "there is no computable function which does so-and-so". So of course a proof of this proposition is going to have to come from an application of the general theory of computable functions. (We might only need a naive ability to recognise a computation when we see it to establish the positive claim that some function can be computed: but to establish a negative, that there is no computable function which does so-and-so, we do need some general theory which can enable us to talk about the limits of the computable).
Now, the relevant computations here are over finite strings that constitute wffs and proofs, so the background theory we need in fact is or is tantamount to the theory of arithmetical computable functions (for you can code finite strings by numbers in familiar ways; or indeed we can go in the opposite direction and reconstruct arithmetic in the theory of finite strings). So the natural lines of proof that first order logic is undecidable will come from the theory of computable arithmetical functions (perhaps in slight disguise). In sum, broadly arithmetical considerations (far from being extraneous, a "tortuous" diversion) are exactly what we would expect to use here.
Added: What I wrote above was [at least intended to be] consistent with the sort of points Carl Mummert's makes in his typically elegant answer. It is the theory of computability that needs to be invoked: but the point at which it gets applied doesn't have to be to a formal arithmetic. As Carl very nicely points out, we could go instead via Tarksi's proof about groups. However, I was taking it that that sort of thing would seem to the OP even more "tortuous" a route to the undecidability result; and we are still applying the theory of computable numerical functions.
That's the main point to be made, but I can perhaps add a subsidiary comment on the remark
We cannot construct a model of ϕ just because it requires an infinite domain.
Well, by any ordinary constructive standards the wff you describe (losing the unintended negation) has a constructively acceptable model -- the natural numbers in their natural order! What is wrong with that?
Yes, you can prove it with "usual" proof systems : Resolution, Tableaux or with Natural Deduction :
1) $∀z∀x∃y(R(z,y)∨¬P(x))$ --- premise
2) $∃xP(x)$ --- assumed [a]
3) $P(w)$ --- assumed [b] for $\exists$-elimination
4) $∃y(R(z,y)∨¬P(w))$ --- from 1) by $\forall$-elimination twice
5) $R(z,y)∨¬P(w)$ --- assumed [c] for $\exists$-elimination
6) $P(w) \to R(z,y)$ --- from 5) by tautological equivalence : $(p \to q) \leftrightarrow (\lnot p \lor q)$
7) $R(z,y)$ --- from 3) and 6) by $\to$-elimination
8) $\exists yR(z,y)$ --- from 7) by $\exists$-introduction
9) $\forall x \exists yR(x,y)$ --- from 8) by $\forall$-introduction : $x$ is not free in any "open" assumptions
$y$ is not free in 9); thus, we can apply $\exists$-elimination with 4), 5) and 9) and conclude with :
10) $\forall x \exists yR(x,y)$, discharging assumption [c].
In the same way, from 2), 3) and 10), discharging assumption [b] and concluding with :
11) $\forall x \exists yR(x,y)$.
Finally :
$∃xP(x) \to \forall x \exists yR(x,y)$ --- from 2) and 11) by $\to$-introduction, discharging assumption [a].
With a final application of $\to$-introduction we conclude with :
$∀z∀x∃y(R(z,y)∨¬P(x)) \to (∃xP(x) \to ∀x∃yR(x,y))$.
Best Answer
We assume that your system is the usual first-order logic with equality.
For $n\ge 2$, let $S_n$ be the sentence that says $\forall x\exists y_1\exists y_2\cdots \exists y_n\phi(x,y_1,y_2,\dots, y_n)$. Here $\phi$ is the conjunction of (i) formulas $R(x,y_i)$, where $i$ ranges over the integers from $1$ to $n$ and (ii) formulas of the shape $\lnot(y_i=y_j)$, for all $i,j$ where $1\le i\lt j\le n$.
Adding all the $S_n$ to your axiom system forces all the equivalence classes to have infinitely many elements.
Forcing infinitely many equivalence classes is simpler. Let $T_n$ be the sentence that says that $\exists x_1\exists x_2\cdots \exists x_n\psi(x_1,x_2,\dots,x_n)$. Here $\psi$ is the conjunction of all $\lnot R(x_i,x_j)$, where $i,j$ ranges over all $1\le i\lt j\le n$. Adding $T_n$ for all $n\ge 2$ forces the existence of infinitely many equivalence classes.