I think it unlikely that anyone ever proposed a weaker Church's thesis,
because, as Tim Chow points out, diagonalization was known (and known to be
constructive) before anyone ever contemplated a definition of computability.
As early as 1907, Brouwer observed that mathematics seems to be incompletable
because of diagonalization, and Gödel thought that there could be no formal
concept of computation until Turing's definition persuaded him otherwise
in 1936. He later said that it is a "kind of miracle" that computability
can be formalized while provability cannot.
Also, Post arrived at a formal definition of computability, via his
concept of normal systems, in the early 1920s, though it was not published.
So the full concept of computability actually arrived before weaker concepts
such as primitive recursive functions.
Update. Here is a more direct construction. (See edit history for previous version.)
There is such a universal computable group as you request. Let $F$
be the free group on infinitely many generators $\langle
a_p\rangle_p$, indexed by the Turing machine programs $p$. Let $G$
be the quotient of this group by all the $k^{th}$ powers $a_p^k$, whenever the program $p$
halts (on trivial input) in exactly $k$ steps.
Let us represent the group $G$ by reduced words in the generators
$a_p$ and their inverses, but in the case that we took the quotient
by $a_p^k$, then in these words we use exponents on $a_p$ in the interval
$(-k/2,k/2]$. (The reason for using this exponent format is that if we were to use only the positive powers of the finite-order generators, then we wouldn't be able to compute inverses in $G$, since we cannot compute whether $a_p$ has finite order or not.) First of all, we can computably recognize whether a
word in the generators fits this description, simply by checking
whether it is reduced and whether any of the exponents is too
large. The point of this last issue is that we can tell whether the
exponent $r$ in $a_p^r$ is too large by checking whether program $p$ halts within
$2|r|$ steps. Similarly, we can easily compute the inverse of
a word from the word, and we can computably multiply words. Again,
whenever we have a word with some new exponents on the generators,
we need to check whether they reduce because of our quotient, and
this is possible by running the relevant computation for a sufficient
number of steps to determine it.
Thus, we have a computable representation of the group $G$.
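To make the word operations concrete, here is a minimal Python sketch of the normal form described above. It is only an illustration under stated assumptions: the table `TOY_HALTING_TIME` and the function `halting_time_within` are hypothetical stand-ins for actually simulating program $p$ for a bounded number of steps, and a word is represented as a list of pairs `(p, e)` meaning $a_p^e$. Halting times are assumed to be at least $1$.

```python
# Hypothetical toy model: program index -> exact halting time, or None if it never halts.
# A real implementation would not have this table; it would simulate a Turing machine.
TOY_HALTING_TIME = {0: 3, 1: None, 2: 4, 3: None}

def halting_time_within(p, n):
    """Return k if program p halts in exactly k <= n steps, else None.
    Stand-in for running p for n steps; this bounded question is always decidable."""
    k = TOY_HALTING_TIME.get(p)
    return k if k is not None and k <= n else None

def reduce_exponent(p, e):
    """Bring the exponent of a_p into (-k/2, k/2] when p halts in exactly k steps."""
    if e == 0:
        return 0
    k = halting_time_within(p, 2 * abs(e))
    if k is None:
        # Either p never halts, or k > 2|e|, so e already lies in (-k/2, k/2].
        return e
    r = e % k            # Python's % gives 0 <= r < k
    if 2 * r > k:
        r -= k           # shift into (-k/2, k/2]
    return r

def normalize(word):
    """Reduce an arbitrary list of syllables (p, e) to normal form."""
    stack = []
    for p, e in word:
        if stack and stack[-1][0] == p:
            e += stack.pop()[1]      # merge adjacent powers of the same generator
        e = reduce_exponent(p, e)
        if e != 0:
            stack.append((p, e))
    return stack

def multiply(u, v):
    return normalize(list(u) + list(v))

def inverse(u):
    return normalize([(p, -e) for p, e in reversed(u)])

def is_normal_form(word):
    return normalize(list(word)) == list(word)
```

For instance, with the toy table above, `normalize([(0, 2)])` rewrites $a_0^2$ as $a_0^{-1}$ because program `0` halts in exactly $3$ steps, while powers of $a_1$ are never reduced. The key point is that every call only asks whether a program halts within a bound determined by the exponent at hand, never whether it halts at all.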
Finally, I claim that it is universal in the sense you requested.
Given any Turing machine program $p$, let $x_p=a_p$ and let
$y_p=a_q$ for some other program $q$ known not to halt. Thus, by
design, the group generated by $x_p,y_p$ will be the free group on
these generators if and only if $p$ does not halt.
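To spell out the reasoning behind this equivalence (a short sketch, assuming halting times $k\geq 1$): the group $G$ is the free product of the cyclic groups $\langle a_p\rangle$, so the subgroup generated by two distinct generators is the free product of the corresponding factors,

$$\langle x_p,y_p\rangle\cong\begin{cases}\mathbb{Z}*\mathbb{Z} & \text{if } p \text{ never halts},\\ (\mathbb{Z}/k\mathbb{Z})*\mathbb{Z} & \text{if } p \text{ halts in exactly } k \text{ steps}.\end{cases}$$

In the second case the relation $x_p^k=1$ holds, so $x_p,y_p$ do not freely generate the subgroup.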
An essentially equivalent presentation of the group can be made without reference to Turing machines or computations, but only to Diophantine equations, simply by using the Diophantine representation of the universal Turing machine. That is, since every c.e. set is the solution set of a Diophantine equation, there is a fixed Diophantine equation $d(y,\vec x)=0$, such that Turing machine program $p$ halts on trivial input if and only if $d(p,\vec x)=0$ has a solution in the integers, viewing the program as its Gödel code. So we may define the group $G$ as above, with infinitely many generators $a_n$, but taking the quotient by $a_n^k$, if $k$ is the size of the smallest integer solution of $d(n,\vec x)=0$. I'm not sure this makes the group "natural," (and my opinion is that this word has no robust, coherent mathematical meaning), but it does omit any mention of Turing machines, using instead a fixed Diophantine equation.
Lastly, let me observe that my group is not finitely generated, and
it may be interesting to have a finitely generated example, or even
a finitely presented example. I suspect that one can apply one of the embedding theorems to place this example into a finitely generated or even finitely presented group.
Best Answer
Just appeared on the arXiv today: "The physical Church-Turing thesis and the principles of quantum theory," by Pablo Arrighi and Gilles Dowek. http://arxiv.org/abs/1102.1612
Abstract: