Early in the history of recursion theory, the realization that all known proofs in the subject could be relativized in the manner you indicate led Hartley Rogers to make what is called the homogeneity conjecture.
Let $\mathcal{D}$ be the structure of the Turing degrees with the partial order of Turing reducibility $\leq_T$. Let $\mathcal{D}(\geq \mathbf{x})$ be the structure of the Turing degrees that are $\geq_T \mathbf{x}$ also with the partial order $\leq_T$. The homogeneity conjecture says that for any Turing degree $\mathbf{x}$, $\mathcal{D}$ is isomorphic to $\mathcal{D}(\geq \mathbf{x})$.
Richard Shore refuted the homogeneity conjecture in an elegant 1979 paper -- it's only one page long (though it relies on earlier work coding models of arithmetic in $\mathcal{D}$). A couple years later, Harrington and Shore showed that you can do even better. Not only are there Turing degrees $\mathbf{x}$ for which $\mathcal{D}$ and $\mathcal{D}(\geq \mathbf{x})$ aren't isomorphic as structures, there are Turing degrees $\mathbf{x}$ so that $\mathcal{D}$ and $\mathcal{D}(\geq \mathbf{x})$ aren't elementarily equivalent.
So this means that there is a first order sentence $\varphi$ in the language only containing $\leq_T$ which is true about the Turing degrees, but which is false if you relativize the sentence to the Turing degrees $\geq_T \mathbf{x}$ for some $\mathbf{x}$ (and of course, working in the Turing degrees $\geq_T \mathbf{x}$ is equivalent to giving all Turing machines access to $\mathbf{x}$ as an oracle). Hence, the proof that $\varphi$ is true can't be relativized to $\mathbf{x}$.
A somewhat misleading and oversimplified explanation of why this occurs is that you can code models of arithmetic into $\mathcal{D}$ (or $\mathcal{D}(\geq \mathbf{x})$) which can then interpret unrelativized concepts like being arithmetically definable or hyperarithmetically definable.
A particularly spectacular form of this phenomenon would occur if Slaman and Woodin's biinterpertability conjecture is true. The conjecture says that the following relation (on $\overrightarrow{\mathbf{p}}$ and $\mathbf{d}$) is definable in $\mathcal{D}$: "$\overrightarrow{\mathbf{p}}$ codes a standard model of first order arithmetic and a real $X$ such that $X$ is of degree $\mathbf{d}$".
All that being said, just about every proof in recursion theory that I know of relativizes. I'd be very interested in a proof which doesn't relativize and doesn't factor through the coding machinery I've mentioned above.
By the way (and somewhat ironically), the Baker-Gill-Solovay theorem that you mentioned does itself relativize. The relativized version says that for any oracle $X$, there are oracles $A$ and $B$ so that $X$ is poly-time reducible to both $A$ and $B$, and $P^A = NP^A$ and $P^B \neq NP^B$. (We've relativized to $X$ here. The unrelativized result simply says that there are oracles $A$ and $B$ so that $P^A = NP^A$ and $P^B \neq NP^B$). Of course, the real point is that a proof that $P \neq NP$ can't use a technique that relativizes.
Update. Here is a more direct construction. (See edit history for previous version.)
There is such a universal computable group as you request. Let $F$
be the free group on infinitely many generators $\langle
a_p\rangle_p$, indexed by the Turing machine programs $p$. Let $G$
be the quotient of this group by all the $k^{th}$ powers $a_p^k$, whenever the program $p$
halts (on trivial input) in exactly $k$ steps.
Let us represent the group $G$ by reduced words in the generators
$a_p$ and their inverses, but in the case that we took the quotient
by $a_p^k$, then in these words we use exponents on $a_p$ in the interval
$(-k/2,k/2]$. (The reason for using this exponent format is that if we were to use only the positive powers of the finite-order generators, then we wouldn't be able to compute inverses in $G$, since we cannot compute whether $a_p$ has finite order or not.) First of all, we can computably recognize whether a
word in the generators fits this description, simply by checking
whether it is reduced and whether any of the exponents is too
large. The point of this last issue is that we can tell if the
exponent $a_p^r$ is too large by checking if program $p$ halts in
$2r$ steps or not. Similarly, we can easily compute the inverse of
a word from the word, and we can computably multiply words. Again,
whenever we have a word with some new exponents on the generators,
we need to check whether they reduce because of our quotient, and
this is possible by running the relevant computation for sufficient
number to steps to determine it.
Thus, we have a computable representation of the group $G$.
Finally, I claim that it is universal in the sense you requested.
Given any Turing machine program $p$, let $x_p=a_p$ and let
$y_p=a_q$ for some other program $q$ known not to halt. Thus, by
design, the group generated by $x_p,y_p$ will be the free group on
these generators if and only if $p$ does not halt.
An essentially equivalent presentation of the group can be made without reference to Turing machines or computations, but only to Diophantine equations, simply by using the Diophantine representation of the universal Turing machine. That is, since every c.e. set is the solution set of a Diophantine equation, there is a fixed Diophantine equation $d(y,\vec x)=0$, such that Turing machine program $p$ halts on trivial input if and only if $d(p,\vec x)=0$ has a solution in the integers, viewing the program as its Gödel code. So we may define the group $G$ as above, with infinitely many generators $a_n$, but taking the quotient by $a_n^k$, if $k$ is the size of the smallest integer solution of $d(n,\vec x)=0$. I'm not sure this makes the group "natural," (and my opinion is that this word has no robust, coherent mathematical meaning), but it does omit any mention of Turing machines, using instead a fixed Diophantine equation.
Lastly, let me observe that my group is not finitely generated, and
it may be interesting to have a finitely generated example, or even
a finitely presented example. I suspect that one can apply one of the embedding theorems to place this example into a finitely generated or even finitely presented group.
Best Answer
As far as I know, there is no such proof in either direction.
A proof of computational universality, like you said, would be to show that rule 30 can simulate computation (Turing machine or equivalent), and it would require extreme patience in experimenting with the cellular automaton as well as some creativity.
Proving the opposite would be more problematic, because there is no clear (generally accepted) definition what we consider as a computationally universal cellular automaton. The problem is a cellular automaton configuration can carry an infinite amount of information. If you allow too sophisticated encoding/decoding procedure for input/output of the computation, one might be able to perform all or a significant part of the computation by encoding/decoding and not the cellular automaton itself. If you restrict the form of encoding/decoding too much, you would risk losing some interesting examples. (For example, computation with rule 110 requires preparing a non-uniform periodic (infinite) configuration as the background of your input.)
Some references:
The difficulty is discussed in
and mentioned briefly in
General notions of computational universality in cellular automata and symbolic dynamical systems are discussed in
J.-C. Delvenne, P. Kůrka, V. Blondel, Decidability and Universality in Symbolic Dynamical Systems, Fundamenta Informaticae, 74, 2006, and
P. Di Lena, L. Margara, Computational complexity of dynamical systems: The case of cellular automata, Information and Computation, 206(9-10), 2008.