No such $T$ exists (at least, no such $T$ extending PA (or similar) exists). The key fact is that the recursion theorem is internal, in the following sense:
For each $e$, there is some $c$ such that PA proves the sentence "If $\Phi_e$ is total, then $\Phi_{\Phi_e(c)}\cong\Phi_c$."
In fact, the proof and the $c$ are just the usual ones - the point being that PA is strong enough to prove that the index constructed in the proof of the recursion theorem does what it should!
We now apply the recursion theorem, "internalized" as above, in the natural way. Let $T$ be a consistent computably axiomatized theory extending PA. By the internal recursion theorem, we can find an $e$ such that PA proves:
If $T$ proves $\Phi_e(e)\downarrow$ before it proves $\Phi_e(e)\uparrow$ (that is, via a proof which appears first in some standard ordering of proofs), then $\Phi_e(e)\uparrow$.
If $T$ proves $\Phi_e(e)\uparrow$ before it proves $\Phi_e(e)\downarrow$, then $\Phi_e(e)\downarrow$.
If $T$ proves neither, then $\Phi_e(e)\uparrow$.
(First think about why an $e$ with these properties exists classically, using the recursion theorem as normally stated.)
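To see the shape of such an $e$, here is a Python sketch of the behavior of $\Phi_e$ on input $e$. The theorem enumerator for $T$ is a hypothetical stand-in (no real proof search is implemented), so this only illustrates the control flow that the recursion theorem provides.

```python
# Control-flow sketch of Phi_e(e) from the recursion theorem argument.
# `enumerate_theorems` is a HYPOTHETICAL generator yielding the theorems
# of T in the standard ordering of their proofs; it is a stand-in, not
# a real proof searcher.
def phi_e(e, enumerate_theorems):
    for theorem in enumerate_theorems():
        if theorem == f"Phi_{e}({e}) halts":
            while True:              # T proved halting first: diverge
                pass
        if theorem == f"Phi_{e}({e}) diverges":
            return 0                 # T proved divergence first: halt
    # If T proves neither, the search runs forever, so Phi_e(e) diverges.

# Toy stand-in for the enumerator, exercising the second case:
def toy_theorems():
    yield "some other theorem"
    yield "Phi_7(7) diverges"

print(phi_e(7, toy_theorems))  # halts and returns 0
```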
Since $T$ extends PA, $T$ also proves these facts. But then if $T$ solves the halting problem in your sense, it is inconsistent:
If $T$ proves $\Phi_e(e)\downarrow$ before it proves $\Phi_e(e)\uparrow$, then $T$ sees that, and so by the above $T$ also proves $\Phi_e(e)\uparrow$.
If $T$ proves $\Phi_e(e)\uparrow$ before it proves $\Phi_e(e)\downarrow$, then $T$ sees that, and so by the above $T$ also proves $\Phi_e(e)\downarrow$.
In fact, what we've really shown is that the sets of indices which PA proves halt and which PA proves don't halt are recursively inseparable!
This is actually a bit messy.
First, here's a sketch of the proof of the claim:
Fix an increasing total computable function $f$. There is a natural way to effectively produce, given $k\in\mathbb{N}$, a Turing machine $M_k$ which on a blank tape first computes $f(k)$ and then dithers around for that many steps and then halts. By examining this construction, we can see that the number of states of $M_k$ is bounded by $C_f+k$ for some constant $C_f$. This tells us that:
For every increasing total computable $f$ there is some constant $C_f$ such that $BB(C_f+n)\ge f(n)$ for all $n$.
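Since these bounds are about concrete machines, it may help to see how small busy beaver values are actually computed. Below is a minimal brute-force search over all 2-state, 2-symbol machines; the encoding conventions (Radó-style, where the final halting transition also writes, moves, and counts as a step) are my assumption, and the fixed step cap works only because every halting machine of this size stops quickly.

```python
# Brute-force computation of the 2-state, 2-symbol busy beaver step
# count, i.e. the longest any such machine runs on a blank tape before
# halting. Conventions (Rado-style halt transitions) are an assumption.
from itertools import product
from collections import defaultdict

STATES = [0, 1]   # states A and B
HALT = 2          # distinguished halt state
SYMS = [0, 1]
MOVES = [-1, 1]

def run(table, cap=50):
    """Run a machine on a blank tape; return steps to halt, or None."""
    tape = defaultdict(int)
    state, pos = 0, 0
    for step in range(1, cap + 1):
        write, move, nxt = table[(state, tape[pos])]
        tape[pos] = write
        pos += move
        if nxt == HALT:
            return step
        state = nxt
    return None

best = 0
# One entry (write, move, next) for each of the 4 (state, symbol) pairs.
entries = list(product(SYMS, MOVES, STATES + [HALT]))  # 12 options each
for choice in product(entries, repeat=4):
    table = {(s, r): choice[2 * s + r] for s in STATES for r in SYMS}
    t = run(table)
    if t is not None and t > best:
        best = t

print(best)  # the maximum halting time over all 2-state machines
```

Under these conventions the search recovers the classical value $BB(2)=6$ steps.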
Now we make the following definition:
For an increasing total computable function $f$, let $g_f: x\mapsto f(2x)$.
It's easy to see that since $BB(C_{g_f}+n)\ge g_f(n)$ for all $n$ and some constant $C_{g_f}$, $BB$ must eventually dominate $f$: if $m\ge C_{g_f}$, then since $2m\ge C_{g_f}+m$ and $f$ is increasing, $$BB(C_{g_f}+m)\ge g_f(m)=f(2m)\ge f(C_{g_f}+m),$$ and so $BB(x)\ge f(x)$ for all $x\ge 2C_{g_f}$.
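The arithmetic in that chain is easy to spot-check numerically; the sample $f$ and constant below are arbitrary placeholders for an increasing computable $f$ and the constant $C_{g_f}$.

```python
# Spot-check of the domination step: for increasing f and m >= C,
# 2m >= C + m, and hence f(2m) >= f(C + m). f and C are placeholders.
def f(x):
    return x * x + 3            # any increasing function will do

C = 5                           # stands in for the constant C_{g_f}
for m in range(C, 200):
    assert 2 * m >= C + m       # because m >= C
    assert f(2 * m) >= f(C + m) # because f is increasing
print("ok")
```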
Now, note that in the above we didn't just use general arguments about computability; we actually talked about building Turing machines (and bullsh!tted a bit - "by examining this construction, we can see" ...). It turns out there's a good reason for this: the statement is not true for "coarse" reasons. The rest of this answer discusses this situation.
Let me begin by considering a variant of the busy beaver, the "Workaholic Wombat:"
$WW(n)=\max\{t:$ for some Turing machine $M$ with $<n$ states and some $k<n$, $M$ halts after exactly $t$ steps on input $k\}.$
Note the new ingredient: we're allowing inputs as well as machines. WW and BB are Turing equivalent, of course, the crucial point being that the inputs allowed for $WW(n)$ are bounded by $n$. However, $WW(n)$ has much better behaved asymptotics:
For every computable total $f$, $WW$ dominates $f$.
Proof. Fix $f$ total computable. Let $M$ be the Turing machine which on input $k$ computes $f(k+1)$, dithers around for $f(k+1)$-many steps, and then halts. Suppose $M$ has $n$ states; then for each $m\ge n$, taking input $k=m$ shows $WW(m+1)>f(m+1)$. $\quad\Box$
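The machine $M$ in this proof can be sketched with a step-counted Python function standing in for a Turing machine (an assumption for illustration: the real argument counts Turing-machine steps, not Python iterations).

```python
# Sketch of M from the proof: on input k, compute f(k+1) and then
# dither for that many steps, so the running time exceeds f(k+1).
# Python functions stand in for Turing machines here (an assumption).
def make_M(f):
    def M(k):
        t = f(k + 1)            # compute f(k+1)...
        for _ in range(t):      # ...then dither for f(k+1)-many steps
            pass
        return t
    return M

M = make_M(lambda x: 2 * x + 1)
print(M(3))  # f(4) = 9, a lower bound on M's running time on input 3
```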
This is an example of a coarse argument: it doesn't depend on the fine details of exactly how we represent Turing machines. This argument holds for any "reasonable enumeration" (or "finite-to-one enumeration" - multiple Turing machines may have the same number of states, but there are only finitely many with a given number of states) of Turing machines. However, there are plenty of results which are more finicky. My favorite example is the Padding Lemma. This obviously-true fact turns out to depend on the way we list Turing machines:
There is an effective enumeration of partial computable functions such that every partial computable function occurs exactly once on the list.
Such an enumeration is called a Friedberg enumeration. In the statement above, we have to be very careful what we mean by "effective enumeration of partial computable functions," and thinking about this issue will eventually lead you to the notion of an "admissible numbering" which rules out this sort of nonsense. There are other sillinesses which effective enumerations of partial computable functions can display, and it can be fun to play around with them.
Now, any effective way of listing partial computable functions gives rise to a corresponding Busy Beaver function and a corresponding Workaholic Wombat function. As observed above, the WW will still dominate every total computable function; however, it's not hard to cook up listings whose BBs do not dominate every total computable function. The conclusion is:
Proving that BB dominates every total computable function is going to take some playing around with the precise details of Turing machines, and can't just be done via general "coarse" considerations.
Define $\text{BB}(n)$ as the largest natural number whose Kolmogorov complexity (in a prefix-free binary language) is less than or equal to $n$ bits.
Consider $\text{BB}(n) \bmod 4^n$. This number has a Kolmogorov complexity less than $n + o(n)$, since it can be computed from $\text{BB}(n)$, and $K(\text{BB}(n)) \le n$.
Also consider $\lfloor \Omega \cdot 4^n \rfloor$ where $\Omega$ is Chaitin's constant. This number's Kolmogorov complexity is at least $2 \cdot n - o(n)$ bits (by definition of algorithmic randomness).
So,
$\text{BB}(n) \bmod 4^n \stackrel{?}{=} \lfloor \Omega \cdot 4^n \rfloor$
is decidable, since it is false for all but finitely many $n$: if equality held for arbitrarily large $n$, the right-hand side would have complexity at most $n + o(n)$, contradicting the $2n - o(n)$ lower bound.
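Lining the two bounds up explicitly: for all sufficiently large $n$,

$$K\big(\text{BB}(n) \bmod 4^n\big) \le n + o(n) < 2n - o(n) \le K\big(\lfloor \Omega \cdot 4^n \rfloor\big),$$

so the two numbers must differ once $n$ is large.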
Given the first $n$ bits of $\Omega$ it is possible to compute not just $\text{BB}(n)$ but all the $\text{BB}(i)$ for $i$ up to $n$. We can use this to turn the above statement sideways and say something about only the lower bits of each busy beaver number:
$K(\sum_{i \le n}{[4^i \cdot (\text{BB}(i) \bmod 4)]}) < n + o(n)$
implying that
$\sum_{i}{\frac{\text{BB}(i) \bmod 4}{4^i}}$
is not algorithmically random, and in particular,
$\Omega \ne \sum_{i}{\frac{\text{BB}(i) \bmod 4}{4^i}}$.
A couple more observations:
There is a total computable function $\text{CC}:\mathbb{N}\rightarrow\mathbb{N}$ that inverts $\text{BB}$, i.e. $\text{CC}(\text{BB}(n)) = n$ for all $n \in \mathbb{N}$. It works like this: on input $k$, run every TM with $k$ or fewer states for $k$ steps, and return the least number of states of any machine that halted on exactly the last step. For every $k$ there is a $k$-state machine that terminates in exactly $k$ steps, so such a machine always exists.

This immediately implies that Busy Beaver numbers have some computable properties: if $f$ is any computable function, then there is another computable function $g$ such that $f(n) = g(\text{BB}(n))$, namely $g(k) = f(\text{CC}(k))$. But we can even make $f$ and $g$ the same function: $\text{CC}$ is deflationary ($\text{CC}(k) \le k$ for all $k$), so iterating it has no cycles and always reaches a fixed point; call the computable function that finds that fixed point $\text{CC}^*$. Then $\text{CC}^*(\text{BB}(n)) = \text{CC}^*(n)$. For $\text{CC}^*$ to be non-trivial there need to be at least two fixed points; surely there always are, but if not, just redefine $\text{CC}(k) = k$ on some particular $k$ which is not a $\text{BB}$ number.
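The fixed-point search $\text{CC}^*$ is just iteration of a deflationary map; here is a sketch with a toy stand-in for $\text{CC}$ (the real $\text{CC}$, which simulates Turing machines, is not implemented here).

```python
# CC* sketch: iterate a deflationary map cc (cc(k) <= k for all k)
# until it stops moving. Values strictly decrease until a fixed point
# is hit, so the loop terminates. toy_cc is a stand-in, not the real CC.
def cc_star(cc, k):
    while cc(k) != k:
        k = cc(k)
    return k

toy_cc = lambda k: k // 2 if k > 1 else k   # deflationary toy example
print(cc_star(toy_cc, 40))  # 40 -> 20 -> 10 -> 5 -> 2 -> 1
```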
On the other extreme, I believe there exists a total computable function $g$ such that $\sum{\frac{g(\text{BB}(n))}{2^n}}$ is algorithmically random: $g(k)$ computes the $\text{CC}(k)^{\text{th}}$ bit of $\Omega$ using the assumption that $\text{BB}(\text{CC}(k)) = k$. I think it should work to count all programs shorter than $\text{CC}(k)$ that terminate in at most $k$ steps (but more care is needed to describe this and prove that it is total).