Define $\text{BB}(n)$ as the largest natural number whose Kolmogorov complexity (in a prefix-free binary language) is less than or equal to $n$ bits.
Consider $\text{BB}(n) \space \text{mod} \space 4^n$. This number has a Kolmogorov complexity less than $n + o(n)$, since it can be computed from $\text{BB}(n)$, and $K(\text{BB}(n)) \le n$.
Also consider $\lfloor \Omega \cdot 4^n \rfloor$ where $\Omega$ is Chaitin's constant. This number's Kolmogorov complexity is at least $2 \cdot n - o(n)$ bits (by definition of algorithmic randomness).
So,
$\text{BB}(n) \space \text{mod} \space 4^n \stackrel{?}{=} \lfloor \Omega \cdot 4^n \rfloor$
is computable since it is false for all but finitely many $n$.
Given the first $n$ bits of $\Omega$ it is possible to compute not just $\text{BB}(n)$ but all the $\text{BB}(i)$ for $i$ up to $n$. We can use this to turn the above statement sideways and say something about only the lower bits of each busy beaver number:
$K(\sum_{i \le n}{[4^i \cdot (\text{BB}(i) \space \text{mod} \space 4)]}) < n + o(n)$
implying that
$\sum_{i}{\frac{\text{BB}(i) \space \text{mod} \space 4}{4^i}}$
is not algorithmically random, and in particular,
$\Omega \ne
\sum_{i}{\frac{\text{BB}(i) \space \text{mod} \space 4}{4^i}}$
.
A couple more observations:
There is a total computable function $\text{CC}:\mathbb{N}\rightarrow\mathbb{N}$ that inverts $\text{BB}$, i.e. $\text{CC}(\text{BB}(n)) = n$ for all $n \in \mathbb{N}$. It works like this: on input $k$, run every TM with $k$ or fewer states for $k$ steps, and return the fewest number of states of any that halted on the last step. For all $k$ there is a $k$-state machine that terminates in exactly $k$ steps, so there will be a smallest one. This implies immediately that Busy Beaver numbers have some computable properties, for example if $f$ is any computable function, then there is another computable function $g$ such that $f(n) = g(\text{BB}(n))$, namely $g(k) = f(\text{CC}(k))$. But also, we can make $f$ and $g$ be the same function: $\text{CC}$ is non-increasing so it has no cycles and at least one fixed point, call the computable function that finds it $\text{CC}^*$. So, $\text{CC}^*(\text{BB}(n)) = \text{CC}^*(n)$. For $\text{CC}^*$ to be non-trivial there need to be at least two fixed points, surely there always are, but if not just redefine $\text{CC}(k) = k$ on some particular $k$ which is not a $\text{BB}$ number.
On the other extreme, I believe there exists a total computable function $g$ such that $\sum{\frac{g(\text{BB}(n))}{2^n}}$ is algorithmically random: $g(k)$ computes the $\text{CC}(k)^{\text{th}}$ bit of $\Omega$ using the assumption that $\text{BB}(\text{CC}(k)) = k$. I think it should work to to count all programs shorter than $\text{CC}(k)$ that terminate in at most $k$ steps (but more care is needed to describe this and prove that it is total).
In general, all of the "unprovability" results I've seen about the Busy-Beaver function seem to be relative to some particular formal system.
All unprovability results whatsoever are relative to some particular formal system: every sentence $\varphi$ is a theorem of the axiom system $\{\varphi\}$, after all! So if we interpret the term as strongly as you do per
such theorems are not "absolutely" unprovable since it is typically possible to construct more powerful formal systems in which these statements can be proven,
then there are simply no absolutely undecidable sentences at all. The best results we can hope for are principles of the form "For every theory of such-and-such type, only a small number of the sentences in this particularly simple set of sentences $\Gamma$ is decidable" - the Busy Beaver function provides such an example (every consistent recursively axiomatizable theory extending PA decides only finitely many of its values).
Incidentally, here's a bit of very annoying context: the phrase "absolutely undecidable" is used by some logicians, but in a more complicated (and highly informal) way. An absolutely undecidable sentence is one which, in some sense, we'll never find "intuitively compelling" - on par with those for the ZFC axioms, say - arguments for or against. (Personally I have a strong distaste for this term.)
Meanwhile, your counting argument
since there are an uncountable number of subsets of the integers ... [and] only countably many Turing machines, there must be theorems whose proof can't be generated by any Turing machine
implicitly assumes that we can in fact talk about every set of natural numbers in our language. But that's not true (as long as we're working in a countable language, like arithmetic or ZFC - and if we're working in an uncountable language, computability theory doesn't really apply): there are only countably many formulas in our language in the first place. So really it's not that we have too many true statements to admit proofs, it's that we have too many objects to formulate true statements about in the first place! (And this shouldn't be surprising: a proof is a sequence of sentences, so how could there be fewer proofs than sentences?)
Best Answer
This is clearly in the arithmetic hierarchy (and, in fact, relatively low in that hierarchy), so I looked up Turing's thesis, which is readily available online, to see what he meant.
It turns out that the definition Turing gives for number-theoretic theorem (or problem) may have been a bit idiosyncratic even in 1938; he includes a footnote justifying his use of this terminology. Today people would usually interpret "number-theoretic" to mean "arithmetical", but that's not how Turing used the phrase.
Turing defines, at the beginning of section 3:
In other words, this is a theorem of the form
$$(\forall x)(\exists y) \; (y\gt x\;\wedge\;\theta(y) = 0)$$
where $\theta$ is primitive recursive. (The quantifiers here range over natural numbers.)
Turing then continues by saying:
In modern terms, what Turing calls a number-theoretic theorem is a theorem that is $\Pi^0_2,$ as you can see from the format of the $\forall\exists$ formula above.
What Turing calls a number-theoretic problem is something that is Turing reducible to a $\Pi^0_2$ set, or, equivalently, that has Turing degree at most $\boldsymbol{0''}.$ [I'm assuming here that when Turing talks about being "transformed (by a uniform process)", he means what we now call Turing reducibility.]