I think your question is not as precise as you portray it.
First, let me point out that you have not actually defined
a function $z$, in the sense of giving a first order
definition of it in set theory, and you provably cannot do
so, because of Tarski's theorem on the non-definability of
truth. We simply have no way to express the relation x is
definable in the usual first-order language of set theory.
More specifically:
Theorem. If ZFC is consistent, then there are models
of ZFC in which the collection of definable natural numbers
is not a set or even a class.
Proof. If V is a model of ZFC, then let $M$ be an internal
ultrapower of $V$ by a nonprincipal ultrafilter on
$\omega$. Thus, the natural numbers of $M$ are nonstandard
relative to $V$. The definable elements of $M$ are all
contained within the range of the ultrapower map, which in
the natural numbers is a bounded part of the natural
numbers of $M$. Thus, $M$ cannot have this collection of
objects as a set or class, since it would reveal to $M$
that its natural numbers are ill-founded, contradicting
that $M$ satisfies ZFC. QED
In such a model, your definition of $z$ is not first order.
It could make sense to treat your function $z$, however, in
a model as an externally defined function, defined outside
the model (as via second-order logic). In this case, $z(n)$
only involves standard or meta-theoretic definitions, and
other problems arise.
Theorem. If ZFC is consistent, then there is a model
of ZFC in which $z(n)$ is bounded by a constant function.
Proof. If ZFC is consistent, then so is $ZFC+\neg
Con(ZFC)$. Let $V$ be any model of this theory, so that
there are no models of ZFC there, and the second part of
the definition of $z$ becomes vacuous, so it reduces to its
definable-in-$V$ first part. Let $M$ be an internal
ultrapower of $V$ by an ultrafilter on $\omega$. Thus, $M$
is nonstandard relative to $V$. But the function $z$,
defined externally, uses only standard definitions, and the
definable elements of $M$ all lie in the range of the
ultrapower map. If $N$ is any $V$-nonstandard element of
$M$, then every definable element of $M$ is below $N$, and
so $z(n)\lt N$ for every $n$ in $M$. QED
Theorem. If ZFC is consistent, then there is a model
of ZFC in which $f(n)\lt z(10000)$ for every natural number
n in the meta-theory.
Proof. If ZFC is consistent, then so is $ZFC+\neg
Con(ZFC)+GCH$. Let $V$ be a countable model of $ZFC+\neg
Con(ZFC)+GCH$. Since $V$ has no models of ZFC, again the
second part of your definition is vacuous, and it reduces
just to the definability-in-$V$ part. Let $M$ again be an
internal ultrapower of $V$ by an ultrafilter on $\omega$,
and let $N$ be a $V$-nonstandard natural number of $M$.
Every definable element of $M$ is in the range of the
ultrapower map, and therefore below $N$. In particular, for
every meta-theoretic natural number $n$, we have $f(n)\lt
N$ in $M$, since $f(n)$ is definable. Now, let $M[G]$ be a
forcing extension in which the continuum has size
$\aleph_N^M$. Thus, $N$ is definable in $M[G]$ by a
relatively short formula; let's say 10000 symbols (but I
didn't count). Since the forcing does not affect the
existence of ZFC models or Turing computations between $M$
and $M[G]$, it follows that $f(n)\lt z(10000)$ in $M[G]$
for any natural number of $V$. QED
Theorem. If ZFC is consistent, then there is a model
of ZFC with a natural number constant $c$ in which $z(n)\lt
f(c)$ for all meta-theoretic natural numbers $n$.
Proof. Use the model $M$ (or $M[G]$) as above. This time,
let $c$ be any $V$-nonstandard natural number of $M$. Since
the definable elements of $M$ all lie in the range of the
ultrapower map, it follows that every z(n), for
meta-theoretic $n$, is included in the $V$-standard
elements of $M$, which are all less than $c$. But $M$
easily has $c\leq f(c)$, and so $z(n)\lt f(c)$ for all
these $n$. QED
The probabilistic reasoning depends on a conglomerability assumption, namely that given a fixed sequence $\vec u$, the probability of guessing correctly is $(n-1)/n$, then for a randomly selected sequence, the probability of guessing correctly is $(n-1)/n$. But we have no reason to think the event of guessing correctly is measurable with respect to the probability measure induced by the random choice of sequence and index $i$, and we have no reason to think that the conglomerability assumption is appropriate.
A quick way to see that the conglomerability assumption is going to be dubious is to consider the analogy of the Brown-Freiling argument against the Continuum Hypothesis (see here for a discussion). Assume CH. Let $\prec$ be a well-order of $[0,1]$. Suppose $X$ and $Y$ are i.i.d. uniformly distributed on $[0,1]$. Consider the question of which variable is bigger. Fix a value $y\in [0,1]$ for $Y$. Then $P(X\preceq y)=0$, since there are only countably many points $\prec$-prior to $y$. By a conglomerability assumption, we could then conclude that $P(X\le Y)=0$, which would be absurd as the same reasoning would also show that $P(Y\le X)=0$. The argument fallaciously assumes conglomerability. We are neither justified in concluding that $P(X\le Y)=0$, nor that $\{ X \le Y \}$ is measurable (though for each fixed $y$, $\{ X \le y \}$ is measurable). And indeed it's not measurable: for were it measurable, we could use Fubini to conclude that it has null probability. Note that one can repeat the argument without CH but instead using an extension of Lebesgue measure that assigns null probability to every subset of cardinality $<\mathfrak c$, so clearly there is no refutation of CH here.
Here's another analogy: By the Hausdorff Paradox, decompose $S^2-D$ for a countable $D$ into disjoint $A_1,A_2,A_3$, where $A_1,A_2,A_3,A_1\cup A_2$ are all isometrically equivalent. Suppose a point $X$ is uniformly chosen in $S^2$. (I will ignore $D$ from now on, since it has zero measure. If you like, you can assume that the uniform choice is done so $X$ cannot lie in $D$.) Let $i$ be a random index in $\{1,2,3\}$, independent of $X$. How likely is it that the $X \in A_i$? It's very tempting to say that it's got to be $1/3$. Any fixed value of $X$ in $S^2$ is in one of the three subsets, after all, and so if we choose a random index $i$, surely we have probability $1/3$ that it'll be in $A_i$. Of course this easily leads to paradox. But we are not in fact entitled to assume that $J = \{ \omega : X(\omega) \in A_{i(\omega)} \}$ is measurable. (In fact, it's easy to see that it's not measurable.)
Let's go back to the riddle. Suppose $\vec u$ is chosen randomly. The most natural option is that it is a nontrivial i.i.d. sequence $(u_k)$, independent of the random index $i$ which is uniformly distributed over $[100]=\{0,...,99\}$. In general, $M_j$ will be nonmeasurable (one can prove this in at least some cases). We likewise have no reason to think that $M$ is measurable. But without measurability, we can't make sense of talk of the probability that the guess will be correct.
Here's an amusing thing that may help see how measurability enters into these things. Consider a single sequence of infinitely many independent fair coin flips. Our state space is $\Omega=\{0,1\}^{\mathbb N}$, corresponding to an infinite sequence $(X_i)_{i=0}^\infty$ of i.i.d.r.v.s with $P(X_i=1)=P(X_i=0)=1/2$. Start with $P$ being the completion of the natural product measure on $\Omega$.
Can you guess the first coin flip on the basis of all the others? You might think: "Of course not! No matter what function from the values of flips $X_1,X_2,...$ to $\{0,1\}$ is chosen, the probability that the value of the function equals $X_0$ is going to be $1/2$."
That's a fine argument assuming the function is measurable. But what if it's not? Here is a strategy: Check if $X_1,X_2,...$ fit with the relevant representative. If so, then guess according to the representative. If not, then guess $\pi$. (Yes, I realize that $\pi\notin\{0,1\}$.) Intuitively this seems a really dumb strategy. After all, we're surely unlikely to luck out and get $X_1,X_2,...$ to fit with the representative, and even if they do, the chance that $X_0$ will match it, given the rest of the sequence, seems to be only $1/2$.
But if you choose shift-invariant representatives (i.e., $r([\tau \vec u])=\tau r([\vec u])$ when $(\tau\vec u)_n = \omega$ is a left sequence shift--by Zorn, such a choice is possible), then the outer $P$-measure of the set of representatives is equal to $1$. So there is an extension $P'$ of $P$ such that $P'$-almost surely the dumb strategy works. Just let $P'$ be an extension on which the set of representatives has measure $1$ and note that the dumb strategy works on the set of representatives.
Best Answer
It is possible to have every mathematician guess the number in one of the boxes with at most one error.
Partition the natural numbers into countably many sets, $\{S_i\}_{i=0}^\infty$, where each $S_i=\{n_{i_1},n_{i_2},\dots,\}$ is countably infinite. (There are many ways to do this) Since we have countably many mathematicians, we may list them, and assign $S_i$ to the $i^{th}$ mathematician.
If $u_k$ denotes the real number in the $k^{th}$ box, then the $i^{th}$ mathematician will be assigned the sequence of real numbers $u_{n_{i_j}}$, for $j=1,2,3\dots$. Using the axiom of choice, we may chose a representative for each equivalence class of sequences of real numbers under the equivalence relation $\{u_n\}_{n=1}^\infty\equiv\{v_n\}_{n=1}^\infty$ if there exists $M>0$ such that $v_n=u_n$ for all $n>M$. Thus, for the $i^{th}$ mathematician there will exist an integer $M_i$ such that for all $j>M_i$, the sequence $u_{n_{i_j}}$ is equal to the representative of its equivalence class. The goal is to have mathematician $i$ guess an integer $H_i>M_i$ by looking at every box except those in the set $S_i$. If this happens, then mathematician $i$ may look at all of the elements of $u_{n_{i_j}}$ with $j\geq H_i+1$, determine the equivalence class, and guess the box with $j=H_i$. Since $H_i>M_i$, his guess will be correct. It follows that we need all but possibly one mathematician to guess an integer $H_i>M_i$. If the sequence $M_i$ is bounded, then the problem is easy. The difficulty is handling an unbounded sequence $M_i$.
Under the same system of representatives, the sequence $\{M_i\}_{i=0}^\infty$ lies in some equivalence class of real numbers. Since mathematician $i$ knows the value of $M_l$ for all $l\neq i$, each mathematician can determine the equivalence class of the sequence $\{M_i\}_{i=0}^\infty$. Let $\{v_i\}_{i=0}^\infty$ denote the representative of this equivalence class. Then there exists an integer $N$ such that for every $i>N$, $M_i=v_i$. Mathematician $i$ with $i\leq N$ can determine $N$, however each mathematician with $i>N$ only knows that $N\leq i$. The strategy for guessing is as follows: Assign to mathematician $i$ with $i>N$ the integer $$H_i=1+\max\{v_i,M_{i-1},M_{i-2},\dots,M_1,M_0\},$$ and to each mathematician with $i\leq N$, the integer $$H_i=1+\max\{M_{N},M_{N-1},\dots,M_{i+1},M_{i-1},\dots,M_1,M_0\}.$$ Then we must have $H_i>M_i$ for every $i$ except possibly one. Thus, we have set up a strategy which allows every mathematician except possibly one to guess some box correctly.