What do you mean by your question:
> doesn't each nucleus take an infinite amount of time to decay?
As far as I know, this is not true. A nucleus will start in one state, and end in another "decayed state" + radiation ($\alpha^{2+}, \beta^\pm, \gamma$ or whatever), and this is not an infinitely long process.
In any short time interval, say $\delta t$, a nucleus has a certain probability of decaying; otherwise it simply survives unchanged. Thanks to how statistics and probability work, a large number of these nuclei will collectively exhibit a "mean lifetime" (i.e. we can obtain the average time it takes for one nucleus to decay).
Perhaps you're getting confused by this formula:
$$N = N_0e^{-\lambda t} = N_0e^{-t/\tau}$$
where $N$ is the number of non-decayed nuclei present in your sample, and $N_0$ is the number of initial non-decayed nuclei.
In this case, yes, it takes (in theory) an infinite amount of time for $N$ to reach $0$, though this assumes $N$ can vary continuously (taking values like $N=0.01$, which is non-physical - $N$ can only take integer values). The larger $N$ and $N_0$ are, the better this equation describes the situation.
Here, $\tau = 1/\lambda$ is in fact the mean lifetime, and is related to the half-life, $\tau_{1/2}$, via
$$\tau = \frac{\tau_{1/2}}{\ln 2}$$
(from http://hyperphysics.phy-astr.gsu.edu/hbase/Nuclear/meanlif.html)
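The point about integer $N$ can be illustrated with a small simulation. This is only a sketch under stated assumptions: the helper `simulate_lifetimes`, the sample size, the seed, and the choice $\lambda = 1$ (arbitrary units) are all mine, not part of the original argument. Each nucleus is given a random lifetime drawn from the exponential distribution, so the last nucleus in a finite sample decays at a finite time even though $N_0e^{-\lambda t}$ never reaches zero.

```python
import math
import random


def simulate_lifetimes(n0, lam, seed=0):
    """Draw one exponentially distributed lifetime per nucleus (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [rng.expovariate(lam) for _ in range(n0)]


lam = 1.0      # decay constant in arbitrary units (assumed value)
n0 = 10_000    # initial number of nuclei (assumed value)

lifetimes = simulate_lifetimes(n0, lam)

# Sample mean of the individual lifetimes; should be close to tau = 1 / lam.
mean_lifetime = sum(lifetimes) / n0

# Time at which the very last nucleus decays: finite for any finite sample.
last_decay = max(lifetimes)

# Number of nuclei still undecayed at t = tau, compared with N0 * exp(-1).
survivors = sum(1 for t in lifetimes if t > 1.0 / lam)
predicted = n0 * math.exp(-1.0)

print(f"mean lifetime       : {mean_lifetime:.3f} (theory {1.0 / lam:.3f})")
print(f"last decay at       : {last_decay:.2f} / lambda")
print(f"survivors at t = tau: {survivors} (formula gives {predicted:.0f})")
```

For a sample this size the continuous formula and the integer count agree to within statistical scatter, and the sample is completely gone after a modest number of mean lifetimes.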
Best Answer
Perhaps looking at the probability distribution function which characterises a radioactive decay process might help?
The basic premise when dealing with radioactive decay is that there is a constant parameter $\lambda$ locked away in a nucleus which dictates the decay of the nucleus.
You perhaps first meet it in the statement that the rate at which nuclei decay, $\dfrac {dN}{dt}$, is proportional to the number of undecayed nuclei $N$, which leads to the expression $\dfrac{dN}{dt} = - \lambda \,N$.
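For reference, this rate equation integrates directly. As a short sketch, writing $N_0$ for the number of undecayed nuclei at $t=0$ (a symbol introduced here for illustration):

$$\frac{dN}{N} = -\lambda\,dt \quad\Rightarrow\quad \ln\frac{N}{N_0} = -\lambda t \quad\Rightarrow\quad N = N_0\,e^{-\lambda t}$$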
Another way of expressing this is to use a probability distribution function for the decay:
$$F(t) = \lambda \, e^{-\lambda t}$$
You observe one nucleus and start a clock ($t=0$).
The probability of the nucleus decaying within a time $t$ after starting the clock is $\displaystyle \int^t_0 \lambda \, e^{-\lambda t'} \, dt' = 1 - e^{-\lambda t}$, i.e. the area under the probability distribution curve between time $0$ and time $t$.
Another way of stating this is that, judged from $t=0$, the probability of the nucleus decaying in the interval between $t$ and $t+dt$ is $ \lambda \, e^{-\lambda t} \, dt$: it must first survive to time $t$ (probability $e^{-\lambda t}$) and then decay in the next $dt$ (probability $\lambda \, dt$).
As an example, suppose I want to know the time $\tau$ one has to wait for the probability of a decay to be $\frac 12$. This requires
$\displaystyle \int^\tau_0 \lambda \, e^{-\lambda t} \, dt = \dfrac 12$
which produces the relationship between $\lambda$ and $\tau$
$\lambda\,\tau = \log_{\rm e} 2$, where $\tau$ is called the half-life of the decay process.
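Spelling out that step, using nothing beyond the integral above:

$$\int^\tau_0 \lambda \, e^{-\lambda t} \, dt = 1 - e^{-\lambda \tau} = \frac 12 \quad\Rightarrow\quad e^{-\lambda \tau} = \frac 12 \quad\Rightarrow\quad \lambda\,\tau = \log_{\rm e} 2 \approx 0.693$$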
Now note that $\displaystyle \int^\infty_0 \lambda \, e^{-\lambda t} \, dt=1$, which says that if you wait for an infinite length of time the nucleus is certain to decay. The mean time for a decay is $\langle t \rangle = \displaystyle \int^\infty_0 t\, \lambda \, e^{-\lambda t} \, dt= \dfrac 1 \lambda$.
Put another way, this is what you would get if you started with an infinite number of nuclei and watched them decay over an infinite length of time.
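It helps to see what these two integrals give when the upper limit is a finite cutoff instead of infinity. Here $T$ is a symbol I am introducing for that cutoff, and the second result follows from one integration by parts:

$$\int^T_0 \lambda \, e^{-\lambda t} \, dt = 1 - e^{-\lambda T}, \qquad \int^T_0 t\, \lambda \, e^{-\lambda t} \, dt = \Big[-t\,e^{-\lambda t}\Big]^T_0 + \int^T_0 e^{-\lambda t} \, dt = \frac{1 - (1+\lambda T)\,e^{-\lambda T}}{\lambda}$$

Letting $T \to \infty$ recovers the values $1$ and $\dfrac 1 \lambda$, since $e^{-\lambda T}$ falls off faster than $\lambda T$ grows.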
Now evaluate the same integrals with the upper limit set to $\frac {20}{\lambda}$ (about 29 half-lives, since $20/\log_{\rm e} 2 \approx 28.9$): the probability of a decay in that time is $1-2\times 10^{-9}$ and the mean life is $\dfrac {1-4 \times 10^{-8}} {\lambda}$.
What this shows is that you are working out a weighted average, and the weights for large values of time are so small that they are insignificant on any realistic time scale.
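These figures are easy to check numerically. The sketch below assumes the standard closed forms for the two integrals cut off at a finite time; the function names and the dimensionless variable `x` (standing for $\lambda$ times the cutoff time) are my own choices for illustration.

```python
import math


def decay_probability(x):
    """Probability of a decay before the cutoff, where x = lambda * time."""
    # 1 - exp(-x), written with expm1 to keep precision when x is small.
    return -math.expm1(-x)


def truncated_mean(x):
    """Integral of t * lambda * exp(-lambda t) up to the cutoff, in units of 1/lambda."""
    return 1.0 - (1.0 + x) * math.exp(-x)


x = 20.0  # cutoff time of 20 / lambda

half_lives = x / math.log(2)                      # cutoff expressed in half-lives
prob_shortfall = 1.0 - decay_probability(x)       # how far the probability is from 1
mean_shortfall = 1.0 - truncated_mean(x)          # how far the mean is from 1 / lambda

print(f"cutoff in half-lives : {half_lives:.1f}")       # about 28.9
print(f"1 - P(decay)         : {prob_shortfall:.2e}")   # about 2.1e-09
print(f"1 - lambda * <t>     : {mean_shortfall:.2e}")   # about 4.3e-08
```

Both shortfalls are of order $e^{-20}$, which is why extending the upper limit to infinity changes nothing measurable.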