There are many steps:
Step 1, select a state $\Psi$.
Step 2, prepare many systems in the same state $\Psi$.
Step 3, select two operators A and B.
Step 4a, for some of the systems prepared in state $\Psi$, measure A.
Step 4b, for some of the systems prepared in state $\Psi$, measure B.
Now if you analyze the results, assuming strong (not weak) measurements, then every time you measured A you got an eigenvalue of A, and every time you measured B you got an eigenvalue of B. Each eigenvalue had a probability equal to the squared norm of the state's projection onto the corresponding eigenspace, divided by the squared norm of the state before the projection. So your eigenvalues of A come from a probability distribution that (whenever these moments exist) has mean $\langle A\rangle=\langle \Psi|A|\Psi\rangle$ and standard deviation $\Delta A=\sqrt{\langle \Psi|\left(A^2-\langle \Psi|A|\Psi\rangle^2\right)|\Psi\rangle}$, and your eigenvalues of B come from a probability distribution with mean $\langle B\rangle=\langle \Psi|B|\Psi\rangle$ and standard deviation $\Delta B=\sqrt{\langle \Psi|\left(B^2-\langle \Psi|B|\Psi\rangle^2\right)|\Psi\rangle}$. You never get those quantities from a single measurement, or even from a whole bunch, but from steps 4a and 4b you do get a sample mean and a sample standard deviation, and for a large sample these are likely to be very close to the theoretical mean and the theoretical standard deviation.
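To make this concrete, here is a minimal numpy sketch of the protocol, assuming (purely for illustration, not as part of the argument) a single qubit for $\Psi$ and the Pauli matrices $\sigma_z$ and $\sigma_x$ standing in for A and B:

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: select a state Psi (an arbitrary qubit state, chosen for illustration).
psi = np.array([np.cos(0.3), np.sin(0.3)], dtype=complex)

# Step 3: select two operators A and B (here sigma_z and sigma_x).
A = np.array([[1, 0], [0, -1]], dtype=complex)
B = np.array([[0, 1], [1, 0]], dtype=complex)

def born_sample(op, state, n):
    """Steps 2 and 4a/4b: 'prepare' n copies of the state and strongly
    measure op on each, drawing eigenvalues with Born-rule probabilities."""
    evals, evecs = np.linalg.eigh(op)
    probs = np.abs(evecs.conj().T @ state) ** 2
    return rng.choice(evals, size=n, p=probs / probs.sum())

def theoretical_mean_std(op, state):
    """The mean <Psi|op|Psi> and standard deviation computed from the state."""
    m = np.real(state.conj() @ op @ state)
    m2 = np.real(state.conj() @ op @ op @ state)
    return m, np.sqrt(m2 - m**2)

samples_A = born_sample(A, psi, 100_000)  # step 4a
samples_B = born_sample(B, psi, 100_000)  # step 4b

# The sample statistics approach the theoretical ones for large samples.
print(theoretical_mean_std(A, psi), (samples_A.mean(), samples_A.std()))
print(theoretical_mean_std(B, psi), (samples_B.mean(), samples_B.std()))
```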
The uncertainty principle says that way back in step 1 (when you selected $\Psi$) you could select a $\Psi$ that gives a small $\Delta A$, or a $\Psi$ that gives a small $\Delta B$ (in fact, if $\Psi$ is an eigenstate of A then $\Delta A=0$, and likewise for B). However, $$\Delta A \Delta B \geq \left|\frac{\langle AB-BA\rangle}{2i}\right|=\left|\frac{\langle\Psi| AB-BA |\Psi\rangle}{2i}\right|.$$
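This inequality is easy to spot-check numerically. The following sketch draws a random Hermitian pair and a random normalized state (the dimension and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4  # arbitrary dimension

def random_hermitian(n):
    m = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (m + m.conj().T) / 2  # symmetrize to make it Hermitian

def std(op, state):
    m = np.real(state.conj() @ op @ state)
    m2 = np.real(state.conj() @ op @ op @ state)
    return np.sqrt(m2 - m**2)

A, B = random_hermitian(n), random_hermitian(n)
psi = rng.normal(size=n) + 1j * rng.normal(size=n)
psi /= np.linalg.norm(psi)

lhs = std(A, psi) * std(B, psi)
rhs = abs(psi.conj() @ (A @ B - B @ A) @ psi / 2j)
print(lhs >= rhs)  # True: the bound holds for any state and any pair
```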
So in particular, noncommuting operators often have a tradeoff (namely, whenever the expectation value of their commutator does not vanish): if the state in question has a really low standard deviation for one operator, then it must have a higher standard deviation for the other.
If the operators commute, not only is there no joint limit on how low the standard deviations can go, but measuring one of the variables keeps you in the same eigenspace of the other operator (see the sketch below). However, that is a completely different fact, since the uncertainty principle is about the standard deviations of two probability distributions for two observables applied to one and the same state, and thus applies approximately to the sample standard deviations generated from identically prepared states.
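A small sketch of the commuting case, using two diagonal (hence commuting) matrices as an illustrative stand-in:

```python
import numpy as np

# Two commuting observables: both diagonal in the same basis.
A = np.diag([1.0, 1.0, 2.0])  # eigenvalue 1 is degenerate (spans e0, e1)
B = np.diag([3.0, 4.0, 4.0])
assert np.allclose(A @ B, B @ A)

# A state lying inside A's eigenvalue-1 eigenspace.
psi = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)

# Measure B and obtain eigenvalue 4: project onto that eigenspace of B.
P4 = np.diag([0.0, 1.0, 1.0])
post = P4 @ psi
post /= np.linalg.norm(post)

# The post-measurement state is still an A-eigenstate with eigenvalue 1.
print(np.allclose(A @ post, post))  # True
```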
If you have a system prepared in state $\Psi$ and you measure A on it, then you generally have to use a different system, also prepared in $\Psi$, to measure B. That's because measuring A on a system projects the state onto an eigenspace of A, which generally changes the state. And since the probability distribution for B is based on the state, now that you have a different state you will have a different probability distribution for B. You can't find out $\Delta B=\sqrt{\langle \Psi|\left(B^2-\langle \Psi|B|\Psi\rangle^2\right)|\Psi\rangle}$ if you don't have $\Psi$ and only have $\Psi$ projected onto an eigenspace of A.
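The same point in numpy form, again with $\sigma_z$ and $\sigma_x$ as illustrative stand-ins: start in an eigenstate of B, measure A, and watch B's distribution change.

```python
import numpy as np

A = np.array([[1, 0], [0, -1]], dtype=complex)  # sigma_z
B = np.array([[0, 1], [1, 0]], dtype=complex)   # sigma_x

def mean_std(op, state):
    m = np.real(state.conj() @ op @ state)
    m2 = np.real(state.conj() @ op @ op @ state)
    return m, np.sqrt(max(m2 - m**2, 0.0))

psi = np.array([1, 1], dtype=complex) / np.sqrt(2)  # an eigenstate of B
print(mean_std(B, psi))   # (1.0, 0.0): B has a definite value in Psi

# Measure A and obtain +1: project onto A's +1 eigenspace and renormalize.
P_plus = np.array([[1, 0], [0, 0]], dtype=complex)
post = P_plus @ psi
post /= np.linalg.norm(post)

print(mean_std(B, post))  # (0.0, 1.0): a different distribution for B
```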
Best Answer
As WillO points out in the comments, the quote is actually wrong: non-commuting observables can have (some) simultaneous eigenstates--but a better statement would be that not all of their eigenstates are simultaneous. In other words, if observable $\hat{A}$ has an eigenstate that's a non-trivial superposition of eigenstates of observable $\hat{B}$, then they're non-commuting, and vice versa.
The position-space and momentum-space wavefunctions are Fourier transforms of one another. In the position-space representation, a state of definite position $x'$ would be a Dirac delta, $\psi_{x'}(x) = \delta(x-x')$, and a state of definite momentum $p'$ would be $$\psi_{p'}(x) = \exp(ip'x/\hbar)/\sqrt{2\pi\hbar}\text{.}$$ The latter is therefore automatically a superposition of position eigenstates, with equal weight on every position.
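A discretized illustration of this duality (the grid size and index here are arbitrary): on a finite grid, a 'position eigenstate' has a perfectly flat momentum-space modulus, and vice versa.

```python
import numpy as np

N = 256
psi_x = np.zeros(N, dtype=complex)
psi_x[100] = 1.0  # a discretized 'position eigenstate': all amplitude at one point

# Its momentum-space amplitudes (unitary discrete Fourier transform):
psi_p = np.fft.fft(psi_x) / np.sqrt(N)

# The modulus is flat: an equal-weight superposition of every momentum value.
print(np.allclose(np.abs(psi_p), 1 / np.sqrt(N)))  # True
```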
In fact, when you're writing a wavefunction in position-space, you are automatically representing it as a superposition of position eigenstates, with the function value $\psi(x)$ as the coefficient of the eigenstate of definite position $x$ in that superposition. That's what a wavefunction means: for arbitrary state $|\psi\rangle$, the position-space wavefunction is $\psi(x) = \langle x|\psi\rangle$, where $|x\rangle$ is a position eigenstate.
A similar statement would be correct for momentum-space: momentum eigenstates would be Dirac deltas in the momentum-space representation, etc. The only difficulty is that states like $\psi_{p'}(x)$ are not normalizable--but that's not a big deal; it's completely dual to the problem of the Dirac delta not actually being a function. They still make sense for the things they're practically used for, whether as distributions or via various other more mathematically sophisticated ways of dealing with that issue.
You seem to be interpreting 'quantum' as implying 'discrete'. This would be very mistaken: observables in quantum mechanics can and often do have continuous spectra--i.e., a continuous range of allowed measurement results (eigenvalues). Typically, the quantum-mechanical position and momentum observables are continuous. The difference from classical mechanics is that the operators that represent those observables do not commute.
It's completely the opposite of this. If you have two observables $\hat{A}$ and $\hat{B}$, with eigenstates (states of definite values) of one that are all orthogonal to eigenstates of the other, then they don't have any non-trivial uncertainty relation between them.
One thing that might be helpful to realize is that the observables-as-operators formalism of quantum mechanics in Hilbert space applies just as well to classical mechanics, too--the difference is entirely in which operators correspond to which physical observables, including which ones you are allowed to measure.
For example, if you have an observable $\hat{z}$ with two eigenstates $|z_+\rangle$ and $|z_-\rangle$, an arbitrary superposition is always a valid state, e.g. the following two particular ones: $$|x_\pm\rangle = \frac{1}{\sqrt{2}}\left(|z_+\rangle \pm |z_-\rangle\right)\text{.}$$ Now, in quantum mechanics, you could easily have an operator such as $$\hat{x} = |z_+\rangle\langle z_-| + |z_-\rangle\langle z_+|$$ that has exactly those two superpositions as its eigenstates. This operator does not commute with $\hat{z}$, and there will be a non-trivial uncertainty relationship between them. The difference is that in standard classical mechanics, something like $\hat{x}$ would not correspond to a valid physical observable. You would simply be forbidden from measuring it.
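A quick numerical sanity check of this example (taking eigenvalues $\pm 1$ for $\hat{z}$, chosen only for concreteness):

```python
import numpy as np

zp = np.array([1, 0], dtype=complex)  # |z+>
zm = np.array([0, 1], dtype=complex)  # |z->

z_hat = np.outer(zp, zp.conj()) - np.outer(zm, zm.conj())  # |z+><z+| - |z-><z-|
x_hat = np.outer(zp, zm.conj()) + np.outer(zm, zp.conj())  # |z+><z-| + |z-><z+|

# The eigenvectors of x_hat are exactly (|z+> ± |z->)/sqrt(2)...
evals, evecs = np.linalg.eigh(x_hat)
print(evals)   # [-1.  1.]
print(evecs)   # columns proportional to |x_-> and |x_+>

# ...and the two operators do not commute.
print(z_hat @ x_hat - x_hat @ z_hat)  # nonzero (it equals 2i sigma_y)
```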
Instead, if you're limited to measuring only $\hat{z}$ and observables that commute with it, the states $|x_\pm\rangle$ would be indistinguishable from the mixed state of $|z_+\rangle$ and $|z_-\rangle$ with probability $1/2$ each. But quantum mechanics allows you more freedom.
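In density-matrix language this indistinguishability is a one-line computation; here is a sketch, with every measurement probability given by the Born rule $p = \operatorname{tr}(\rho P)$:

```python
import numpy as np

zp = np.array([1, 0], dtype=complex)
zm = np.array([0, 1], dtype=complex)
xp = (zp + zm) / np.sqrt(2)

rho_pure = np.outer(xp, xp.conj())  # the superposition |x+>
rho_mixed = 0.5 * (np.outer(zp, zp.conj()) + np.outer(zm, zm.conj()))  # 50/50 mixture

P_zp = np.outer(zp, zp.conj())  # projector onto |z+>
P_xp = np.outer(xp, xp.conj())  # projector onto |x+>

# Measuring z cannot tell them apart: both yield z+ with probability 1/2.
print(np.trace(rho_pure @ P_zp).real, np.trace(rho_mixed @ P_zp).real)  # 0.5 0.5

# Measuring x can: the superposition gives x+ always, the mixture half the time.
print(np.trace(rho_pure @ P_xp).real, np.trace(rho_mixed @ P_xp).real)  # 1.0 0.5
```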
An even easier example: if $|\psi\rangle$ is a non-trivial superposition of multiple eigenstates of some observable $\hat{A}$, then the projector $\hat\pi = |\psi\rangle\langle\psi|$ is a valid observable in quantum mechanics that doesn't commute with $\hat{A}$ and that has $|\psi\rangle$ as an eigenstate.
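In matrix form (with an arbitrary two-level $\hat{A}$ for illustration):

```python
import numpy as np

A = np.diag([0.0, 1.0])  # an observable whose eigenstates are e0 and e1
psi = np.array([1, 1], dtype=complex) / np.sqrt(2)  # a non-trivial superposition

pi = np.outer(psi, psi.conj())  # the projector |psi><psi|

print(np.allclose(pi @ A, A @ pi))  # False: pi and A do not commute
print(np.allclose(pi @ psi, psi))   # True: |psi> is an eigenstate of pi (eigenvalue 1)
```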