Dear Roy, (3) is correct. More precisely, retrodictions have to follow completely different rules than predictions. This elementary asymmetry - representing nothing else than the ordinary "logical arrow of time" (the past is not equivalent to the future as far as the logical reasoning goes) - is confusing for a surprisingly high number of people including physicists.

However, this asymmetry between predictions and retrodictions has nothing to do with quantum mechanics per se. In classical statistical physics, one faces the very same basic problem. The asymmetry is relevant whenever there is any incomplete information in the system. The asymmetry occurs because "forgetting is an irreversible process". Equivalently, the assumptions (=past) and their logical consequences (=future) don't play a symmetric role in mathematical logic. This source of logical asymmetry is completely independent from the CPT-theorem that may guarantee a time-reversal symmetry of the fundamental laws of physics. But whenever there is anything uncertain about the initial or the final state, logic has to be used and logic has an extra asymmetry between the past and the future.

**Predictions: objective numbers**

In quantum mechanics, the probability of a future outcome is calculated from $|c|^2$ where $c$ is a complex probability amplitude calculated by evolving the initial wave function via Schrödinger's equation, or by an equivalent method. The probabilities for the future are completely "objective". One may repeat the same experiment with the same initial conditions many times and literally *measure* the right probability. And this measurable probability is calculable from the theory - quantum mechanics, in this case - too.
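As a rough illustration of "calculable and measurable", here is a minimal sketch for a two-level system. The Hamiltonian $H = (\omega/2)\sigma_x$ with $\hbar = 1$, the function names and the numerical values are assumptions chosen for the example, not something taken from the question. The predicted probability $|c|^2$ is compared with the frequency found in many simulated repetitions of the same experiment.

```python
import math
import random

def flip_amplitude(omega, t):
    """Amplitude <1|U(t)|0> for the assumed H = (omega/2) * sigma_x, hbar = 1."""
    # U(t) = cos(omega t / 2) * identity - i * sin(omega t / 2) * sigma_x
    return -1j * math.sin(omega * t / 2)

def stay_amplitude(omega, t):
    """Amplitude <0|U(t)|0> for the same evolution."""
    return complex(math.cos(omega * t / 2))

def measured_frequency(p, trials, seed=0):
    """Simulate repeating the experiment and counting the 'flip' outcomes."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if rng.random() < p)
    return hits / trials

omega, t = 1.0, 1.2                          # hypothetical values
p_flip = abs(flip_amplitude(omega, t)) ** 2  # Born rule: |c|^2
p_stay = abs(stay_amplitude(omega, t)) ** 2
freq = measured_frequency(p_flip, trials=100_000)

print(p_flip, p_stay, freq)
```

The simulated frequency approaches `p_flip` as the number of trials grows, which is the sense in which the predicted probability is an objective, measurable number.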

**Retrodictions: subjective choices**

However, the retrodictions are always exercises in logical inference and logical inference - and I mean Bayesian inference in particular - always depends on priors and subjective choices. There is no theoretical way to calculate "unique" probabilities of initial states from the knowledge of the final state. Also, there is no experimental procedure that would allow us to measure such retrodictions because we are not able to prepare systems in the same "final states": final states, by definition, are always prepared by the natural evolution rather than by "us". So one can't measure such retrodictions.

To estimate the retrodicted probabilities theoretically, one must choose competing hypotheses $H_i$ - in the case of retrodictions, they are hypotheses about the initial states. We must decide about their prior probabilities $P(H_i)$ and then we may apply the logical inference. The posterior probability of $H_i$ is this conditional probability:
$$ P (H_i|F) = P(F|H_i) P(H_i) / P(F) $$
This is Bayes' formula.

Here, we have observed some fact $F$ about the final state (which may be, hypothetically, a full knowledge of the final microstate although it's unlikely). To know how this fact influences the probabilities of various initial states, we must calculate the conditional probability $P(F|H_i)$ that the property $F$ of the final state is satisfied given the initial state (assumption or condition) $H_i$. However, this conditional probability is not the same thing as $P(H_i|F)$: they are related by the Bayes formula above where $P(H_i)$ is our prior probability of the initial state $H_i$ - our conclusions about the retrodictions will always depend on such priors - and $P(F)$ is a normalization factor ("marginal probability of $F$") that guarantees that $\sum_i P(H_i|F) = 1$.
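The dependence on priors can be made concrete with a small sketch of Bayes' formula. The three hypotheses, their likelihoods $P(F|H_i)$ and both sets of priors are made-up numbers, used only to show that the same observed fact $F$ leads to different retrodictions for different priors.

```python
def posterior(priors, likelihoods):
    """Return the list P(H_i | F) from priors P(H_i) and likelihoods P(F | H_i)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(joint)  # P(F), the normalization factor
    return [x / evidence for x in joint]

likelihoods = [0.9, 0.3, 0.1]  # hypothetical P(F | H_i), fixed by the dynamics

uniform = posterior([1 / 3, 1 / 3, 1 / 3], likelihoods)
skewed = posterior([0.05, 0.15, 0.80], likelihoods)

print(uniform)  # H_1 is the most probable initial state
print(skewed)   # H_3 is the most probable initial state
```

The likelihoods are the "objective" part, computed by evolving each hypothetical initial state forward. The ranking of the initial states nevertheless changes with the priors.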

**Second law of thermodynamics**

The logical asymmetry between predictions and retrodictions becomes arbitrarily huge quantitatively when we discuss the increase of entropy. Imagine that we organize microstates in ensembles - both for initial and final states; and this discussion works for classical as well as quantum physics. What do we mean by the probability that the initial state $I$ evolves to the final state $F$ if both symbols represent ensembles of microstates? Well, we must sum over all microstates in the final state $F$, but average over all microstates in the initial state $I$. Note that there is a big asymmetry in the treatment of the initial and final states - and it's completely logical that this asymmetry has to be present:
$$ P ( F|I) = \sum_{i,j} P(F_j|I_i) P(I_i) $$
We sum over the final microstates because $P(F_1 {\rm or } F_2) = P(F_1)+P(F_2)$; "or" means to add probabilities. However, we must average over the initial states because we must keep the total probability of all mutually excluding initial states equal to one.

Note that $P(I_i)$ is the prior probability of the $i$th microstate. In normal circumstances, when all the initial states are considered equally likely - which doesn't have to be so - $P(I_i) = 1/N_{I}$ for each $i$ where $N_{I}$ is the number of the initial states in the ensemble $I$ (this number is independent of the index $i$).

So the formula for $P(F|I)$ is effectively
$$ P ( F|I) = \frac{1}{N_{I}} \sum_{i,j} P(F_j|I_i) $$
Note that we only divide by the number of initial microstates but not the final microstates. And the number of the initial states may be written as $\exp(S_I)$, the exponentiated entropy of the initial state. Its appearance in the formula above - and the absence of $\exp(S_F)$ in the denominator - is the very reason why the lower-entropy states are favored as initial states but higher-entropy states are favored as final states.

On the contrary, if we studied the opposite evolution - and just to be precise, we will CPT-conjugate both initial and final state, to map them to $I', F'$ - the probability of the opposite evolution will be
$$ P(I'|F') = \frac{1}{N_{F}} \sum_{i,j} P(I'_i|F'_j). $$
Now, the probability $P(I'_i|F'_j)$ may be equal to $P(F_j|I_i)$ by the CPT-theorem: they're calculated from complex amplitudes that are equal (up to the complex conjugation). But this identity only works for the individual microstates. If you have ensembles of many microstates, they're treated totally differently. In particular, the following ratio is not one:
$$ \frac {P(I'|F')}{P(F|I)} = \exp(S_I-S_F) $$
I wrote the numbers of microstates as the exponentiated entropy. So the evolution from $F'$ to $I'$ isn't equally likely as the evolution from $I$ to $F$: instead, they differ by the multiplicative factor of the exponential of the entropy difference - which may be really, really huge because $S$ is of order $10^{26}$ for macroscopic objects. This entropy gets exponentiated once again to get the probability ratio!
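A toy model may make the counting explicit. The sketch below assumes a small set of 12 microstates and a symmetric transition matrix, so that the microscopic probabilities obey $P(j|i) = P(i|j)$, a stand-in for the CPT relation between individual microstates. The matrix, the ensemble sizes and the function names are illustrative choices only.

```python
from fractions import Fraction

def mixing_matrix(n, a):
    """Symmetric stochastic matrix: T[i][j] = P(j | i) = P(i | j)."""
    off = a / n
    return [[(1 - a + off) if i == j else off for j in range(n)]
            for i in range(n)]

def ensemble_probability(T, initial, final):
    """Sum over the final microstates, average over the initial ones."""
    total = sum(T[i][j] for i in initial for j in final)
    return total / len(initial)

n = 12
T = mixing_matrix(n, Fraction(1, 2))
I = [0, 1]              # low-entropy ensemble, N_I = 2
F = list(range(2, 12))  # high-entropy ensemble, N_F = 10

p_forward = ensemble_probability(T, I, F)   # P(F | I)
p_backward = ensemble_probability(T, F, I)  # P(I' | F'), same microscopic entries
ratio = p_backward / p_forward

print(p_forward, p_backward, ratio)
```

Even though every microscopic entry is symmetric, the ratio of the two ensemble probabilities comes out as $N_I/N_F = \exp(S_I - S_F)$, here $2/10$.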

This point is just to emphasize that the people who claim that the evolution from a high-entropy initial state to a low-entropy final state is "equally likely" as the standard evolution from a low-entropy initial state to a high-entropy final state are making a mistake of a missing or incorrectly added factor of $\exp(10^{26})$ in their formulae, and it is a huge mistake, indeed. Also, there is absolutely no doubt that the inverse processes have these vastly different probabilities - and I would like to claim that I have offered the dear reader a full proof in the text above.

Their mistake may also be phrased as the incorrect assumption that conditional probabilities $P(A|B)$ and $P(B|A)$ are the same thing: their mistake is *this* elementary, indeed. These two conditional probabilities are not the same thing and the validity of the CPT-theorem in a physical theory can't change the fact that these two conditional probabilities are still very different numbers, regardless of the propositions hiding behind the symbols $A,B$.

Just to emphasize how shocking it is for me to see that those elementary issues about the distinction of past and future are so impenetrable for so many people in 2011, watch Richard Feynman's The Messenger Lecture number 5, "The Distinction of Past and Future" (Internet Explorer needed):

http://research.microsoft.com/apps/tools/tuva/index.html

The very first sentence - the introduction to this very topic is - "It's obvious to everybody that the phenomena in the world are self-evidently irreversible." Feynman proceeds to explain how the second law of thermodynamics and other aspects of the irreversibility follow even from the T-symmetric dynamical laws because of simple rules of mathematical logic. So whoever doesn't understand that the past and future play different roles in physics really misunderstands the first sentence in this whole topic - and in some proper sense, even the very title of it ("The Distinction of Past and Future").

Nowadays there exists a more fundamental geometrical interpretation of anomalies which I think can resolve some of your questions. The basic source of anomalies is that classically and quantum-mechanically we are working with realizations and representations of the symmetry group, i.e., given a group of symmetries through a standard realization on some space, we need to lift the action to the adequate geometrical objects we work with in classical and quantum theory, and sometimes this action cannot be lifted. Mathematically, this is called an obstruction to the action lifting, which is the origin of anomalies. The obstructions often make it possible to realize not the group of symmetries itself but some extension of it by another group acting naturally on the geometrical objects defining the theory.

There are three levels of realization of a group of symmetries:

The abstract level: for example, the action of the Lorentz (Galilean) group on a Minkowski (Euclidean) space. This representation, for example, is not unitary, and it is not the representation we work with in quantum mechanics.

The classical level: When the group action is realized in terms of functions belonging to the Poisson algebra of some phase space. For example, the realization of the Galilean or the Lorentz groups on the phase space of a classical free particle.

The quantum level: When the group action is realized in terms of a linear representation by operators on some Hilbert space (or just operators belonging to some $C^*$ algebra). For example, the realization of the Galilean or the Lorentz groups on a quantum Hilbert space of a free particle.

Now, passing from the abstract level to either the classical or the quantum level may be accompanied by an obstruction. These obstructions already exist in quantum and classical mechanics with a finite number of degrees of freedom, and not only in quantum field theories. Two well-known examples follow. The first is the Galilean group, which cannot be realized on the Poisson algebra of the phase space of the free particle; rather, a central extension of it, with the modified commutation relation

$$[K_i, P_j] =-i \delta_{ij}m$$

is realized ($K_i$ are the boosts, $P_i$ are the translations and $m$ is the mass). This extension was discovered by Bargmann, and it is sometimes called the Bargmann group. The second example is the realization of spin systems in terms of sections of homogeneous line bundles over the two-sphere $S^2$. The action of the isometry group $SO(3)$ cannot be lifted to line bundles corresponding to half-integer spins; rather, a $\mathbb{Z}_2$ extension of it, namely $SU(2)$, can be lifted. In this case the extended group is semisimple, and the fact that $SU(2)$ is a group extension of $SO(3)$, and not just its universal cover, is not usually emphasized in physics texts.
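The first example can be checked numerically on a one-dimensional phase space. The sketch below assumes the common choice $K = m x - t p$ for the boost generator and $P = p$ for the translation generator; these expressions, the finite-difference helper and the numerical values are assumptions of the illustration, and the overall sign and the factor of $i$ depend on conventions. In the abstract Galilean algebra boosts and translations commute, while on the phase space their bracket is the constant $m$.

```python
def poisson_bracket(f, g, x, p, h=1e-3):
    """{f, g} = df/dx * dg/dp - df/dp * dg/dx, using central differences."""
    dfdx = (f(x + h, p) - f(x - h, p)) / (2 * h)
    dfdp = (f(x, p + h) - f(x, p - h)) / (2 * h)
    dgdx = (g(x + h, p) - g(x - h, p)) / (2 * h)
    dgdp = (g(x, p + h) - g(x, p - h)) / (2 * h)
    return dfdx * dgdp - dfdp * dgdx

m, t = 2.5, 0.7  # hypothetical mass and time

def boost(x, p):
    """Assumed boost generator K = m x - t p."""
    return m * x - t * p

def translation(x, p):
    """Translation generator P = p."""
    return p

bracket = poisson_bracket(boost, translation, x=0.3, p=-1.1)
print(bracket)  # close to m, independent of the phase-space point
```

Because the result is a constant that Poisson commutes with everything, it plays the role of the central element of the extended algebra.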

The group extensions realized as a consequence of these obstructions may require:

1) Ray representations of the original group which are true representations of the extended group. This is the case of $SO(3)$, where the half-integer spins can be realized through ray representations of $SO(3)$, which are true representations of $SU(2)$. In this case, the Lie algebras of both groups are isomorphic.

2) Group extensions corresponding to Lie algebra extensions. This is the more general case, corresponding for example to the Galilean case.

Now, at the quantum level, one can understand more easily why the
obstructions lead to group extensions. This is because we are looking
for representations satisfying two additional conditions:

1) Unitarity

2) Positive energy

Sometimes (up to $1+1$ dimensions), we can satisfy these conditions merely by normal ordering, which results in central extensions of the symmetry groups. This method applies to the case of the Virasoro and the Kac-Moody algebras, which are central extensions of the Witt and loop algebras respectively, and can be obtained at the quantum level after normal ordering.

The relation between normal ordering and anomalies can be explained by the fact that the quantization operators need to be Toeplitz operators. A well-known example is the realization of the harmonic oscillator on the Bargmann space of analytic functions, where the Toeplitz operators are exactly those operators in which all derivatives are moved to the right. This is called Wick quantization and it corresponds exactly to normal ordering in the algebraic representation. The main property of Toeplitz operators is that their composition is performed through star products, and star products of Toeplitz operators are also Toeplitz operators. Thus the algebra of quantum operators is closed, but it closes not to the original group but rather to a central extension of it. This important interpretation hasn't been extended to field theories yet.

It is worthwhile to mention that central extensions are not the most
general extensions one can obtain when a symmetry is realized in terms of operators in quantum theory; there are Abelian and even non-Abelian
extensions. One of the better-known extensions of this type is the
Mickelsson-Faddeev extension of the algebra of chiral fermion non-Abelian charge densities when coupled to an external Yang-Mills field in $3+1$ dimensions:

$$[T_{a}(x), T_{b}(y)] = if_{ab}^c T_c(x) \delta^{(3)}(x-y) +id_{ab}^c\epsilon_{ijk} \partial_i\delta^{(3)}(x-y) \partial_j A_{ck}$$

This extension is an Abelian noncentral extension.

The existence of "anomalies" in the classical case, i.e., on the Poisson algebra, can be understood already in the case of the simplest symplectic manifold $\mathbb{R}^2$: the Poisson algebra is not isomorphic to the translation algebra. A deeper analysis is given, for example, in Marsden and Ratiu, page 408, for the case of the Galilean group. They showed that on the free particle Hilbert space, the Galilean group lifts to a central extension (the Bargmann group) which acts unitarily on the free particle Hilbert space: $\mathcal{H} = L^2(\mathbb{R}^3)$. Now, the projective Hilbert space $\mathcal{PH}$ is a symplectic manifold (as any complex projective space) in which the particle's phase space is embedded. The restriction of the representation to the projective Hilbert space and then to the particle's phase space retains the central extension, i.e., is isomorphic to the extended group; thus the extended group acts on the Poisson algebra.

As a matter of fact, one should always expect the anomaly to be realized classically on the phase space. The case of fermionic chiral anomalies seems singular, because it is customary to say that the anomaly exists only at the quantum level. The reason is that the space of Grassmann variables is not really a phase space, and even in the case of fermions, the anomaly exists at the classical level when one represents them in terms of "Bosonic coordinates". These anomalies are given as Wess-Zumino-Witten terms. (Of course these representations are not useful in perturbation theory.)

Another reason why anomalies always exist at the classical (phase space) level is that in geometric quantization, anomalies can be obtained at the level of prequantization. Now, prequantization does not require any more data than the phase space (unlike the quantization itself, which requires a polarization).

Now, to respond to your specific questions. It is true that chiral anomalies were discovered in quantum field theories when no ultraviolet regulators respecting the chiral symmetry could be found. But the anomaly is actually an infrared property of the theory. The signs of that are the Adler-Bardeen theorem, which states that there is no higher-loop (beyond one loop) correction to the axial anomaly, and, more importantly, the fact that only massless particles contribute to the anomaly. In the operator approach that I tried to adopt in this answer, the anomaly is a consequence of a deformation that must be performed on the symmetry generators in order for them to be well defined on the physical Hilbert space, and not a direct consequence of regularization.

Secondly, the anomaly exists equally at both levels, quantum and classical (on the phase space). The case of fermions and regularization was addressed separately.

Update - Elaboration of the spin case:

Here is the elaboration of the $SO(3)$, $SU(2)$ case which contains all the
ingredients regarding the obstruction to lifting and group extensions,
except that it does not have a corresponding Lie algebra extension.

We work on $S^2$ using the stereographic projection coordinate given in terms of the polar coordinates by:

$$z = \tan \frac{\theta}{2} e^{i \phi}$$

An element of the group $SU(2)$

$$g=\begin{pmatrix}
\alpha& \beta\\
-\bar{\beta} & \bar{\alpha }
\end{pmatrix}$$

acts on $S^2$ according to the Möbius transformation:

$$ z \rightarrow z^g = \frac{\alpha z + \beta}{-\bar{\beta} z + \bar{\alpha } }$$

However, one observes that the action of the special element:

$$g_0=\begin{pmatrix}
-1& 0\\
0 & -1
\end{pmatrix}$$

is identical to the action of the identity. This element is an $SU(2)$
element that projects to the unity of $SO(3)$ (this can be seen from its
three-dimensional representation, which is the unit matrix). Thus the
group which acts nontrivially on $S^2$ is $SO(3)$.
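This statement is easy to check numerically. The sketch below applies the Möbius transformation written above for a sample $SU(2)$ element, for its negative, and for the special element $g_0$. The particular values of $\alpha$, $\beta$ and $z$ are arbitrary choices for the illustration.

```python
def mobius(alpha, beta, z):
    """Action of g = [[alpha, beta], [-conj(beta), conj(alpha)]] on the point z."""
    return (alpha * z + beta) / (-beta.conjugate() * z + alpha.conjugate())

alpha, beta = complex(0.6, 0.0), complex(0.0, 0.8)  # |alpha|^2 + |beta|^2 = 1
z = complex(0.3, -1.7)                              # arbitrary point on the sphere

w_g = mobius(alpha, beta, z)
w_minus_g = mobius(-alpha, -beta, z)            # the element g * g_0 = -g
w_special = mobius(complex(-1), complex(0), z)  # the special element g_0

print(w_g, w_minus_g, w_special)
```

The elements $g$ and $-g$ move every point in the same way, and $g_0$ leaves every point fixed, so only the quotient $SU(2)/\mathbb{Z}_2 = SO(3)$ acts effectively on $S^2$.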

Now quantum mechanically spin systems can be realized on the sphere in
Hilbert spaces of analytic functions:

$$ (\psi, \xi) = \int_{S^2} \overline{\xi(z)} \psi(z) \frac{dzd\bar{z}}{(1+\bar{z}z)^2}$$

Transforming under $SU(2)$ according to:

$$ \psi(z) \rightarrow \psi^g(z) = (-\bar{\beta} z + \bar{\alpha })^{2j} \psi(z^{g^{-1}})$$

This is a ray representation of $SO(3)$ as $SO(3)$ does not have half integer representations.

Now, the first observation (the quantum level) is that the special element does not act
on the wave functions as the unit operator: for half-integer spins it
adds a phase of $\pi$. This is what is meant by saying that the $SO(3)$ action
cannot be lifted to the quantum Hilbert space.
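The phase can be read off from the multiplier $(-\bar{\beta} z + \bar{\alpha})^{2j}$ in the transformation law above. A minimal sketch, with the function name and the sample point chosen only for illustration, evaluates it for the special element ($\alpha = -1$, $\beta = 0$) at spin $1/2$ and spin $1$:

```python
def multiplier(alpha, beta, z, two_j):
    """Factor (-conj(beta) z + conj(alpha))^(2j) multiplying the wave function."""
    return (-beta.conjugate() * z + alpha.conjugate()) ** two_j

z = complex(0.3, -1.7)  # arbitrary point; the result does not depend on it
alpha0, beta0 = complex(-1), complex(0)  # the special element g_0

phase_spin_half = multiplier(alpha0, beta0, z, two_j=1)  # j = 1/2
phase_spin_one = multiplier(alpha0, beta0, z, two_j=2)   # j = 1

print(phase_spin_half, phase_spin_one)
```

Since $g_0$ does not move the point $z$, the wave function is simply multiplied by $(-1)^{2j}$: the sign flips for half-integer spin and nothing happens for integer spin.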

Now turning to the classical level. The symplectic form on $S^2$ is proportional to its area
element. The proportionality constant has to be an integer in a prequantizable theory (Dirac quantization condition):

$$\omega = 2j \frac{dz \wedge d\bar{z}}{(1+\bar{z}z)^2}$$

The corresponding Poisson bracket between two functions on the sphere:

$$\{f, h\} =\frac{1}{2j} (1+\bar{z}z)^2(\partial_z f \partial_{\bar{z}} h - \partial_z h \partial_{\bar{z}} f)$$

The function generating the group action in the Poisson algebra is given
by:

$$f_g= \left(\frac{\alpha \bar{z}z + \beta \bar{z} - \bar{\beta}z + \bar{\alpha}}{1+\bar{z}z}\right)^{2j}$$

Now, the function representing the unity of $SU(2)$ is the function $f=1$,
while the function representing the special element is $f=-1$ for half-integer spins, which is
a different function. (It has to be a constant because the special element belongs to the
center of $SU(2)$, thus it has to Poisson commute with all functions.)

Thus even at the classical level, the action of $SO(3)$ does not lift to
the Poisson algebra.

Now, regarding the question of classically distinguishing $SU(2)$ from $SO(3)$: if you compute the classical partition function of a spin
$\frac{1}{2}$ gas interacting with a magnetic field, it will be different from that of, say, spin $1$, but spin $\frac{1}{2}$ exists in the first place only if $SU(2)$ acts, because $SO(3)$ allows only integer spins.

## Best Answer

Quantum mechanics and quantum mechanical theories are totally independent of the classical ones. The classical theories may appear and often do appear as limits of the quantum theories. This is the case of all "textbook" theories - because the classical limit was known before the full quantum theory, and the quantum theory was actually "guessed" by adding the hats to the classical one. In a category of cases, the full quantum theory may be "reverse engineered" from the classical limit.

However, one must realize that this situation is just an artifact of the history of physics on Earth and it is not generally true. There are classical theories that can't be quantized - e.g. field theories with gauge anomalies - and there are quantum theories that have no classical limits - e.g. the six-dimensional $(2,0)$ superconformal field theory in the unbroken phase. Moreover, it's typical that the quantum versions of classical theories lead to new ordering ambiguities (the identity of all $O(\hbar^k)$ terms in the Hamiltonian is undetermined by the classical limit in which all choices of this form vanish, anyway), divergences, and new parameters and renormalization of them that have to be applied.

Also, the predictions of quantum mechanics don't need any classical crutches. Quantum mechanics works independently of its classical limits, and the classical behavior may be deduced from quantum mechanics and nothing else in the required limit. Historically, people discussed quantum mechanics as a tool to describe the microscopic world only, assuming that the large objects followed the classical logic. The Copenhagen folks divided the world in these two subworlds, in an ad hoc way, and that simplified their reasoning because they didn't need to study quantum physics of the macroscopic measurement devices etc.

But these days, we fully understand the actual physical mechanism - decoherence - that is responsible for the emergence of the classical logic in the right limits. Because of decoherence, which is a mechanism that only depends on the rules of quantum mechanics, we know that quantum mechanics applies to small as well as large objects, to all objects in the world, and the classical behavior is an approximate consequence, an emergent law.

To know the evolution in time, one needs to know the Hamiltonian - or something equivalent that determines the dynamics. The previous sentence is true both in classical physics and quantum mechanics, for similar reasons, but independently. If a classical theory is a limit of a quantum theory, it of course also means that its classical Hamiltonian may be derived as a limit of the quantum Hamiltonian. Of course, if you don't know the Hamiltonian operator, you won't be able to determine the dynamics and evolution with time. Guessing the quantum Hamiltonian from its classical limit is one frequent, but in no way "universally inevitable", way to find a quantum Hamiltonian of a quantum theory.