The general situation is the following one. There is a self-adjoint operator $H :D(H) \to \cal H$, with $D(H) \subset \cal H$ a dense linear subspace of the Hilbert space $\cal H$. (An elementary case is ${\cal H} = L^2(\mathbb R, dx)$, but what follows is valid in general for every complex Hilbert space $\cal H$ associated to a quantum physical system.)
It turns out that $D(H) = \cal H$ if and only if $H$ is bounded (it happens, in particular, when $\cal H$ is finite dimensional).
Physically speaking, $H$ is bounded if and only if the values attained by the corresponding observable (the energy of the system) form a bounded set, which hardly ever happens in concrete physical cases. $D(H)$ is therefore almost always a proper subset of $\cal H$.
If $\psi \in \cal H$ represents a (pure) state of the system, its time evolution is given by
$$\psi_t = e^{-i \frac{t}{\hbar} H}\psi \tag{1}\:.$$
The exponential is defined via the spectral theorem. The map $\mathbb R \ni t \mapsto e^{-i \frac{t}{\hbar} H}\psi$ is always continuous with respect to the topology of $\cal H$. Moreover, it is differentiable if and only if $\psi_t \in D(H)$ (equivalently, $\psi \in D(H)$). In this case one proves (Stone's theorem) that
$$\frac{d\psi_t}{dt} = -i \frac{1}{\hbar} H e^{-i \frac{t}{\hbar} H}\psi= -\frac{i}{\hbar} H\psi_t\:.$$
In other words,
$$i \hbar \frac{d\psi_t}{dt} = H\psi_t\:.\tag{2}$$
It should be clear that $\frac{d}{dt}$ is not an operator on $\cal H$, as it acts on curves $\mathbb R \ni t \mapsto\psi_t$ instead of vectors. By definition,
$$\frac{d\psi_t}{dt} = \lim_{s \to 0} \frac{1}{s} \left(\psi_{t+s}-\psi_t \right)$$
and the limit is computed with respect to the Hilbert space norm.
Identity (2) holds if and only if $\psi \in D(H)$ and not in general.
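As a sanity check of (1) and (2), here is a minimal numerical sketch in the finite-dimensional case, where $D(H) = \cal H$ and the domain issue does not arise. The $2\times 2$ matrix, the initial vector, the choice $\hbar = 1$ and the helper name `evolve` are all assumptions made for illustration; the derivative is approximated by a symmetric difference quotient rather than the one-sided limit written above.

```python
import numpy as np

HBAR = 1.0  # assumption: units in which hbar = 1


def evolve(H, psi, t, hbar=HBAR):
    """Return exp(-i t H / hbar) psi via the eigendecomposition of Hermitian H."""
    energies, vecs = np.linalg.eigh(H)            # H = V diag(E) V^dagger
    phases = np.exp(-1j * t * energies / hbar)    # function of H built on its spectrum
    return vecs @ (phases * (vecs.conj().T @ psi))


# Hypothetical Hamiltonian and initial state, chosen only for illustration.
H = np.array([[1.0, 0.5], [0.5, -1.0]])
psi0 = np.array([1.0, 0.0], dtype=complex)

t, s = 0.7, 1e-6
psi_t = evolve(H, psi0, t)

# Difference quotient approximating d(psi_t)/dt in the norm of C^2.
dpsi_dt = (evolve(H, psi0, t + s) - evolve(H, psi0, t - s)) / (2 * s)

norm_preserved = np.isclose(np.linalg.norm(psi_t), 1.0)
schrodinger_residual = np.linalg.norm(1j * HBAR * dpsi_dt - H @ psi_t)
print(norm_preserved, schrodinger_residual)
```

The residual should be small but not exactly zero, since the limit is only approximated. Nothing here probes the unbounded case, where $\psi \notin D(H)$ still evolves by (1) but does not satisfy (2).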
ADDENDUM.
Identities or even definitions (!) like
$$H = i \hbar \frac{d}{dt}\:.\tag{3}$$
make no sense. An observable in QM is, first of all, a (self-adjoint) operator on the Hilbert space $\cal H$ of the theory. In other words, it is a linear map $A$ associating any given vector $\psi \in \cal H$ (or in some suitable domain) with another vector $A\psi$. If $\psi$ is a given single vector of $\cal H$ - and not a curve $t \mapsto \psi_t$ - the formal object $$\frac{d}{dt}\psi$$ has no meaning at all, as it cannot be computed! Thus, wondering whether or not $H$, "defined" by means of (3), is Hermitian does not make sense either, because the RHS of (3) is not an operator on $\cal H$.
The concrete definition of $H$ can be given as soon as the physical system is known and taking advantage of some further physical principles like some supposed correspondence between classical observables and quantum ones, or group theoretical assumptions about the symmetries of the system.
For non-relativistic elementary systems described in $L^2(\mathbb R^3)$, the Hamiltonian operator has the form of the (hopefully unique) self-adjoint extension of the symmetric operator $$H := -\frac{\hbar^2}{2m}\Delta + V(\vec{x})$$
That is the definition of $H$.
Nevertheless, the Schrödinger equation (2) is always valid, no matter the specific features of the quantum (even relativistic) system, when $\psi \in D(H)$. Time evolution, however, is always described by (1), regardless of any domain problem.
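To make the formula $-\frac{\hbar^2}{2m}\Delta + V$ concrete, here is a rough finite-difference sketch. It is only a finite-dimensional caricature: it uses one dimension instead of $\mathbb R^3$, a harmonic potential, $\hbar = m = 1$, and Dirichlet conditions on a truncated interval, all of which are assumptions for illustration. A finite symmetric matrix is automatically self-adjoint, so this says nothing about the self-adjoint extension problem mentioned above.

```python
import numpy as np


def hamiltonian_1d(potential, x, hbar=1.0, mass=1.0):
    """Finite-difference matrix for -(hbar^2 / 2m) d^2/dx^2 + V(x) on the grid x."""
    dx = x[1] - x[0]
    n = x.size
    off = np.ones(n - 1)
    # Second-order central difference for the second derivative,
    # with the function taken to vanish outside the grid.
    laplacian = (np.diag(off, -1) - 2.0 * np.eye(n) + np.diag(off, 1)) / dx**2
    return -(hbar**2) / (2.0 * mass) * laplacian + np.diag(potential(x))


# Hypothetical example: harmonic oscillator V(x) = x^2 / 2 on [-8, 8].
x = np.linspace(-8.0, 8.0, 401)
H = hamiltonian_1d(lambda y: 0.5 * y**2, x)

is_symmetric = np.allclose(H, H.T)
ground_energy = np.linalg.eigvalsh(H)[0]
print(is_symmetric, ground_energy)
```

For this potential the lowest eigenvalue should come out close to the textbook value $\hbar\omega/2 = 0.5$, up to discretization error.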
A first remark: the term "Hermitian", even if very popular in physics, is in my opinion quite misleading (because some use it for symmetric operators, others for self-adjoint ones).
A second remark: the self-adjoint operators of a given Hilbert space $\mathscr{H}$ are in one-to-one correspondence with the strongly continuous one-parameter groups of unitary operators; not with any group of unitary operators. So it is not possible to associate observables with "unitary operators", but it is possible to associate them with strongly continuous (abelian, locally compact) groups of unitary operators.
These distinctions, even if in some sense subtle, may be important. In fact there are representations of unitary groups that do not admit a self-adjoint generator; for example, the canonical commutation relations (in the exponentiated Weyl form) have such "non-regular" representations for fields, which are physically related to infrared problems (see e.g. this link).
Concerning observables, the point is that it is quite difficult to give a satisfactory algebraic setting that collects together unbounded observables (as are the majority of physically relevant quantities: e.g. energy, momentum...). One option is to construct an algebra of unbounded operators, but there are all kinds of domain "nightmares" to be taken into account. Another is to consider an algebra of bounded operators (a $C^*$ or von Neumann algebra), and "affiliate" unbounded self-adjoint operators to it in a suitable fashion. Neither procedure is, in my opinion, completely satisfactory; in any case, the algebraic approach gives a very nice framework to understand some aspects of quantum theories, especially representations of groups of operators.
My personal point of view is to consider any self-adjoint operator on a given Hilbert space (usually a suitable representation of a $C^*$ algebra, or its bicommutant) as an observable. This choice is justified by the fact that any real-valued physical quantity that is actually measured by physicists behaves like a self-adjoint operator (and not merely a symmetric one); in particular it has (mathematically speaking) an associated spectral family, as is the case for self-adjoint operators but not for symmetric ones.
A last mathematical comment: of course you can associate to a given self-adjoint operator $A$ a strongly continuous unitary group $e^{itA}$, and for example construct the $C^*$ algebra $\{e^{itA},t\in\mathbb{R}\}\overline{\phantom{ii}}$, where the bar stands for the closure (in the operator norm). That algebra may be very interesting to study, and may be related to a certain symmetry group of transformations and so on. However, there are other algebras that could be even more interesting, for example the resolvent algebra $\{(A-i\lambda)^{-1}, \lambda\in\mathbb{R}\setminus\{0\}\}\overline{\phantom{ii}}$.
In the case of CCR, the resolvent algebra has a "richer" structure of affiliated self-adjoint operators, and more importantly of automorphisms. That means that more types of quantum dynamics can be defined on the resolvent algebra, preserving it, than for the Weyl (unitary exponential) algebra. In the viewpoint of observables being only the operators affiliated to a given algebra, this means that the resolvent algebra contains more observables, and a less trivial structure of possible evolutions than the Weyl algebra.
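A toy illustration of the two families of bounded operators built from a self-adjoint $A$, the unitaries $e^{itA}$ and the resolvents $(A-i\lambda)^{-1}$ with $\lambda \neq 0$, can be sketched with a matrix. The random Hermitian matrix and the helper names `unitary` and `resolvent` are assumptions for illustration; in finite dimensions $A$ is bounded anyway, so this shows only the algebraic identities, not the interesting infinite-dimensional structure.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
A = (M + M.conj().T) / 2          # hypothetical Hermitian matrix
eigvals, V = np.linalg.eigh(A)    # A = V diag(eigvals) V^dagger
identity = np.eye(4)


def unitary(t):
    """exp(i t A), computed from the spectral decomposition of A."""
    return V @ np.diag(np.exp(1j * t * eigvals)) @ V.conj().T


def resolvent(lam):
    """(A - i lam)^(-1); lam must be nonzero for the inverse to exist in general."""
    return np.linalg.inv(A - 1j * lam * identity)


is_unitary = np.allclose(unitary(0.3).conj().T @ unitary(0.3), identity)
group_law = np.allclose(unitary(0.3 + 0.5), unitary(0.3) @ unitary(0.5))

lam = 0.2
resolvent_norm = np.linalg.norm(resolvent(lam), 2)   # operator (spectral) norm
norm_bound = 1.0 / abs(lam)                          # bound valid for self-adjoint A
print(is_unitary, group_law, resolvent_norm <= norm_bound)
```

The bound $\|(A-i\lambda)^{-1}\| \le 1/|\lambda|$ holds for any self-adjoint $A$, bounded or not, which is why resolvents are a natural bounded stand-in for an unbounded observable.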
Best Answer
I'm going to try to explain why and how density operators in quantum mechanics correspond to random variables in classical probability theory, something none of the other answers have even tried to do.
Let's work in a two-dimensional quantum space. We'll use standard physics bra-ket notation. A quantum state is a column vector in this space, and we'll represent a column vector as $\alpha|0\rangle + \beta |1 \rangle.$ A row vector is $\gamma \langle 0 | + \delta \langle 1 |\,$.
Now, you might think that a probability distribution is a measure on quantum states. You can think of it that way, but it turns out that this is too much information. For example, consider two probability distributions on quantum states. First, let's take the probability distribution
$$ \begin{array}{cc} |0\rangle & \mathrm{with\ probability\ }2/3,\\ |1\rangle & \mathrm{with\ probability\ }1/3. \end{array} $$
Next, let's take the probability distribution $$ \begin{array}{cc} \sqrt{{2}/{3}}\,\left|0\right\rangle +\sqrt{1/3}\, \left|1\right\rangle & \mathrm{with\ probability\ }1/2,\\ \sqrt{{2}/{3}}\,\left|0\right\rangle -\sqrt{1/3}\, \left|1\right\rangle & \mathrm{with\ probability\ }1/2. \end{array} $$
It turns out that these two probability distributions are indistinguishable. That is, any measurement you make on one will give exactly the same probability distribution of results as the same measurement made on the other. The reason for that is that $$ \frac{2}{3} |0\rangle\langle0| +\frac{1}{3}|1\rangle\langle 1| $$ and $$ \frac{1}{2}\left(\sqrt{2/3}\left|0\right\rangle +\sqrt{1/3}\, \left|1\right\rangle\right) \left(\sqrt{2/3}\left\langle 0\right| +\sqrt{1/3}\, \left\langle 1\right|\right) +\frac{1}{2}\left(\sqrt{{2}/{3}}\left|0\right\rangle -\sqrt{1/3}\, \left|1\right\rangle\right) \left(\sqrt{2/3}\left\langle 0\right| -\sqrt{1/3}\, \left\langle 1\right|\right) $$ are the same matrix.
That is, a probability distribution on quantum states is an overly specified distribution, and it is quite cumbersome to work with. We can predict any experimental outcome for a probability distribution on quantum states if we know the corresponding density operator, and many probability distributions yield the same density operator. If we have a probability density $\mu_v$ on quantum states $v$, we can predict any experimental outcome from the density operator $$ \int v v^* d \mu_v \,. $$
So for quantum probability theory, instead of working with probability distributions on quantum states, we work with density operators.
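The claim about the two distributions is easy to check numerically. The sketch below builds both density matrices and compares outcome probabilities for one projective measurement; the helper names and the measurement direction `phi` are assumptions chosen for illustration.

```python
import numpy as np

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])


def density(ensemble):
    """Sum of p * |v><v| over the (p, v) pairs of an ensemble."""
    return sum(p * np.outer(v, v.conj()) for p, v in ensemble)


def outcome_probability(ensemble, phi):
    """Probability of the outcome |phi>, averaged over the ensemble (Born rule)."""
    return sum(p * abs(np.vdot(phi, v)) ** 2 for p, v in ensemble)


# First distribution: |0> with probability 2/3, |1> with probability 1/3.
ensemble_a = [(2 / 3, ket0), (1 / 3, ket1)]

# Second distribution: the two superpositions, each with probability 1/2.
plus = np.sqrt(2 / 3) * ket0 + np.sqrt(1 / 3) * ket1
minus = np.sqrt(2 / 3) * ket0 - np.sqrt(1 / 3) * ket1
ensemble_b = [(0.5, plus), (0.5, minus)]

rho_a = density(ensemble_a)
rho_b = density(ensemble_b)
same_operator = np.allclose(rho_a, rho_b)

# Hypothetical measurement direction; any unit vector should give agreement.
phi = np.array([np.cos(0.3), np.sin(0.3)])
prob_a = outcome_probability(ensemble_a, phi)
prob_b = outcome_probability(ensemble_b, phi)
prob_from_rho = float(np.real(np.vdot(phi, rho_a @ phi)))
print(same_operator, np.isclose(prob_a, prob_b), np.isclose(prob_a, prob_from_rho))
```

The cross terms $\pm\sqrt{2}/3\,(|0\rangle\langle 1| + |1\rangle\langle 0|)$ cancel between the two superpositions, which is why the second ensemble collapses to the same diagonal matrix.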
Classical states correspond to orthonormal vectors in Hilbert space, and classical probability distributions correspond to diagonal density operators.
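As a small sketch of that last correspondence: a classical distribution $p$ becomes the diagonal density operator $\mathrm{diag}(p)$, a classical random variable $f$ becomes the diagonal observable $\mathrm{diag}(f)$, and the trace formula $\mathrm{Tr}(\rho F)$ reduces to the ordinary expectation $\sum_i p_i f_i$. The values of `f` below are hypothetical, picked only for illustration.

```python
import numpy as np

p = np.array([2 / 3, 1 / 3])   # classical probability distribution on two states
f = np.array([5.0, -1.0])      # hypothetical random variable: its value on each state

rho = np.diag(p)               # diagonal density operator
F = np.diag(f)                 # observable diagonal in the same orthonormal basis

quantum_expectation = float(np.trace(rho @ F))
classical_expectation = float(p @ f)
print(quantum_expectation, classical_expectation)
```

The two numbers agree because everything commutes; the genuinely quantum behaviour only shows up once the observable is not diagonal in the basis that diagonalizes $\rho$.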