Contravariant vectors are "standard" vectors. Covariant vectors are linear maps on contravariant vectors producing scalars.
Let us start from the former case. If you fix a pair of bases $\{e_i\}_{i=1,\ldots,n}$ and $\{e'_i\}_{i=1,\ldots,n}$ in a finite-dimensional vector space $V$ of dimension $n$, such that $e_i = \sum_j {A^j}_i e'_j$ for a set of coefficients ${A^j}_i$ forming a (necessarily) non-singular matrix $A$, you have for a given vector $v \in V$:
$$v = \sum_i v^i e_i = \sum_j v'^j e'_j$$
and thus
$$\sum_i v^i \sum_j {A^j}_i e'_j = \sum_j v'^j e'_j$$
so that:
$$\sum_j \left( \sum_i {A^j}_i v^i\right) e'_j = \sum_j v'^j e'_j\:.$$
Uniqueness of the components of $v$ with respect to $\{e'_i\}_{i=1,\ldots,n}$ eventually entails:
$$v'^j = \sum_i {A^j}_i v^i\qquad \mbox{where}\quad e_i = \sum_j {A^j}_i e'_j\tag1$$
This is nothing but the standard rule for transforming components of a given contravariant vector when one changes the decomposition basis.
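As a small numerical sketch (using NumPy, with a hypothetical $2\times 2$ matrix $A$ not taken from the text), rule (1) can be checked explicitly: writing the old basis vectors $e_i$ as the columns of $A$ in the new basis, the transformed components reproduce the same abstract vector.

```python
import numpy as np

# Hypothetical change of basis: e_i = sum_j A^j_i e'_j, so the columns
# of A hold the old basis vectors expressed in the new basis {e'_j}.
A = np.array([[2.0, 1.0],
              [1.0, 1.0]])          # non-singular by construction

v_old = np.array([3.0, -1.0])       # components v^i in the basis {e_i}
v_new = A @ v_old                   # rule (1): v'^j = sum_i A^j_i v^i

# Same abstract vector: take {e'_j} as the standard basis of R^2;
# then the e_i are the columns of A, and sum_i v^i e_i = v_new.
assert np.allclose(A @ v_old, v_new)
```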
Let us pass to covariant vectors. As I said above, a covariant vector is nothing but a linear map $\omega : V \to R$ ($R$ can be replaced by $C$ when dealing with complex vector spaces, or by the corresponding ring when considering modules). One easily proves that the set of real-valued linear maps as above forms a vector space, $V^*$, the so-called dual space of $V$. If $\{e_i\}_{i=1,\ldots,n}$ is a basis of $V$, there is an associated basis $\{e^{*i}\}_{i=1,\ldots,n}$ of $V^*$, the dual basis, defined by the requirements (in addition to linearity):
$$e^{*k}(e_i) = \delta^k_i\tag2$$
Therefore, a covariant vector $\omega \in V^*$ can always be decomposed as follows:
$$\omega = \sum_k \omega_k e^{*k}$$
and, using linearity, (2), and
$$v = \sum_i v^i e_i$$ one sees that
$$\omega(v) = \sum_k \omega_k v^k\:.$$
The RHS does not depend on the choice of the basis $\{e_i\}_{i=1,\ldots,n}$ and the corresponding $\{e^{*i}\}_{i=1,\ldots,n}$, even though the components of the covariant and contravariant vectors $\omega$ and $v$ do depend on the considered bases. Obviously, changing the basis in $V$ and passing to $\{e'_i\}_{i=1,\ldots,n}$, related to $\{e_i\}_{i=1,\ldots,n}$ through (1), the new basis corresponds to a dual basis $\{e'^{*i}\}_{i=1,\ldots,n}$. A straightforward computation based on (2) shows that
$$e^{*i} = \sum_j {B_j}^i e'^{*j}$$
where $$B= \left(A^T\right)^{-1}\:.\tag3$$
Consequently, for a covariant vector
$$\omega = \sum_i \omega_i e^{*i} = \sum_j \omega'_j e'^{*j}$$ where
$$\omega'_j = \sum_i{B_j}^i \omega_i\:.\tag4$$
This relation, together with (3) is nothing but the standard rule for transforming components of a given covariant vector when one changes the decomposition basis.
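The basis independence of the pairing $\omega(v)$ can be verified numerically as well. The following sketch (NumPy, reusing a hypothetical invertible matrix $A$ not taken from the text) applies rules (1), (3), and (4) and checks that $\sum_k \omega_k v^k$ is unchanged:

```python
import numpy as np

# Hypothetical invertible change-of-basis matrix: e_i = sum_j A^j_i e'_j.
A = np.array([[2.0, 1.0],
              [1.0, 1.0]])
B = np.linalg.inv(A.T)              # rule (3): B = (A^T)^{-1}

v = np.array([3.0, -1.0])           # contravariant components v^i
w = np.array([0.5, 2.0])            # covariant components omega_i

v_new = A @ v                       # rule (1): v'^j = sum_i A^j_i v^i
w_new = B @ w                       # rule (4): omega'_j = sum_i B_j^i omega_i

# The pairing omega(v) = sum_k omega_k v^k is basis independent,
# since (Bw).(Av) = w^T B^T A v = w^T A^{-1} A v = w^T v.
assert np.isclose(w @ v, w_new @ v_new)
```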
This structure rarely appears in classical physics, where one usually deals with orthonormal bases. The reason is that, when changing basis and passing to another orthonormal basis, the matrix $A$ relating the bases belongs to the orthogonal group, so that:
$$B= \left(A^T\right)^{-1} =A\:,$$
and one cannot distinguish, working in components, between covariant and contravariant vectors, since the transformation rules (1) and (4) are, in fact, identical. For instance, for a fixed force $F$ applied to a point with velocity $v$, the linear map associating the force with its power as a function of $v$ defines a covariant vector that we could indicate by "$F\cdot$":
$$\pi^{(F)}: v \mapsto F\cdot v$$
where $\cdot$ denotes the standard scalar product in the Euclidean rest space of a reference frame.
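As a minimal numerical sketch of this example (NumPy, with made-up force and velocity components), the covariant vector $\pi^{(F)}$ is just the linear map $v \mapsto F\cdot v$, and its linearity is the defining property:

```python
import numpy as np

# Hypothetical force components in an orthonormal basis (illustration only).
F = np.array([1.0, 0.0, -2.0])

def power(v):
    """The covariant vector pi^(F) acting on a velocity v: F . v."""
    return F @ v

v = np.array([3.0, 1.0, 0.5])
p = power(v)                        # instantaneous power F . v, here 2.0

# Linearity, the defining property of an element of V*:
u = np.array([0.0, 1.0, 1.0])
assert np.isclose(power(2.0 * v + u), 2.0 * power(v) + power(u))
```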
If the (real, finite-dimensional!) vector space $V$ is equipped with a (generally indefinite) scalar product, that is, a non-degenerate symmetric bilinear map $g : V \times V \to R$, a natural identification of $V$ and $V^*$ arises. It is nothing but the linear and bijective map associating contravariant vectors with covariant vectors:
$$V \ni v \mapsto g(v, \:\:)\in V^*$$
where, obviously, $g(v, \:\:) : V \ni u \mapsto g(v, u)\in R$ turns out to be linear and thus defines an element of $V^*$ as said.
In components, if $u= \sum_i u^i e_i$ and $s= \sum_i s^i e_i$, bilinearity of $g$ yields:
$$g(u,s) = \sum_{i,j} g_{ij} u^is^j\qquad \mbox{where}\quad g_{ij} := g(e_i,e_j)\:.$$
The matrix of elements $g_{ij}$ is symmetric and non-singular (as $g$ is symmetric and non-degenerate). With this definition, one easily sees that, if $u\in V$ is a contravariant vector, the associated covariant one $g(u,\:\:)\in V^*$ has components:
$$g(u, \:\:\:)_k= \sum_ig_{ki}u^i$$
so that the scalar product $g(u,v)$ of $u$ and $v$ can also be written:
$$g(u,v)= \sum_{ij} g_{ij}u^iv^j = \sum_i u_i v^i\:.$$
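The index-lowering map can be sketched numerically too (NumPy, with a hypothetical Minkowski-like metric chosen only for illustration): lowering the index of $u$ with $g$ and pairing with $v$ reproduces $g(u,v)$.

```python
import numpy as np

# Hypothetical indefinite metric g_ij = diag(-1, 1, 1) (symmetric,
# non-singular, hence non-degenerate).
g = np.diag([-1.0, 1.0, 1.0])

u = np.array([2.0, 1.0, 0.0])       # contravariant components u^i
u_low = g @ u                       # lowered components u_k = sum_i g_ki u^i

v = np.array([1.0, 3.0, -1.0])

# g(u, v) computed two equivalent ways:
assert np.isclose(u @ g @ v, u_low @ v)
```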
Finally, changing basis one has that:
$$g(u,s) = \sum_{i,j} g'_{lm} u'^ls'^m\qquad \mbox{where}\quad g'_{lm} := g(e'_l,e'_m)\:,$$
and
$$g'_{lm} = \sum_{ij}{B_l}^i {B_m}^j g_{ij}\:.$$
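In matrix form this transformation reads $g' = B\,g\,B^T$. A short numerical check (NumPy, again with a hypothetical invertible $A$ and a made-up indefinite metric) confirms that the scalar product is basis independent:

```python
import numpy as np

# Hypothetical change of basis and indefinite metric (illustration only).
A = np.array([[2.0, 1.0],
              [1.0, 1.0]])
B = np.linalg.inv(A.T)              # rule (3): B = (A^T)^{-1}
g = np.array([[1.0, 0.0],
              [0.0, -1.0]])

# g'_{lm} = sum_{ij} B_l^i B_m^j g_{ij}, i.e. g' = B g B^T:
g_new = B @ g @ B.T

# The scalar product g(u, s) does not depend on the basis:
u, s = np.array([1.0, 2.0]), np.array([3.0, -1.0])
u_new, s_new = A @ u, A @ s
assert np.isclose(u @ g @ s, u_new @ g_new @ s_new)
```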
Best Answer
This is all just a result of sloppy language on the part of people describing quantum mechanics. The state $$ \left\lvert \Psi \right\rangle = \frac{1}{\sqrt{2}} \left( \left\lvert \uparrow \right\rangle + \left\lvert \downarrow \right\rangle\right) \tag{1}$$ is a superposition of the two orthogonal states $\left\lvert \uparrow \right\rangle$ and $\left\lvert \downarrow \right\rangle$. The state is unlike either basis vector alone. A velocity vector $$\left\lvert v \right\rangle = a\left\lvert x \right\rangle + b\left\lvert y \right\rangle \tag{2}$$ for some values $a$ and $b$ is also a superposition of two orthogonal velocity vectors. It is unlike either basis vector alone.
Talking about $\left\lvert \Psi \right\rangle$ as "simultaneously in both states" is just plain sloppy. It's a superposition. It's not like either basis vector alone. It is, as you say, something completely distinct.
The reason for this disagreement in language comes from the fact that, in the end, quantum state vectors tell you the probabilities of experimental outcomes. It really bugs people to think of the state of a physical system as being fundamentally probabilistic. When it comes to measurement, the state $\left\lvert \Psi \right\rangle$ means that the system has a 1/2 probability of being measured spin up and a 1/2 probability of being measured spin down. People don't naturally think about the world around them in terms of superposition states whose coefficients correspond to probability amplitudes. They'd rather think about the classical states independently and try to form some notion of the system existing in combinations of classical states. Therefore, they naturally (but erroneously) say that the system is in both classical states at the same time, when really, as you said, the system is in a state that's completely different from either classical basis state.
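As a tiny numerical sketch (NumPy, representing the spin states as standard basis vectors with real amplitudes for simplicity), the probabilities encoded in state (1) are the squared magnitudes of its coefficients:

```python
import numpy as np

# Represent |up> and |down> as an orthonormal basis of R^2 (real
# amplitudes suffice for this state).
up, down = np.array([1.0, 0.0]), np.array([0.0, 1.0])
psi = (up + down) / np.sqrt(2)      # |Psi> = (|up> + |down>)/sqrt(2)

p_up = abs(up @ psi) ** 2           # |<up|Psi>|^2
p_down = abs(down @ psi) ** 2       # |<down|Psi>|^2, both approximately 1/2

# The state is normalized, so the probabilities sum to 1:
assert np.isclose(p_up + p_down, 1.0)
```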