Why are states rays?
(Answer to OP's 1. and 2.)
One of the fundamental tenets of quantum mechanics is that states of a physical system correspond (not necessarily uniquely - this is what projective spaces in QM are all about!) to vectors in a Hilbert space $\mathcal{H}$, and that the Born rule gives the probability for a system in state $\lvert \psi \rangle$ to be in state $\lvert \phi \rangle$ by
$$ P(\psi,\phi) = \frac{\lvert\langle \psi \vert \phi \rangle \rvert^2}{\lvert \langle \psi \vert \psi \rangle \langle \phi \vert \phi \rangle \rvert}$$
(Note that the habit of talking about normalised state vectors is because then the denominator of the Born rule is simply unity, and the formula is simpler to evaluate. This is all there is to normalisation.)
Now, for any $c \in \mathbb{C} - \{0\}$, $P(c\psi,\phi) = P(\psi,c\phi) = P(\psi,\phi)$, as may be easily checked. In particular, $P(\psi,\psi) = P(\psi,c\psi) = 1$ holds, and hence $c\lvert \psi \rangle$ is the same state as $\lvert \psi \rangle$, since that is what having probability 1 to be in a state means.
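The "easily checked" invariance can also be verified numerically. The following is a minimal sketch in plain Python, assuming a finite-dimensional Hilbert space with vectors stored as lists of complex numbers; the helper names (`inner`, `born`, `scale`) and the sample vectors are illustrative choices, not part of the original argument.

```python
def inner(a, b):
    """Inner product <a|b>, antilinear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def born(psi, phi):
    """Born rule P(psi, phi) = |<psi|phi>|^2 / (<psi|psi> <phi|phi>)."""
    return abs(inner(psi, phi)) ** 2 / (inner(psi, psi).real * inner(phi, phi).real)

def scale(c, v):
    """Multiply every component of v by the complex number c."""
    return [c * x for x in v]

# Arbitrary (unnormalised) sample vectors and an arbitrary non-zero scalar.
psi = [1 + 2j, 0.5j, -1.0]
phi = [0.3, 1 - 1j, 2j]
c = 2.5 - 0.7j

p = born(psi, phi)
p_scaled_left = born(scale(c, psi), phi)    # P(c psi, phi)
p_scaled_right = born(psi, scale(c, phi))   # P(psi, c phi)
p_same_ray = born(psi, scale(c, psi))       # P(psi, c psi)
```

Up to floating-point rounding, the first three values agree and the last one equals 1, whatever non-zero `c` is chosen.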
A ray is now the set of all vectors describing the same state by this logic - it is just the one-dimensional subspace spanned by any of them, with the zero vector removed: For $\lvert \psi \rangle$, the associated ray is the set
$$ R_\psi := \{\lvert \phi \rangle \in \mathcal{H} \mid \exists c \in\mathbb{C} - \{0\}: \lvert \phi \rangle = c\lvert \psi \rangle \}$$
Any member of this set will yield the same results when we use it in the Born rule, hence they are physically indistinguishable.
Why are phases still relevant?
(Answer to OP's 3.)
For a single state, a phase $\mathrm{e}^{\mathrm{i}\alpha},\alpha \in \mathbb{R}$ therefore has no effect on the system; it stays the same. Observe, though, that "phases" are essentially the dynamics of the system, since the Schrödinger equation tells you that every energy eigenstate $\lvert E_i \rangle$ evolves with the phase $\mathrm{e}^{-\mathrm{i}E_i t}$ (in units where $\hbar = 1$).
Obviously, this means energy eigenstates don't change, which is why they are called stationary states. The picture changes when we have sums of such states, though: $\lvert E_1 \rangle + \lvert E_2 \rangle$ will, if $E_1 \neq E_2$, evolve differently from an overall multiplication with a complex phase (or even a complex number), and hence leave its ray in the course of the dynamics! It is worthwhile to convince yourself that the evolution does not depend on the representative of the ray we chose: For any non-zero complex $c$, $c \cdot (\lvert E_1 \rangle + \lvert E_2 \rangle)$ will visit exactly the same rays at exactly the same times as any other multiple, again showing that rays are the proper notion of state.
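To see this concretely, here is a small sketch in plain Python. It assumes units with $\hbar = 1$, the sign convention $\mathrm{e}^{-\mathrm{i}E t}$ (the opposite sign gives the same probabilities), and made-up energies; states are written as coefficient lists in the energy eigenbasis, and all function names are illustrative.

```python
import cmath
import math

def inner(a, b):
    """Inner product <a|b>, antilinear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def born(psi, phi):
    """Born rule P(psi, phi) = |<psi|phi>|^2 / (<psi|psi> <phi|phi>)."""
    return abs(inner(psi, phi)) ** 2 / (inner(psi, psi).real * inner(phi, phi).real)

def evolve(coeffs, energies, t):
    """Coefficients in the energy eigenbasis after time t (hbar = 1 assumed)."""
    return [a * cmath.exp(-1j * E * t) for a, E in zip(coeffs, energies)]

energies = [1.0, 3.0]      # illustrative values with E_1 != E_2
eigenstate = [1.0, 0.0]    # |E_1>
superpos = [1.0, 1.0]      # |E_1> + |E_2>
t = 0.4

# An eigenstate only picks up a phase, so it stays in its ray.
p_eigen = born(eigenstate, evolve(eigenstate, energies, t))

# The superposition leaves its ray: the overlap with the initial state drops below 1.
p_super = born(superpos, evolve(superpos, energies, t))
p_expected = math.cos((energies[0] - energies[1]) * t / 2) ** 2

# A different representative of the same ray visits the same ray at the same time.
c = 0.3 - 2j
scaled = [c * a for a in superpos]
p_cross = born(evolve(superpos, energies, t), evolve(scaled, energies, t))
```

Here `p_eigen` and `p_cross` equal 1 up to rounding, while `p_super` follows $\cos^2\big((E_1 - E_2)t/2\big)$ and is strictly smaller than 1 at the chosen time.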
The projective space is the space of rays
(Answer to OP's 4. and 5. as well as some further remarks)
After noting, again and again, that the physically relevant entities are the rays, and not the vectors themselves, one is naturally led to the idea of considering the space of rays. Fortunately, it is easy to construct: "Belonging to a ray" is an equivalence relation on the Hilbert space, and hence can be divided out in the sense that we simply say two vectors are the same object in the space of rays if they lie in the same ray - the rays are the equivalence classes. Formally, we set up the relation
$$ \psi \sim \phi \Leftrightarrow \psi \in R_\phi$$
and define the space of rays or projective Hilbert space to be
$$ \mathcal{P}(\mathcal{H}) := (\mathcal{H} - \{0\}) / \sim$$
This has nothing to do with the Gram-Schmidt way of finding a new basis for a vector space! This isn't even a vector space anymore! (Note that, in particular, it has no zero.) The nice thing is, though, that we can now be sure that every element of this space represents a distinct state, since every element is actually a different ray.¹
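One way to make "dividing out" tangible is to pick a canonical representative for each ray, so that two vectors are equivalent exactly when their representatives coincide. The sketch below is one possible convention (normalise, then rotate the first non-zero component onto the positive real axis) for finite-dimensional vectors stored as Python lists; the function names and tolerances are assumptions made for illustration.

```python
import math

def ray_representative(v, tol=1e-12):
    """Canonical representative of the ray through v.

    Convention (an arbitrary choice): unit norm, and the first component
    that is not numerically zero is real and positive.
    """
    norm = math.sqrt(sum(abs(x) ** 2 for x in v))
    if norm == 0:
        raise ValueError("the zero vector does not belong to any ray")
    unit = [x / norm for x in v]
    pivot = next(x for x in unit if abs(x) > tol)
    phase = pivot / abs(pivot)
    return [x / phase for x in unit]

def same_ray(u, v, tol=1e-9):
    """True if u and v differ only by a non-zero complex factor."""
    ru, rv = ray_representative(u), ray_representative(v)
    return len(ru) == len(rv) and all(abs(a - b) < tol for a, b in zip(ru, rv))

u = [0j, 1 + 1j, 2.0]
v = [(-3 + 0.5j) * x for x in u]   # a multiple of u: same ray
w = [1.0, 1 + 1j, 2.0]             # not a multiple of u: different ray
```

With these definitions `same_ray(u, v)` holds and `same_ray(u, w)` does not, and the zero vector is rejected outright, mirroring the fact that the space of rays has no zero.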
(Side note (see also orbifold's answer): A direct, and important, consequence is that we need to revisit our notion of what kinds of representations we seek for symmetry groups - initially, on the Hilbert space, we would have sought unitary representations, since we want to preserve the vector structure of the space as well as the inner product structure (since the Born rule relies on it). Now, we know it is enough to seek projective representations, which are, for many Lie groups, in bijection with the linear representations of their universal cover, which is how, quantumly, $\mathrm{SU}(2)$ as the "spin group" arises from the classical rotation group $\mathrm{SO}(3)$.)
OP's fifth question
When one limits the Hilbert space to that of a certain observable of the system at hand, e.g. momentum or spin space (in order to measure the momentum and spin of a system respectively), does that mean we're talking about projective spaces already? (e.g. is the spin space spanned by up |↑⟩ and down |↓⟩ spins states of a system referred to as projective spin Hilbert space?)
is not very well posed, but strikes at the heart of what the projectivization does for us: When we talk of "momentum space" $\mathcal{H}_p$ and "spin space" $\mathcal{H}_s$, it is implicitly understood that the "total space" is the tensor product $\mathcal{H}_p \otimes \mathcal{H}_s$. That the total/combined space is the tensor product and not the ordinary product follows from the fact that the categorial notion of a product (let's call it $\times_\text{cat}$) for projective spaces is
$$ \mathcal{P}(\mathcal{H}_1) \times_\text{cat} \mathcal{P}(\mathcal{H}_2) = \mathcal{P}(\mathcal{H}_1\otimes\mathcal{H}_2)$$
For motivations why this is a sensible notion of product to consider, see some other questions/answers (e.g. this answer of mine or this question and its answers).
Let us stress again that the projective space is not a vector space, and hence not "spanned" by anything, as the fifth question seems to think.
¹ The inquiring reader may protest, and rightly so: If our description of the system on the Hilbert space has an additional gauge symmetry, it will occur that there are distinct rays representing the same physical state, but this shall not concern us here.
Best Answer
Was typing this halfway through when Timaeus posted his answer. There is some overlap, but also some potentially useful additional details. Hope it helps.
So, before we get to the part on $\psi(x)$, we need to clarify the part about observables in general. The fundamental assumption of QM regarding observables is that every observable O is represented by a self-adjoint operator ${\hat O}$ on ${\mathcal H}$, that is ${\hat O}:{\mathcal H} \rightarrow {\mathcal H}$, ${\hat O} = {\hat O}^\dagger$. It is further postulated that the average value of O in any state $|\psi \rangle$ is given by the matrix element $\langle \psi | {\hat O} | \psi \rangle = \langle \psi | {\hat O} \psi \rangle$. The fact that ${\hat O}$ is postulated to be self-adjoint has several dramatic consequences:
1) If ${\hat O}$ is self-adjoint, it is already implied that there exists in ${\mathcal H}$ a basis of eigenstates $|\omega \rangle$ of ${\hat O}$, for which ${\hat O}|\omega \rangle = \omega |\omega \rangle$.
2) If the $| \omega \rangle$-s form a basis, it follows that any state vector $|\psi \rangle$ can be expressed as a superposition $|\psi \rangle = \sum_{\omega} {c_\omega |\omega \rangle }$.
3) From the existence of decompositions $|\psi \rangle = \sum_{\omega} {c_\omega |\omega \rangle }$ it follows that for an arbitrary $|\psi \rangle$ the variance of O in $|\psi \rangle$ is in general non-zero, $\langle \psi | (\Delta {\hat O})^2 | \psi \rangle = \langle \psi | {\hat O}^2 | \psi \rangle - \langle \psi | {\hat O} | \psi \rangle^2 \neq 0$, but for the eigenstates of ${\hat O}$ we find $\langle \omega | (\Delta {\hat O})^2 | \omega \rangle = \omega^2 - \omega^2 = 0$. In other words, just from the self-adjointness of ${\hat O}$ we have that in any eigenstate $|\omega \rangle$ the observable O has a well-defined, sharp value $\langle \omega | {\hat O} | \omega \rangle = \omega$.
4) Given the above, it follows further that two observables A and B cannot produce sharp values simultaneously unless their corresponding operators commute and admit a common set of eigenstates. It is not difficult to prove, but I will not go into details. Suffice it to emphasize that this is an expression of the uncertainty principle and everything follows from the mere self-adjointness of observables on the Hilbert space ${\mathcal H}$.
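Points 1) to 3) can be illustrated on the smallest non-trivial example. The sketch below assumes a two-dimensional Hilbert space and takes the real symmetric matrix with rows (0, 1) and (1, 0) as a stand-in for ${\hat O}$; this choice, and all function names, are illustrative and not taken from the discussion above.

```python
def inner(a, b):
    """Inner product <a|b>, antilinear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def apply(op, v):
    """Matrix-vector product: components of O|v>."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in op]

def expectation(op, psi):
    """<psi|O|psi> / <psi|psi>, valid for unnormalised psi."""
    return (inner(psi, apply(op, psi)) / inner(psi, psi)).real

def variance(op, psi):
    """<O^2> - <O>^2, using <O^2> = <O psi|O psi> for self-adjoint O."""
    o_psi = apply(op, psi)
    mean_sq = (inner(o_psi, o_psi) / inner(psi, psi)).real
    return mean_sq - expectation(op, psi) ** 2

O = [[0, 1],
     [1, 0]]               # self-adjoint, eigenvalues +1 and -1

eig_plus = [1, 1]          # eigenvector with eigenvalue +1 (unnormalised)
eig_minus = [1, -1]        # eigenvector with eigenvalue -1 (unnormalised)
generic = [1, 0]           # equal-weight superposition of the two eigenvectors

var_plus = variance(O, eig_plus)
var_minus = variance(O, eig_minus)
var_generic = variance(O, generic)
mean_plus = expectation(O, eig_plus)
```

The eigenvectors give zero variance and a sharp value equal to the eigenvalue, while the superposition `generic` has mean 0 and variance 1.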
Now, how does this help us with the $\psi(x)$ problem and everything? It is actually the crux of it, since the Hilbert space of states ${\mathcal H}$ is always defined in terms of the system's degrees of freedom, which are nothing but a complete set of commuting observables. In general, if a system is completely characterized by degrees of freedom (or observables) $Q_1$, $Q_2$, ..., $Q_n$, then each of its states is necessarily labeled by a corresponding set of values $\{q_1, q_2, ..., q_n \}$ that can be measured simultaneously. According to the fundamental assumptions of QM, this means that $Q_1$, $Q_2$, ..., $Q_n$ must necessarily be represented on ${\mathcal H}$ by mutually commuting self-adjoint operators ${\hat Q_1}$, ${\hat Q_2}$, ..., ${\hat Q_n}$ that admit a common set of eigenstates labeled by $\{q_1, q_2, ..., q_n \}$, say $|q_1, q_2, ..., q_n \rangle$. All is good so far, but how do we get such $|q_1, q_2, ..., q_n \rangle$ in the first place? Simple: we postulate them. There is literally nothing else that we can do. But once we do this, we are left with an airtight definition of the Hilbert space of states ${\mathcal H}$ and we can even do physics.
To get to the actual heart of the matter: Let's take a spinless particle on a 1D line. It has one degree of freedom, which we can take to be the position $x$ along the line. Observable $x$ must be represented in the Hilbert space of the particle by a self-adjoint operator ${\hat x}$ that generates a basis set $|x\rangle$. The states $|x \rangle$ must be states in which the particle is found with certainty at position $x$, such that ${\hat x} | x \rangle = x | x \rangle$. Conversely, if we postulate the states $|x \rangle$ as a basis set, we have defined ${\mathcal H}$. The important thing is that now any state $|\psi \rangle$ of the particle can be represented as a superposition
$$ | \psi \rangle = \int{dx\; \psi(x) |x \rangle}, \;\; \text{with}\;\;\psi(x) = \langle x | \psi \rangle $$
Of course, we can also define the Hilbert space in terms of the particle's momentum $p$ and obtain a similar decomposition in terms of momentum eigenstates $|p \rangle$. Is this Hilbert space different from the one before? No. Since the Hilbert space comprises all possible states of the particle, the momentum eigenstates $|p \rangle$ must admit a decomposition in terms of position eigenstates $|x \rangle$ and vice-versa. Once the plane-wave form of the wavefunction $\langle x | p \rangle$ is defined, all rules about decomposing a state as a superposition of basis states apply as before.
Are there other representations than position or momentum? Absolutely! Think energy eigenstates in the hydrogen atom. The bound states are labeled by energy, angular momentum, azimuthal angular momentum, and spin. The corresponding wave functions are decompositions in the position representation. However, the spin component comes as an additional degree of freedom, not from the Schroedinger equation itself. Once the existence of spin is acknowledged, the Hilbert space is extended to account for spin states. And this brings us to the last question.
The essence of your question is: given a Hilbert space for a certain number of degrees of freedom, how is it extended if we need to account for additional degrees of freedom? Take for instance the 1D particle. What if the particle has spin? It obviously keeps the $x$ degree of freedom, but we need to account in addition for the spin degree of freedom. We can build new states $|x, \sigma \rangle$ that locate a particle with spin $\sigma$ (along some given direction) at position $x$, rebuild the Hilbert space, and think about $x$ and $\sigma$ as labels, the way Timaeus suggested. But the reality is that the Hilbert space so extended is isomorphic to the tensor product (often loosely called the direct product) of the spinless Hilbert space with a new Hilbert space corresponding to the spin degree of freedom. In fact, the tensor product form is the preferred one for a variety of reasons. This is why we can and do commonly write something like $|\Psi \rangle = |\psi \rangle \otimes |\sigma \rangle$, where $|\psi \rangle$ is a spinless state vector.
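The equivalence between the "label" view $|x, \sigma \rangle$ and the product view can be sketched numerically. The example below assumes a crude discretisation of the line into three grid points and a two-component spin; the numbers and the helper `kron` are illustrative assumptions.

```python
def kron(a, b):
    """Components of |a> (x) |b>, ordered so that index = i * len(b) + s."""
    return [x * y for x in a for y in b]

psi = [0.6, 0.0, 0.8j]    # hypothetical spinless state on a 3-point grid
sigma = [0.6, 0.8]        # hypothetical spin state (up and down amplitudes)

Psi = kron(psi, sigma)    # product state |psi> (x) |sigma>

def amplitude(Psi, i, s, n_spin=2):
    """Amplitude <x_i, s | Psi>, reading the pair (i, s) as a joint label."""
    return Psi[i * n_spin + s]
```

The combined state has 3 x 2 = 6 components, and `amplitude(Psi, i, s)` equals `psi[i] * sigma[s]`, so labelling basis states by the pair $(x, \sigma)$ and forming the product of the two spaces describe the same object.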
So can we think the same way about spatial degrees of freedom like $x$ and $y$? Formally yes. Although you can view $x$ and $y$ as labels for a 2D particle, there is also an isomorphism with a system of two distinguishable 1D particles. Similarly, a system of two distinguishable 3D particles is isomorphic to one particle living in a 6D configuration space. But unlike for spin, it is commonly more convenient to use the label view, especially when dealing with coordinate changes to curvilinear coordinates. The isomorphism is still there, but is not invoked.
Additional info based on comments:
Are Hilbert spaces for different degrees of freedom (DOFs) isomorphic? If so, don't we need just one copy in the total Hilbert space? The Hilbert spaces corresponding to dissimilar DOFs, like position and spin, may have very different cardinality (dimension) and thus need not be isomorphic to each other. In general, even if some DOFs do generate formally isomorphic Hilbert spaces, like the $x$ and $y$ coordinates, the total Hilbert space of the system must account for each DOF individually. It is always the tensor product of the Hilbert spaces for each of the DOFs. For a 1D particle with spin, it is ${\mathcal H}_x \otimes {\mathcal H}_\sigma$. For a 2D particle with spin, it is ${\mathcal H}_x \otimes {\mathcal H}_y \otimes {\mathcal H}_\sigma$ or ${\mathcal H}_{x,y} \otimes {\mathcal H}_\sigma$, but no longer just ${\mathcal H}_x \otimes {\mathcal H}_\sigma$.
How do we account for non-commuting observables? Take $x$ and $p$ for the 1D particle. There are no common $|xp \rangle$ eigenstates ($[x,p] \neq 0$), but the spinless ${\mathcal H}$ is completely characterized by means of either the $\{ |x \rangle \}$ or the $\{ |p \rangle \}$ basis. Since ${\mathcal H}$ must contain all possible (spinless) states, if we choose the $\{ |x \rangle \}$ basis then we must express the $ |p \rangle $ states in terms of the $ |x \rangle $ states, and conversely. This is done by specifying the (canonical) commutation relation between $x$ and $p$, $[x, p] = i\hbar$, which leads to an expression for the momentum wave functions $\langle x | p \rangle$.
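As a sketch of that last step, assume the standard position representation in which ${\hat p}$ acts as $-i\hbar\, d/dx$ (this realizes $[x, p] = i\hbar$; the delta-function normalization of the $|p \rangle$ states is likewise an assumption here). The eigenvalue equation ${\hat p}|p \rangle = p|p \rangle$ then becomes a first-order differential equation for $\langle x | p \rangle$:
$$ -i\hbar \frac{d}{dx} \langle x | p \rangle = p\, \langle x | p \rangle \quad \Longrightarrow \quad \langle x | p \rangle = \frac{1}{\sqrt{2\pi\hbar}}\, \mathrm{e}^{i p x/\hbar} $$
where the prefactor is fixed by requiring $\langle p | p' \rangle = \delta(p - p')$. The momentum wave function of any state then follows by inserting the $\{ |x \rangle \}$ basis:
$$ \langle p | \psi \rangle = \int{dx\; \langle p | x \rangle \langle x | \psi \rangle} = \frac{1}{\sqrt{2\pi\hbar}} \int{dx\; \mathrm{e}^{-i p x/\hbar}\, \psi(x)} $$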
For a complete set of commuting observables, why do we keep only the common eigenstates? This is simply because the "common eigenstates" include all the eigenstates for every observable in the complete set. In other words, given any observable ${\hat O}$ from the complete set, there are no eigenstates of ${\hat O}$ that are not in the set of "common eigenstates". For an observable ${\hat A}$ that is not among those in the complete set, the situation depends on whether it commutes or not with those in the complete set, but the result is the same. If it commutes with the entire set (think any $f({\hat O})$ ), its eigenstates are still those of the complete set. If it doesn't commute, (at least some of) its eigenstates are not in the set's common eigenstates, but can always be expressed as superpositions of those common eigenstates.