To understand what these orbitals are, you first have to understand the notion of superposition in quantum mechanics. In regular classical physics, a particle or a system must be in a definite state. A car is at a particular mile marker on a highway, moving at a particular speed. The Moon orbits around the Earth with a particular velocity at a particular radius. Cats are either alive or dead.
In quantum mechanics, on the other hand, we find that particles and systems no longer necessarily have these definite properties; rather, they can exist in several different states at once. The famous example of this is, of course, Schrödinger's cat, which (after one half-life of its radioactive roommate) is neither completely alive, nor completely dead, but rather some weird combination of the two. While we have trouble envisioning this directly (or, at least, I do), it's pretty easy to mathematically describe this weird state of the cat. We use an abstract vector space, define one "direction" in this vector space to correspond to "alive", and the direction at right angles to "alive" to correspond to "dead". Call these vectors $\vec{a}$ and $\vec{d}$, respectively. The state of the cat after one half-life is then mathematically expressible as
$$
\frac{1}{\sqrt{2}} (\vec{a} + \vec{d}).
$$
The factor of $1/\sqrt{2}$ is there because state vectors have to be unit vectors (or, more accurately, they can always be taken to be unit vectors.) This combined vector doesn't point in either "direction", which means the cat is neither fully in the "alive" state nor fully in the "dead" state; rather, it's in a weird combination of the two.
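Concretely, the squared coefficients of a unit state vector give the measurement probabilities, which is why $1/\sqrt{2}$ is the right factor here:
$$
P(\text{alive}) = \left(\frac{1}{\sqrt{2}}\right)^2 = \frac{1}{2}, \qquad P(\text{dead}) = \left(\frac{1}{\sqrt{2}}\right)^2 = \frac{1}{2}, \qquad \frac{1}{2} + \frac{1}{2} = 1.
$$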
So what does this have to do with orbitals? Well, when we solve the Schrödinger equation for the hydrogen atom, we find that the allowed wavefunctions of the electron are parameterized by three quantum numbers: $n$, $l$ (which runs from 0 to $n - 1$), and $m$ (which runs from $-l$ to $+l$.) We can write these wavefunctions as something like
$$
\psi_{n,l,m} (\vec{r}).
$$
What's more, it happens that (in a common choice of phase convention) for a given $n$ and $l$, the wavefunctions with opposite $m$ values are complex conjugates of each other:
$$
\psi_{n,l,-m} (\vec{r}) = \psi^*_{n,l,m} (\vec{r})
$$
That's all well and good, but what if we want a real-valued wave function? For example, let's take the set of wavefunctions with $n = 2$ and $l= 1$. By the above logic, $\psi_{2,1,0}$ is its own complex conjugate; so it's already real-valued. Let's call this wavefunction $p_z(\vec{r})$. The other two wavefunctions $\psi_{2,1,1}$ and $\psi_{2,1,-1}$ are complex-valued, unfortunately. However, we can write the following two combinations of these wave functions:
$$
p_x(\vec{r}) = \frac{1}{\sqrt{2}}(\psi_{2,1,1} + \psi_{2,1,-1}) \qquad p_y(\vec{r}) = \frac{1}{\sqrt{2}i}(\psi_{2,1,1} - \psi_{2,1,-1})
$$
Both of these quantities are real (you should check this to satisfy yourself that this is true). So if the electron is in either of these superpositions, we can take its wavefunction to be entirely real-valued. In both cases, though, the electron no longer has a definite $m$ value; rather, it is partially in the $m = +1$ state and partially in the $m = -1$ state because it's in a superposition of these states of definite $m$ (just as Schrödinger's cat is not fully in the "alive" state or the "dead" state.)
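As a quick numerical sanity check of this (a sketch, using the hydrogenic $2p$ wavefunctions in atomic units and the phase convention above, without the Condon-Shortley sign), one can sample random points and confirm that $p_x$ and $p_y$ have vanishing imaginary part:

```python
import numpy as np

def psi_21m(r, theta, phi, m):
    """Hydrogenic n=2, l=1 wavefunction in atomic units, with the phase
    convention used above, so that psi_{2,1,-1} = conj(psi_{2,1,+1})."""
    radial = r * np.exp(-r / 2) / np.sqrt(24)
    if m == 0:
        angular = np.sqrt(3 / (4 * np.pi)) * np.cos(theta)
    else:
        angular = np.sqrt(3 / (8 * np.pi)) * np.sin(theta) * np.exp(1j * m * phi)
    return radial * angular

rng = np.random.default_rng(0)
r = rng.uniform(0.1, 10, 1000)
theta = rng.uniform(0, np.pi, 1000)
phi = rng.uniform(0, 2 * np.pi, 1000)

plus = psi_21m(r, theta, phi, +1)
minus = psi_21m(r, theta, phi, -1)

p_x = (plus + minus) / np.sqrt(2)
p_y = (plus - minus) / (np.sqrt(2) * 1j)

# Both combinations are real up to floating-point noise:
print(np.max(np.abs(p_x.imag)), np.max(np.abs(p_y.imag)))
```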
I am of course glossing over a huge amount of subtlety and ambiguity here, but hopefully this explains what's going on with these real orbitals and why they can be written as sums of the complex orbitals.
You're right on a lot of counts. The wavefunction of the system is indeed a function of the form
$$
\Psi=\Psi(\mathbf r_1,\mathbf r_2),
$$
and there's no separating the two, because of the cross term in the Schrödinger equation. This means that it is fundamentally impossible to ask for things like "the probability amplitude for electron 1", because that depends on the position of electron 2. So at least a priori you're in a huge pickle.
The way we solve this is, to a large extent, to try to pretend that this isn't an issue - and somewhat surprisingly, it tends to work! For example, it would be really nice if the electronic dynamics were just completely decoupled from each other:
$$
\Psi(\mathbf r_1,\mathbf r_2)=\psi_1(\mathbf r_1)\psi_2(\mathbf r_2),
$$
so you could have legitimate (independent) probability amplitudes for the position of each of the electrons, and so on. In practice this is not quite possible because the electron indistinguishability requires you to use an antisymmetric wavefunction:
$$
\Psi(\mathbf r_1,\mathbf r_2)=\frac{\psi_1(\mathbf r_1)\psi_2(\mathbf r_2)-\psi_2(\mathbf r_1)\psi_1(\mathbf r_2)}{\sqrt{2}}.
\tag1
$$
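As a quick numerical check of the antisymmetry of $(1)$, here is a sketch with two arbitrary stand-in orbitals (any two distinct functions will do; these are not real hydrogenic solutions):

```python
import numpy as np

# Two hypothetical one-electron orbitals, just for illustration.
def phi1(r):
    return np.exp(-np.linalg.norm(r))                        # 1s-like

def phi2(r):
    return np.linalg.norm(r) * np.exp(-np.linalg.norm(r) / 2)  # 2s-like

def Psi(r1, r2):
    """Two-electron Slater determinant, as in equation (1)."""
    return (phi1(r1) * phi2(r2) - phi2(r1) * phi1(r2)) / np.sqrt(2)

r1 = np.array([0.3, -0.5, 1.0])
r2 = np.array([-1.2, 0.4, 0.7])

print(Psi(r1, r2) + Psi(r2, r1))  # ~0: swapping the electrons flips the sign
print(Psi(r1, r1))                # exactly 0: both electrons at the same point
```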
Suppose that the eigenfunction was actually of this form. What could you do to obtain this eigenstate? As a first go, you can solve the independent hydrogenic problems and pretend that you're done, but then you're missing the electron-electron repulsion. You could solve the hydrogenic problem for electron 1 and then feed its charge density into the single-electron Schrödinger equation for electron 2, but then you'd need to go back and re-solve for electron 1 using your new $\psi_2$. You can then repeat this procedure for a long time and see if you converge to something sensible.
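Schematically, that back-and-forth iteration looks like the following toy loop. The update rule `optimal_given` here is entirely made up (a stand-in contraction map, not a real electronic-structure solve); only the structure of the self-consistent cycle is the point:

```python
def optimal_given(other):
    # Hypothetical update: each "orbital" is summarized by one screening
    # parameter, relaxed toward a fixed value minus partial screening
    # from the other electron. NOT a real solver -- illustration only.
    return 2.0 - 0.3 * other

a, b = 2.0, 2.0  # initial guesses from the bare hydrogenic problem
for _ in range(100):
    a_new = optimal_given(b)      # solve electron 1 in the field of electron 2
    b_new = optimal_given(a_new)  # then electron 2 in the field of electron 1
    converged = abs(a_new - a) < 1e-12 and abs(b_new - b) < 1e-12
    a, b = a_new, b_new
    if converged:
        break

print(a, b)  # the converged, self-consistent values agree
```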
Alternatively, you could try reasonable guesses for $\psi_1$ and $\psi_2$ with some variable parameters, and then try and find the minimum of $⟨\Psi|H|\Psi⟩$ over those parameters, in the hope that this minimum will get you relatively close to the ground state.
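As a minimal illustration of this variational strategy (for the one-electron hydrogen atom rather than the two-electron problem above), a trial orbital $e^{-\alpha r}$ gives the analytic energy $E(\alpha)=\alpha^2/2-\alpha$ in hartrees, and minimizing over the parameter $\alpha$ recovers the exact ground state:

```python
import numpy as np

# Variational principle in miniature: E(alpha) for the hydrogen trial
# orbital exp(-alpha * r), in hartree atomic units.
alpha = np.linspace(0.1, 3.0, 2901)
E = alpha**2 / 2 - alpha

best = np.argmin(E)
print(alpha[best], E[best])  # alpha ~ 1, E ~ -0.5 hartree (the exact 1s result)
```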
These, and similar, are the core of the Hartree-Fock methods. They make the fundamental assumption that the electronic wavefunction is as separable as it can be - a single Slater determinant, as in equation $(1)$ - and try to make that work as well as possible. Somewhat surprisingly, perhaps, this can be really quite close for many intents and purposes. (In other situations, of course, it can fail catastrophically!)
In reality, of course, there's a lot more to take into account. For one, Hartree-Fock approximations generally don't account for 'electron correlation', which is a fuzzy term but essentially refers to whatever part of the electron-electron repulsion $r_{12}^{-1}$ is missed by the mean-field, single-determinant description. More importantly, there is no guarantee that the system will be in a single configuration (i.e. a single Slater determinant), and in general your eigenstate could be a nontrivial superposition of many different configurations. This is a particular worry in molecules, but it's also required for a quantitatively correct description of atoms.
If you want to go down that route, it's called quantum chemistry, and it is a huge field. In general, the name of the game is to find a basis of one-electron orbitals which will be nice to work with, and then get to work intensively by numerically diagonalizing the many-electron hamiltonian in that basis, with a multitude of methods to deal with multi-configuration effects. As the size of the basis increases (and potentially as you increase the 'amount of correlation' you include), the eigenstates / eigenenergies should converge to the true values.
Having said that, configurations like $(1)$ are still very useful ingredients of quantitative descriptions, and in general each eigenstate will be dominated by a single configuration. This is the sort of thing we mean when we say things like
the lithium ground state has two electrons in the 1s shell and one in the 2s shell
which more practically says that there exist wavefunctions $\psi_{1s}$ and $\psi_{2s}$ such that (once you account for spin) the corresponding Slater determinant is a good approximation to the true eigenstate. This is what makes the shells and the hydrogenic-style orbitals useful in a many-electron setting.
However, a word to the wise: orbitals are completely fictional concepts. That is, they are unphysical and they are completely inaccessible to any possible measurement. (Instead, it is only the full $N$-electron wavefunction that is available to experiment.)
To see this, consider the state $(1)$ and transform it by substituting the orbitals $\psi_j$ by the normalized combinations $\psi_1'=(\psi_1-\psi_2)/\sqrt{2}$ and $\psi_2'=(\psi_1+\psi_2)/\sqrt{2}$:
\begin{align}
\Psi'(\mathbf r_1,\mathbf r_2)
&=\frac{\psi_1'(\mathbf r_1)\psi_2'(\mathbf r_2)-\psi_2'(\mathbf r_1)\psi_1'(\mathbf r_2)}{\sqrt{2}}
\\&=\frac{
(\psi_1(\mathbf r_1)-\psi_2(\mathbf r_1))(\psi_1(\mathbf r_2)+\psi_2(\mathbf r_2))
-(\psi_1(\mathbf r_1)+\psi_2(\mathbf r_1))(\psi_1(\mathbf r_2)-\psi_2(\mathbf r_2))
}{2\sqrt{2}}
\\&=\frac{\psi_1(\mathbf r_1)\psi_2(\mathbf r_2)-\psi_2(\mathbf r_1)\psi_1(\mathbf r_2)}{\sqrt{2}}
\\&=\Psi(\mathbf r_1,\mathbf r_2).
\end{align}
That is, the Slater determinant built from linear combinations of the $\psi_j$ is indistinguishable from the one built from the $\psi_j$ themselves. This extends to any basis change on that subspace with unit determinant. The implication is that labels like s, p, d, f, and so on are useful to describe the basis functions that we use to build the dominating configuration in a state, but they cannot be reliably inferred from the many-electron wavefunction itself. (This is as opposed to term symbols, which describe the global angular-momentum characteristics of the eigenstate, and which can indeed be obtained from the many-electron eigenfunction.)
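This invariance is easy to check numerically as well. Here is a sketch with the two orbitals discretized as plain vectors on a grid, and the determinant tabulated over all pairs of grid points:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50  # grid points; orbitals as discretized vectors (a toy model)

psi1 = rng.normal(size=n)
psi2 = rng.normal(size=n)

def slater(a, b):
    # Psi(i, j) = [a(i) b(j) - b(i) a(j)] / sqrt(2) over all grid pairs
    return (np.outer(a, b) - np.outer(b, a)) / np.sqrt(2)

Psi = slater(psi1, psi2)

# Rotate the occupied orbitals by the unit-determinant transformation
# used above: psi1' = (psi1 - psi2)/sqrt(2), psi2' = (psi1 + psi2)/sqrt(2).
psi1p = (psi1 - psi2) / np.sqrt(2)
psi2p = (psi1 + psi2) / np.sqrt(2)

Psi_prime = slater(psi1p, psi2p)

print(np.max(np.abs(Psi - Psi_prime)))  # ~0: the determinant is unchanged
```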
What you have discovered is that the atomic orbitals $1s$, $2s$, $3s$, etc. all overlap with each other. That is, at any particular distance from the nucleus there will (in general) be a non-zero probability of finding electrons from all the occupied orbitals in the atom.
The electron orbitals are not precisely defined shells that nest inside each other like a set of Russian dolls. They are more like fuzzy blobs that all overlap with each other. The electrons don't (to borrow your phrase) hover around a sphere of radius of whatever. The electrons are spread out over the whole orbital.
Finally, there is a subtlety I should mention. If you plot the $1s$ electron density $\psi^2(r)$ itself, its maximum is at the nucleus, i.e. at $r=0$. The plots you are describing instead show the probability of finding the electron at some distance $r$ from the nucleus, and this is the density multiplied by the volume of a thin shell of radius $r$ and thickness $dr$, i.e. it is:
$$ P(r)dr = \psi^2(r) 4 \pi r^2 dr $$
It's that extra factor of $r^2$ that moves the maximum of $P(r)$ away from the nucleus out to a finite radius (for the $1s$ state, the Bohr radius). So there is no shell of $1s$ electrons sitting at the radius of that peak.
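In atomic units (where the Bohr radius is 1), this is easy to verify numerically for the $1s$ state, whose density is $\psi^2(r) = e^{-2r}/\pi$:

```python
import numpy as np

# Hydrogen 1s in atomic units: the density peaks at r = 0, but the
# shell-weighted radial distribution P(r) = 4*pi*r^2 * psi^2(r)
# peaks at the Bohr radius, r = 1.
r = np.linspace(0, 10, 100001)
psi_sq = np.exp(-2 * r) / np.pi        # |psi_1s|^2, normalized
P = 4 * np.pi * r**2 * psi_sq

print(r[np.argmax(psi_sq)])  # 0.0 -- density maximum at the nucleus
print(r[np.argmax(P)])       # ~1.0 -- P(r) maximum at the Bohr radius
```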