You have an expression for the total energy
$$U_E = \frac{1}{2}\int_V \rho(\vec{r})\phi(\vec{r})\mathrm{d}^3x.$$
Now split the charge density into two parts, $\rho=\rho_1+\rho_2$, and the potential into two parts, $\phi=\phi_1+\phi_2$, where each part is due to just one sphere (charges simply add, and the potentials add by superposition). Substituting these gives
$$U_E = \frac{1}{2}\int_V (\rho_1(\vec{r})+\rho_2(\vec{r}))(\phi_1(\vec{r})+\phi_2(\vec{r}))\mathrm{d}^3x.$$
Expanding the product gives
$$U_E = \frac{1}{2}\int_V \left[\rho_1(\vec{r})\phi_1(\vec{r})+\rho_2(\vec{r})\phi_2(\vec{r})+ \rho_2(\vec{r})\phi_1(\vec{r})+ \rho_1(\vec{r})\phi_2(\vec{r})\right]\mathrm{d}^3x.$$
The first two terms integrate to the energies of the spheres in isolation, so the last two terms are the interaction energy:
$$U_{int} = \frac{1}{2}\int_V \left[\rho_2(\vec{r})\phi_1(\vec{r})+ \rho_1(\vec{r})\phi_2(\vec{r})\right]\mathrm{d}^3x.$$
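A discrete sanity check of this split may help. The sketch below is an illustration under stated assumptions, not part of the original argument: each "sphere" is replaced by a small cluster of point charges with made-up values and positions, and Coulomb's constant is set to 1 (arbitrary units). For point charges, $\frac12\int\rho\phi\,\mathrm{d}^3x$ becomes a sum over distinct pairs (each charge's own infinite self-term is dropped), so the total energy should equal the two self-energies plus the cross terms.

```python
import itertools
import math

K = 1.0  # stands in for 1/(4*pi*eps0); arbitrary units chosen for illustration


def pair_energy(charges):
    """Energy of assembling `charges` from infinity: sum of K*qi*qj/r_ij over distinct pairs.

    This is the discrete version of (1/2) * integral(rho * phi), with each point
    charge's infinite self-term left out.
    """
    return sum(K * qi * qj / math.dist(ri, rj)
               for (qi, ri), (qj, rj) in itertools.combinations(charges, 2))


def cross_term(a, b):
    """Discrete version of integral(rho_a * phi_b): each charge in `a` times the potential of `b` at its position."""
    return sum(qa * sum(K * qb / math.dist(ra, rb) for qb, rb in b)
               for qa, ra in a)


# Two hypothetical "spheres", each modeled as a small cluster of point charges (q, (x, y, z)).
sphere1 = [(1.0, (0.0, 0.0, 0.0)), (0.5, (0.3, 0.0, 0.0)), (0.5, (0.0, 0.3, 0.0))]
sphere2 = [(-0.7, (5.0, 0.0, 0.0)), (1.2, (5.2, 0.1, 0.0)), (0.4, (5.0, -0.2, 0.3))]

u_total = pair_energy(sphere1 + sphere2)                # energy of everything together
u_self = pair_energy(sphere1) + pair_energy(sphere2)    # "spheres in isolation"
u_int = 0.5 * (cross_term(sphere2, sphere1) + cross_term(sphere1, sphere2))

print(f"total            = {u_total:.6f}")
print(f"self + interact  = {u_self + u_int:.6f}")
```

The two printed values should agree up to floating-point rounding, because every pair of charges is counted either inside one cluster or in the cross terms.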
Each cross term is the work you would have to do against one sphere's field to bring the other sphere (hypothetically, with its charge distribution held fixed) in from very far away to its current position. Since the forces between the spheres are equal and opposite, it makes no difference which sphere you hold fixed and which one you move: the work comes out the same.
So the two cross terms actually give equal integrals:
$$U_{int} = \frac{1}{2}\int_V \left[\rho_2(\vec{r})\phi_1(\vec{r})+ \rho_1(\vec{r})\phi_2(\vec{r})\right]\mathrm{d}^3x= \int_V \rho_1(\vec{r})\phi_2(\vec{r})\mathrm{d}^3x= \int_V \rho_2(\vec{r})\phi_1(\vec{r})\mathrm{d}^3x.$$
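The same equality can also be seen directly, assuming the spheres sit in vacuum so that each potential is given by Coulomb's law, $\phi_1(\vec r)=\frac{1}{4\pi\epsilon_0}\int_V\frac{\rho_1(\vec r\,')}{|\vec r-\vec r\,'|}\mathrm{d}^3x'$. Substituting this turns the cross term into a double integral that is symmetric under swapping sphere 1 and sphere 2:

$$\int_V \rho_2(\vec{r})\phi_1(\vec{r})\,\mathrm{d}^3x = \frac{1}{4\pi\epsilon_0}\int_V\int_V \frac{\rho_2(\vec{r})\,\rho_1(\vec{r}\,')}{|\vec{r}-\vec{r}\,'|}\,\mathrm{d}^3x'\,\mathrm{d}^3x = \int_V \rho_1(\vec{r}\,')\phi_2(\vec{r}\,')\,\mathrm{d}^3x'.$$

This symmetry is known as Green's reciprocity theorem.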
1) I would not call this a capacitor. A typical parallel-plate capacitor has two charged plates held at some potential difference (for example, by connecting them to opposite terminals of a battery). Here you just have two charged plates that end up connected, so the charges balance out between them. In other words, I would not say you are "storing" charge here the way you would expect a capacitor to. You could still define a capacitance for the system, but it would not take the familiar form $C=\frac QV$, since there is no single $Q$ to refer to.
In general, you can define a capacitance matrix $C_{ij}$ such that
$$Q_1=C_{11}V_1+C_{12}V_2$$
$$Q_2=C_{21}V_1+C_{22}V_2$$
This form is most useful when the potentials of the plates are given. When the charges are given instead, there is such a thing as an "elastance matrix" $P_{ij}$, which is the inverse of the capacitance matrix:
$$V_1=P_{11}Q_1+P_{12}Q_2$$
$$V_2=P_{21}Q_1+P_{22}Q_2$$
Both matrices are symmetric, so $C_{12}=C_{21}$ and $P_{12}=P_{21}$. The off-diagonal terms are related to the mutual capacitance between the plates, and the diagonal terms to the self-capacitance of each plate.
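As a small numerical illustration of the two matrices, the sketch below inverts $C$ to get $P$, computes charges from given potentials with $Q=CV$, and recovers the potentials with $V=PQ$. The numbers are hypothetical, chosen only to have the sign pattern a physical two-conductor capacitance matrix has (positive diagonal, negative off-diagonal entries, and $C_{11}+C_{12}\ge 0$); the helper names are made up for this example.

```python
def invert_2x2(m):
    """Return the inverse of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def mat_vec(m, v):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


# Hypothetical capacitance matrix in farads (illustrative values only).
C = [[3.0e-12, -2.0e-12],
     [-2.0e-12, 3.0e-12]]
P = invert_2x2(C)          # elastance matrix, in 1/F

V = [10.0, -5.0]           # given plate potentials, volts
Q = mat_vec(C, V)          # Q_i = sum_j C_ij V_j
V_back = mat_vec(P, Q)     # V_i = sum_j P_ij Q_j recovers the potentials

print("P =", P)
print("Q =", Q)
print("V recovered =", V_back)
```

With these values the elastance entries come out positive and symmetric, with $P_{11}>P_{12}$.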
2) Because the plates are identical perfect conductors, symmetry requires the charge on each plate to be equal, so each carries half of the total:
$$Q_1'=Q_2'=\frac{Q_1+Q_2}{2}$$
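One way to see why the split must be equal is sketched below with the elastance matrix from point 1, assuming the two plates are identical so that $P_{11}=P_{22}$ by mirror symmetry. After the plates are connected, total charge is conserved and the two plates sit at the same potential:

$$Q_1'+Q_2'=Q_1+Q_2,\qquad V_1'-V_2' = (P_{11}Q_1'+P_{12}Q_2')-(P_{12}Q_1'+P_{11}Q_2') = (P_{11}-P_{12})(Q_1'-Q_2') = 0.$$

The electrostatic energy $\frac12\sum_{ij}P_{ij}Q_iQ_j$ is positive for any nonzero charges, so $P$ is positive definite; with $P_{11}=P_{22}$ this gives $P_{11}^2>P_{12}^2$, hence $P_{11}-P_{12}\neq 0$. Therefore $Q_1'=Q_2'$, and charge conservation gives $Q_1'=Q_2'=\frac{Q_1+Q_2}{2}$.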
3) You can still figure out the potential difference between the two plates. If the plate separation is small compared to the plate size, every point between the plates is very close to both plates, so we can treat them as infinite planes of charge. Using Gauss's law, each plate alone produces a field of magnitude $E_1=\frac{\sigma _1}{2\epsilon _0}$ and $E_2=\frac{\sigma _2}{2\epsilon _0}$, pointing away from the plate for positive charge. Between the plates these two fields point in opposite directions, so the net field there is
$$E=E_1-E_2=\frac{\sigma _1-\sigma _2}{2\epsilon _0}$$
Therefore, the potential difference between the plates is just
$$V=Ed=\frac{\sigma _1-\sigma _2}{2\epsilon _0}d.$$
However, you cannot get the energy of the system from the energy density $u=\frac 12 \epsilon _0 E^2$ of the field between the plates alone, because the field outside the plates is not $0$ (it vanishes only when $\sigma_1=-\sigma_2$).
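To make the last point concrete, here is a minimal sketch of the infinite-plane superposition, assuming sheet 1 at $x=0$, sheet 2 at $x=d$, and $+x$ pointing from sheet 1 toward sheet 2. The surface charge densities and separation are hypothetical values, and `sheet_fields` is a helper made up for this example. The field outside is $\pm\frac{\sigma_1+\sigma_2}{2\epsilon_0}$, which is zero only for equal and opposite charges.

```python
EPS0 = 8.8541878128e-12  # vacuum permittivity in F/m (CODATA 2018 value)


def sheet_fields(sigma1, sigma2):
    """x-components of E for infinite sheets at x=0 (sigma1) and x=d (sigma2).

    Returns (left of sheet 1, between the sheets, right of sheet 2).
    Each sheet alone gives sigma/(2*eps0), pointing away from it for positive sigma.
    """
    e1 = sigma1 / (2 * EPS0)
    e2 = sigma2 / (2 * EPS0)
    left = -e1 - e2      # both fields point in -x to the left of both sheets
    between = e1 - e2    # sheet 1 pushes toward +x, sheet 2 toward -x
    right = e1 + e2      # both fields point in +x to the right of both sheets
    return left, between, right


sigma1, sigma2 = 2e-6, 1e-6   # hypothetical surface charge densities, C/m^2
d = 1e-3                      # hypothetical plate separation, m
left, between, right = sheet_fields(sigma1, sigma2)
print(f"E left  = {left:.3e} V/m")
print(f"E gap   = {between:.3e} V/m, V = E*d = {between * d:.2f} V")
print(f"E right = {right:.3e} V/m")
```

With these numbers the outside field is three times the gap field, so most of the field energy is not between the plates at all.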
4) There is no energy stored in the system, at least not in the sense usually meant for a capacitor. There is potential energy, since the excess charges on the plates interact, but it takes no work to move a charge from one plate to the other, because a perfect conductor is an equipotential. (Once you move that charge, moving another one would require work, but only if some external force, such as a battery, holds the first charge in place, and then the system is no longer an ideal conductor.) Usually, when you talk about energy stored in a capacitor, you mean the energy needed to separate the charges and maintain that separation.
5) The electric field does work to move charges from one plate to the other. This is where the energy goes.
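A rough estimate of how much energy the field gives up can be sketched in the infinite-plate approximation from point 3, assuming plate area $A$ (a symbol introduced here), separation $d$, and no fringe fields. Before connection the gap field is $\frac{\sigma_1-\sigma_2}{2\epsilon_0}$ and the outside field has magnitude $\frac{\sigma_1+\sigma_2}{2\epsilon_0}$. After connection $\sigma_1'=\sigma_2'=\frac{\sigma_1+\sigma_2}{2}$, so the gap field vanishes while the outside field $\frac{\sigma_1'+\sigma_2'}{2\epsilon_0}$ is unchanged. Only the gap contributes to the change in field energy:

$$\Delta U = \frac12\epsilon_0\left(\frac{\sigma_1-\sigma_2}{2\epsilon_0}\right)^2 A d = \frac{(\sigma_1-\sigma_2)^2}{8\epsilon_0}Ad = \frac{(Q_1-Q_2)^2\,d}{8\epsilon_0 A}.$$

For idealized infinite plates the outside field energy is itself infinite, but this difference stays finite. In a real circuit it ends up, for example, as heat in the connecting wire.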
I am not an expert on capacitance, so feel free to correct my reasoning if something is off. I think something we take for granted when relating the energy stored in a capacitor to the energy in the fields is that, in the typical parallel-plate capacitor, the field is $0$ outside the system, so the potential difference, the energy, and the field are easy to relate. In your initial setup, I think you have to be careful about whether you want to consider just the field between the plates or the overall field.
In order to move charges of equal sign that were separated by an infinite distance to some finite distance, you need to do work.
If you arrange an electric field $\vec{E}(x,y,z)$ in space by bringing in a large number of small electric charges (positive and negative) from infinitely far away, this takes work. For example, for a plate capacitor, you need to do positive work to collect positive charges close together on one plate, positive work to collect negative charges on the other plate, and then negative work to bring the plates close together. The combined positive work from collecting the charges on the plates will be larger than what you get back from bringing the plates near each other, unless the plates touch and the charges cancel each other.
It turns out that the net work $W$ done to arrange the charges equals the energy density $u=\frac12\epsilon_0E^2$ integrated over all space: $$ W = \frac12 \epsilon_0 \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} E^2(x, y, z)\, dx\,dy\,dz. $$ This equivalence of work (or energy) and the energy density of the electric field holds only in the above integral form. You cannot take an arbitrary cubic millimeter of space and extract the field energy from it without also affecting the field in the space around it.
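Here is a numerical sketch of that equivalence for one simple case, a uniformly charged ball, assuming vacuum and units with Coulomb's constant set to 1 (both are choices made for this example, as are the helper names). It computes the field-energy integral over all space and, separately, the work needed to build the ball shell by shell from charge brought in from infinity. Both should approach $\frac{3Q^2}{5R}$ in these units.

```python
import math

K = 1.0                         # stands in for 1/(4*pi*eps0); arbitrary units for illustration
EPS0 = 1.0 / (4 * math.pi * K)


def e_field(r, q, radius):
    """Radial field of a uniformly charged ball of total charge q."""
    if r < radius:
        return K * q * r / radius**3   # grows linearly inside (Gauss's law)
    return K * q / r**2                # point-charge field outside


def field_energy(q, radius, n=100_000):
    """(eps0/2) * integral of E^2 over all space, summed over spherical shells of area 4*pi*r^2."""
    h = radius / n
    inside = 0.0
    for i in range(n):                 # midpoint rule on [0, R]
        r = (i + 0.5) * h
        inside += e_field(r, q, radius) ** 2 * 4 * math.pi * r**2 * h
    dt = 1.0 / n
    outside = 0.0
    for j in range(n):                 # substitute r = R/t (dr = R/t^2 dt) to map [R, inf) onto (0, 1]
        t = (j + 0.5) * dt
        r = radius / t
        outside += e_field(r, q, radius) ** 2 * 4 * math.pi * r**2 * (radius / t**2) * dt
    return 0.5 * EPS0 * (inside + outside)


def assembly_work(q, radius, n=100_000):
    """Work to build the ball shell by shell, bringing each shell dq in from infinity."""
    h = radius / n
    work = 0.0
    for i in range(n):
        r = (i + 0.5) * h
        q_inside = q * (r / radius) ** 3       # charge already assembled
        dq = 3 * q * r**2 / radius**3 * h      # charge of the new shell
        work += K * q_inside * dq / r          # potential of the inner ball at r, times dq
    return work


Q, R = 1.0, 1.0
w_field = field_energy(Q, R)
w_assembly = assembly_work(Q, R)
print(w_field, w_assembly)   # both close to 3*K*Q**2/(5*R) = 0.6
```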
As you have probably already done, you can verify that it works out for simple cases such as plate capacitors. Unfortunately, the general case requires mathematical knowledge (vector calculus) that you don't learn in high school. Once you have the vector-calculus toolbox and the description of electrostatics in vector calculus, the derivation is just a few lines. If you study physics at university, you'll learn this by the end of the first year.
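For readers who already have those tools, here is a sketch of the few lines, assuming charges confined to a finite region in vacuum (no dielectrics). It starts from the expression $W=\frac12\int\rho\phi\,\mathrm{d}^3x$ for the total electrostatic energy and uses Gauss's law $\rho=\epsilon_0\nabla\cdot\vec E$, the product rule $\nabla\cdot(\phi\vec E)=\phi\,\nabla\cdot\vec E+\vec E\cdot\nabla\phi$, and $\vec E=-\nabla\phi$:

$$W = \frac12\int \rho\,\phi\,\mathrm{d}^3x = \frac{\epsilon_0}{2}\int (\nabla\cdot\vec E)\,\phi\,\mathrm{d}^3x = \frac{\epsilon_0}{2}\left[\oint_S \phi\,\vec E\cdot\mathrm{d}\vec a - \int \vec E\cdot\nabla\phi\,\mathrm{d}^3x\right] = \frac{\epsilon_0}{2}\int E^2\,\mathrm{d}^3x.$$

The surface term vanishes as $S$ grows to infinity, because far away $\phi\sim 1/r$ and $E\sim 1/r^2$ while the surface area only grows as $r^2$.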
I left out the effect of dielectrics ($\epsilon=\epsilon_0\epsilon_{\mathrm r}$). You can use the same reasoning if you start from an infinite dielectric medium, but it's trickier if $\epsilon_{\mathrm{r}}$ varies over space, and unfortunately again too difficult for high-school level.