Before answering, I would like to say that the difference between macroscopic and microscopic is not made in terms of ensembles of systems; in fact, quantum mechanics has an ensemble interpretation. About your questions, my answers are the following:
Yes. General relativity is a pre-quantum theory, which means that it does not account for the discrete, particle-like structure of matter. Personally, I never use the term "phenomenological theory", which I consider a misnomer.
Yes, Einstein, Grossmann, and Hilbert explicitly ignored the structure of matter when they developed general relativity.
There is no microscopic picture of general relativity, because it is a (geo)metric theory, just as there is no microscopic picture of geometric optics. Of course, there is a microscopic picture of physical optics, which we call quantum optics. Quantum gravity is currently under active research. A first step is the quantum field theory of gravitons, whose "microscopic picture" is close to that of quantum electrodynamics.
There are many cases where the continuous fluid approximation used in general relativity breaks down. For example, if there are shock waves in your interacting fluids, then they cannot be described by a continuous fluid model. The best that you can do is to describe matter at the mesoscopic level and gravity at the macroscopic level. An example is the Einstein/Vlasov approach. Matter (e.g. a collisionless plasma) is described by the Vlasov kinetic equation, but $g_{\mu\nu}$ is obtained from an approximate energy-momentum tensor $T_{\mu\nu}$, which is computed by averaging over matter with the help of the kinetic distribution function $f(x,p,t)$ (see eq. 32 in the above link). Both mesoscopic and microscopic descriptions of gravity are entirely outside the scope of GR.
No. Because the (geo)metric model of general relativity is not fundamental, as Feynman already noted [1]:
It is one of the peculiar aspects of the theory of gravitation, that it has both a field interpretation and a geometrical interpretation. [...] The geometrical interpretation is not really necessary or essential to physics.
The underlying quantum theory of gravity uses, essentially, the same space and time as quantum mechanics.
No. There are lots of flawed thermodynamic analogies in the general relativity literature (black hole thermodynamics being the most popular of them).
[1] Feynman, Richard P.; Morinigo, Fernando B.; Wagner, William G. Feynman Lectures on Gravitation. Brian Hatfield (editor); John Preskill and Kip S. Thorne (foreword). Addison-Wesley Publishing Company, Massachusetts, 1995.
What is exactly the canonical ensemble?
Thermodynamic ensembles are ensembles in the mathematical sense, so your option no. 2 is the correct one. This becomes much clearer if you consider a system of non-identical particles.
What do "thermal average" and "thermal fluctuation" mean?
"Average" is not something per se, one should speak about the thermal average of a quantity $A$. This is the average value taken by $A$ over all configurations of the ensemble, the average being weighted by the probability of each configuration (Boltzmann factor in the canonical ensemble).
The same remark holds for "fluctuation". The thermal fluctuation of a quantity $A$ is the weighted variance (or std. dev.) over the ensemble.
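As an illustration of the two definitions above, here is a minimal Python sketch that computes the Boltzmann-weighted average and standard deviation of a quantity $A$ over a discrete set of configurations. The function name `thermal_stats` and the two-level system are hypothetical choices made for the example, and energies are measured in units where $kT$ is a plain number.

```python
import math

def thermal_stats(values, energies, kT):
    """Boltzmann-weighted mean and standard deviation of a quantity A.

    values[i]   : value of A in configuration i
    energies[i] : energy of configuration i
    """
    # Shift by the lowest energy so the exponentials cannot overflow;
    # the shift cancels in the normalized probabilities.
    e0 = min(energies)
    weights = [math.exp(-(e - e0) / kT) for e in energies]
    z = sum(weights)  # partition function (up to the constant shift)
    probs = [w / z for w in weights]

    mean = sum(p * a for p, a in zip(probs, values))
    var = sum(p * (a - mean) ** 2 for p, a in zip(probs, values))
    return mean, math.sqrt(var)

# Hypothetical two-level system, taking A to be the energy itself.
energies = [0.0, 1.0]
mean_E, std_E = thermal_stats(energies, energies, kT=1.0)
print(mean_E, std_E)
```

Passing a different list as `values` gives the thermal average and fluctuation of any other observable over the same ensemble.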
What about time evolution?
The time evolution of the system will reflect the ensemble statistics if the system is ergodic. Some systems are not; glasses are one notable example of non-ergodicity.
(Note: the std. dev. is $\sqrt{N\text{var}(E)}$.)
Is thermodynamics a limit case of statistical mechanics?
Yes, and the limit $N\to+\infty$ is appropriately called the "thermodynamic limit". In practice, any macroscopic system has negligible relative fluctuations, since $N\sim\mathcal N_A\approx 6\times 10^{23}$.
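To make the size of the fluctuations concrete, the following sketch assumes a toy model that is not part of the question: $N$ independent two-level particles with level spacing `eps`. For independent particles both the mean energy and the variance grow like $N$, so the relative fluctuation $\sigma_E/\langle E\rangle$ falls off like $1/\sqrt{N}$. The function name is hypothetical.

```python
import math

def relative_energy_fluctuation(n, kT=1.0, eps=1.0):
    """sigma_E / <E> for n independent two-level particles (toy model)."""
    # Probability that a single particle is in its excited state.
    p = 1.0 / (1.0 + math.exp(eps / kT))
    mean = n * p * eps                          # mean energy grows like n
    std = math.sqrt(n * p * (1.0 - p)) * eps    # std. dev. grows like sqrt(n)
    return std / mean

for n in (1, 10**6, 10**12, 6 * 10**23):
    print(n, relative_energy_fluctuation(n))
```

Each factor of 100 in `n` reduces the relative fluctuation by a factor of 10, which is why it is unobservably small for a mole of particles.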
Best Answer
The question asks for intuition. The equation shown in the question can be derived from the fact that $\sigma_E^2$ and $C$ can both be expressed in terms of derivatives of the partition function with respect to $\beta=1/kT$, as shown here. We can extract some intuition from that derivation.
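For reference, here is a brief sketch of that derivation, assuming the equation in the question is the standard canonical-ensemble relation $\sigma_E^2 = kT^2 C$, with $C$ the heat capacity at fixed volume and $\langle E\rangle$ the mean energy (these symbols are assumptions, since the question is not reproduced here). With $\beta = 1/kT$:

$$
\begin{aligned}
\langle E\rangle &= -\frac{\partial \ln Z}{\partial\beta}, \\
\sigma_E^2 &= \langle E^2\rangle - \langle E\rangle^2 = \frac{\partial^2 \ln Z}{\partial\beta^2} = -\frac{\partial\langle E\rangle}{\partial\beta}, \\
C &= \frac{\partial\langle E\rangle}{\partial T} = \frac{d\beta}{dT}\,\frac{\partial\langle E\rangle}{\partial\beta} = \left(-\frac{1}{kT^2}\right)\left(-\sigma_E^2\right) = \frac{\sigma_E^2}{kT^2}.
\end{aligned}
$$

Both quantities are therefore the same $\beta$-derivative of $\ln Z$, up to the factor $kT^2$.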
In the canonical ensemble, the probability of a state with energy $E$ is proportional to $e^{-E/kT}$. The partition function can be written either as a sum over states, $$ Z = \sum_n e^{-E_n/kT}, \tag{1} $$ or as a sum over energies, $$ Z = \sum_E \rho(E) e^{-E/kT} \tag{2} $$ where $\rho(E)$ is the density of states with energy $E$. In a comment, you described some prior intuition about heat capacity:
To get that intuition from (2), use the fact that for each of those types of degree of freedom (translational, rotational, vibrational), the number of such states in an energy shell of given width $dE$ is an increasing function of the energy $E$. Consequently, the more different types of degrees of freedom the system has, the more rapidly $\rho(E)$ grows as a function of $E$. That's the key. A system with more degrees of freedom has larger-magnitude derivatives of the partition function with respect to $\beta=1/kT$, for any given value of $T$. Since $\sigma_E^2$ and $C$ are both expressed in terms of the first and second derivatives of the partition function with respect to $\beta$, this explains intuitively why both of them have the same dependence on the number of degrees of freedom. In particular, it explains intuitively why both of them are larger (for a given $T$) in systems with more degrees of freedom.
Here's a little more detail about the intuition behind $\sigma_E^2$. Consider the summand in (2). We can think of $\sigma_E$ as the width of the peak in the graph of $\rho(E)e^{-E/kT}$ as a function of $E$. If we make $\rho(E)$ grow more rapidly, we shift the peak to a larger value of $E$, and that also makes the peak wider because $e^{-E/kT}$ doesn't decrease as rapidly for larger values of $E$ (its derivative approaches zero as $E\to \infty$). That's why increasing the number of degrees of freedom makes $\sigma_E^2$ larger, for a given value of $T$.
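The peak picture can be checked numerically. The sketch below assumes a power-law density of states $\rho(E)=E^a$, which is an illustrative choice and not something stated in the question; a larger exponent `a` stands in for more degrees of freedom. The function name and grid parameters are likewise hypothetical. For this choice the peak of $\rho(E)e^{-E/kT}$ sits at $E = a\,kT$ and its standard deviation is $\sqrt{a+1}\,kT$, so both grow with `a`.

```python
import math

def peak_and_width(a, kT=1.0, e_max=100.0, steps=100_000):
    """Peak position and std. dev. of the weight rho(E) * exp(-E/kT),
    assuming the illustrative density of states rho(E) = E**a."""
    de = e_max / steps
    grid = [(i + 0.5) * de for i in range(steps)]   # midpoint grid in E
    w = [e ** a * math.exp(-e / kT) for e in grid]  # summand of eq. (2)
    norm = sum(w)

    mean = sum(e * wi for e, wi in zip(grid, w)) / norm
    var = sum((e - mean) ** 2 * wi for e, wi in zip(grid, w)) / norm

    # Grid point where the weight is largest.
    peak = grid[max(range(steps), key=w.__getitem__)]
    return peak, math.sqrt(var)

for a in (1, 4, 16):
    print(a, peak_and_width(a))
```

Increasing `a` moves the peak to the right and widens it, which is the behavior described above.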
The intuition is tied to the mathematical form of the canonical ensemble (1)-(2). That's unavoidable, because we can contrive different ensembles in which $\sigma_E^2$ and $C$ are not related to each other as shown in the question. Then again, since the canonical ensemble is essentially a consequence of the microcanonical ensemble (which is the least-presumptuous ensemble), we don't normally need to consider other ensembles, so the relationship shown in the question is relatively robust.