The definition of temperature through Maxwellian and Boltzmann distributions have certain problems in quantum mechanics.
In thermodynamics temperature is usually defined through the derivative of entropy as you say:
$$
\frac{1}{T} = \frac{\partial S(E,\mathbf{V})}{\partial E}. \qquad (1)
$$
The division of the system into different parts (or different degrees of freedom) can be understood form the microcanonical distribution. Let the system have Hamiltonian of the following form:
$$
H = H(\mathbf{q}, \mathbf{p}, \mathbf{V});
$$
where $\mathbf{q}$ and $\mathbf{p}$ are the vectors of microscopic generalized coordinates and momenta respectively and $\mathbf{V}$ is the vector of macroscopic parameters that are constant (at the average) in the equilibrium.
The dimension of $\mathbf{q}$ and $\mathbf{p}$ is the number of the degrees of freedom of the system. Note that degrees of freedom of the same type (e.g. translation along $x$ axis) of different particles are different degrees of freedom. The set of $(\mathbf{q},\mathbf{p})$ pairs is the phase space of the system.
The distribution function for the system is
$$
f(\mathbf{q},\mathbf{p}) =
\frac{
\delta\bigl( E - H(\mathbf{q}, \mathbf{p}, \mathbf{V}) \bigr)
}{\Omega(E, \mathbf{V})};
$$
where $E$ is the internal energy and $\Omega(E, \mathbf{V})$ is the phase density of states or the number of accessible microscopic states for given $E$ and $\mathbf{V}$:
$$
\Omega(E, \mathbf{V}) =
\int \delta\bigl( E - H(\mathbf{q}, \mathbf{p}, \mathbf{V}) \bigr) d\mathbf{q} d\mathbf{p}.
$$
The entropy is
$$
S(E, \mathbf{V}) = \ln \Omega(E, \mathbf{V})
$$
Temperature of a subsystem
Let the system consist of two independent (non-interacting) subsystems. Then
$$
\mathbf{q} = (\mathbf{q}_1, \mathbf{q}_2); \quad \mathbf{p} = (\mathbf{p}_1, \mathbf{q}_2);
$$
$$
H(\mathbf{q}, \mathbf{p}, \mathbf{V}) =
H_1(\mathbf{q}_1, \mathbf{p}_1, \mathbf{V}) +
H_2(\mathbf{q}_2, \mathbf{p}_2, \mathbf{V}). \qquad (2)
$$
NB:
The subsystems are not obliged to be separated spatially. They even are not obliged to consist of different particles. The only requirement is that the Hamiltonian must have the form (2). We can put all translational coordinates to $\mathbf{q}_1$, rotational to $\mathbf{q}_2$, oscillatory to $\mathbf{q}_3$ and so on. If the energy transfer (interaction) between the subsystems is negligible during some period of time then expression (2) is correct for that period.
We can introduce distribution functions for each subsystem:
$$
f_i(\mathbf{q}_i,\mathbf{p}_i) =
\frac{
\delta\bigl( E_i - H_i(\mathbf{q}_i, \mathbf{p}_i, \mathbf{V}) \bigr)
}{\Omega_i(E_i, \mathbf{V})};
$$
where $E_i$ is the internal energy of the subsystem.
The entropy of the subsystem then is
$$
S_i(E_i, \mathbf{V}) = \ln \Omega_i(E_i, \mathbf{V})
$$
and the temperature is
$$
T_i = \left( \frac{\partial S_i(E_i, \mathbf{V})}{\partial E_i} \right)^{-1} \qquad (3)
$$
Here is the definition of the temperature of the subsystem (degree of freedom).
Temperatures in the equilibrium
Since the subsystems are independent the distribution function of whole system is the product:
$$
f(\mathbf{q},\mathbf{p}) = f_1(\mathbf{q}_1,\mathbf{p}_1)f_2(\mathbf{q}_2,\mathbf{p}_2);
$$
and total number of accessible states is:
$$
\Omega(E_1, E_2, \mathbf{V}) = \Omega_1(E_1, \mathbf{V})\Omega_2(E_2, \mathbf{V}).
$$
Hence the total entropy is
$$
S(E_1, E_2, \mathbf{V}) = S_1(E_1, \mathbf{V}) + S_2(E_2, \mathbf{V}) \qquad (4)
$$
If there is an interaction between the subsystems the internal energy will be transfered from one system to the other until the equilibrium will be reached. During this process the total energy is constant:
$$
E = E_1 + E_2 = \text{const}
$$
The energies of the subsystems changes with time and have certain values in the equilibrium. According to the 2nd law of thermodynamics the total entropy is maximal in this state. The condition of the extremum is
$$
\frac{\partial S(E_1, E_2(E, E_1), \mathbf{V})}{\partial E_1} = 0.
$$
From (4) we get:
$$
\frac{\partial S(E_1, E_2(E, E_1), \mathbf{V})}{\partial E_1} =
\frac{\partial S_1(E_1, \mathbf{V})}{\partial E_1} +
\frac{\partial S_2(E_2, \mathbf{V})}{\partial E_2}\frac{\partial E_1}{\partial E_2} =
$$
$$
\frac{1}{T_1} - \frac{1}{T_2} = 0
$$
or
$$
T_1 = T_2.
$$
One can prove that these temperatures are equal to $T$ defined as (1).
For a degree of freedom whose energy is quadratic in just momentum (but flat in position, or flat with hard walls), the average energy classically is $kT/2$. That is the basic equipartition theorem for an ideal gas. However a lesser known result is that a classical degree of freedom with energy quadratic in both momentum and position has an average energy of $kT$. The atoms in a solid are in some sense each in a 3-way harmonic oscillator (this is the Einstein model) and hence one has $3NkT$ energy, i.e. $3Nk$ heat capacity.(†)
To understand this intuitively you should of course derive the equipartition theorem for yourself. But, basically, by having energy also quadratic in position you make the lower energy states less common; not only does the low energy require a small momentum, but also a particular position. By increasing energy, more and more positions become available. In contrast with a flat potential the position can always take on any value and so a low energy state only needs the momentum to be small.
So if you were to imagine each atom in a solid as instead as being inside its own little box with hard walls, then such a model would only give $3Nk/2$ heat capacity.
(†) Okay, actually the atoms are all coupled together however when you look at it this way, you can't so simply talk about the separate contributions of individual atoms anymore. Looking at these whole vibrations gives you phonons and the Debye model. Basically though, all of the atomic harmonic oscillators mix together into various modes, but of course the number of modes remains the same as the original number of individual oscillators. But, each mode is itself a harmonic oscillator so you get the $3Nk$ heat capacity at high temperature.(‡)
(‡) Actually, only $3(N-\frac{1}{2})k$ since three of the collective modes do not oscillate but rather correspond to the linear motion of the whole block of material. So, those three modes each give only $kT/2$.
Best Answer
You don't say whether your analysis includes rotational modes. I assume it does otherwise the disagreement between experiment and the ideal gas specific heat would be profound. A linear molecule will have two rotational modes, each adding $\tfrac{1}{2}R$, so the specific heat (excluding vibration) will be $C_p = \tfrac{7}{2}R$.
Anyhow, hydrogen has quite a high rotational constant of $B = 60.85\,\text{cm}^{-1}$. The spacing between rotational energy levels is $2B$, $4B$, $6B$ etc. If we take $4B$ and convert this to an energy we get about $4.8 \times 10^{-21}$ J. If we take this equal to $kT$ and divide by $k$ we get a temperature of about $350$K, so at room temperature the rotational degrees of freedom aren't fully populated. That's why $C_p$ for hydrogen is less than $\tfrac{7}{2}R$, though actually it's only slightly less.