The definition of temperature through the Maxwell and Boltzmann distributions has certain problems in quantum mechanics.
In thermodynamics, temperature is usually defined through the derivative of entropy, as you say:
$$
\frac{1}{T} = \frac{\partial S(E,\mathbf{V})}{\partial E}. \qquad (1)
$$
The division of the system into different parts (or different degrees of freedom) can be understood from the microcanonical distribution. Let the system have a Hamiltonian of the following form:
$$
H = H(\mathbf{q}, \mathbf{p}, \mathbf{V});
$$
where $\mathbf{q}$ and $\mathbf{p}$ are the vectors of microscopic generalized coordinates and momenta, respectively, and $\mathbf{V}$ is the vector of macroscopic parameters that are constant (on average) in equilibrium.
The dimension of $\mathbf{q}$ and $\mathbf{p}$ is the number of degrees of freedom of the system. Note that degrees of freedom of the same type (e.g. translation along the $x$ axis) belonging to different particles are distinct degrees of freedom. The set of $(\mathbf{q},\mathbf{p})$ pairs is the phase space of the system.
The distribution function for the system is
$$
f(\mathbf{q},\mathbf{p}) =
\frac{
\delta\bigl( E - H(\mathbf{q}, \mathbf{p}, \mathbf{V}) \bigr)
}{\Omega(E, \mathbf{V})};
$$
where $E$ is the internal energy and $\Omega(E, \mathbf{V})$ is the phase-space density of states, i.e. the number of accessible microscopic states per unit energy for given $E$ and $\mathbf{V}$:
$$
\Omega(E, \mathbf{V}) =
\int \delta\bigl( E - H(\mathbf{q}, \mathbf{p}, \mathbf{V}) \bigr) d\mathbf{q} d\mathbf{p}.
$$
The entropy (in units where the Boltzmann constant $k_B = 1$) is
$$
S(E, \mathbf{V}) = \ln \Omega(E, \mathbf{V}).
$$
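As a sanity check on definition (1), here is a small symbolic sketch (assuming Python with sympy; the gas model is an illustrative choice, not part of the argument above) for a classical monatomic ideal gas, for which $\Omega \propto E^{3N/2}$ up to factors that are negligible at large $N$:

```python
import sympy as sp

E, N = sp.symbols('E N', positive=True)

# Illustrative assumption: classical monatomic ideal gas, Omega ~ E^(3N/2)
# (E-independent factors drop out of the derivative)
S = sp.Rational(3, 2) * N * sp.log(E)   # S = ln Omega, with k_B = 1

T = 1 / sp.diff(S, E)                   # 1/T = dS/dE, i.e. equation (1)
print(sp.simplify(T))                   # 2*E/(3*N), i.e. E = (3/2) N T
```

This recovers the familiar $E = \frac{3}{2} N k_B T$ in units where $k_B = 1$.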
Temperature of a subsystem
Let the system consist of two independent (non-interacting) subsystems. Then
$$
\mathbf{q} = (\mathbf{q}_1, \mathbf{q}_2); \quad \mathbf{p} = (\mathbf{p}_1, \mathbf{p}_2);
$$
$$
H(\mathbf{q}, \mathbf{p}, \mathbf{V}) =
H_1(\mathbf{q}_1, \mathbf{p}_1, \mathbf{V}) +
H_2(\mathbf{q}_2, \mathbf{p}_2, \mathbf{V}). \qquad (2)
$$
NB:
The subsystems need not be separated spatially. They need not even consist of different particles. The only requirement is that the Hamiltonian have the form (2). We can put all translational coordinates into $\mathbf{q}_1$, rotational into $\mathbf{q}_2$, vibrational into $\mathbf{q}_3$, and so on. If the energy transfer (interaction) between the subsystems is negligible during some period of time, then expression (2) is correct for that period.
We can introduce distribution functions for each subsystem:
$$
f_i(\mathbf{q}_i,\mathbf{p}_i) =
\frac{
\delta\bigl( E_i - H_i(\mathbf{q}_i, \mathbf{p}_i, \mathbf{V}) \bigr)
}{\Omega_i(E_i, \mathbf{V})};
$$
where $E_i$ is the internal energy of the subsystem.
The entropy of the subsystem then is
$$
S_i(E_i, \mathbf{V}) = \ln \Omega_i(E_i, \mathbf{V})
$$
and the temperature is
$$
T_i = \left( \frac{\partial S_i(E_i, \mathbf{V})}{\partial E_i} \right)^{-1} \qquad (3)
$$
This is the definition of the temperature of a subsystem (or of a single degree of freedom).
Temperatures in equilibrium
Since the subsystems are independent, the distribution function of the whole system is the product:
$$
f(\mathbf{q},\mathbf{p}) = f_1(\mathbf{q}_1,\mathbf{p}_1)f_2(\mathbf{q}_2,\mathbf{p}_2);
$$
and the total number of accessible states is:
$$
\Omega(E_1, E_2, \mathbf{V}) = \Omega_1(E_1, \mathbf{V})\Omega_2(E_2, \mathbf{V}).
$$
Hence the total entropy is
$$
S(E_1, E_2, \mathbf{V}) = S_1(E_1, \mathbf{V}) + S_2(E_2, \mathbf{V}) \qquad (4)
$$
If there is an interaction between the subsystems, internal energy will be transferred from one subsystem to the other until equilibrium is reached. During this process the total energy is constant:
$$
E = E_1 + E_2 = \text{const}
$$
The energies of the subsystems change with time and settle at definite values in equilibrium. According to the second law of thermodynamics, the total entropy is maximal in this state. The extremum condition is
$$
\frac{\partial S(E_1, E_2(E, E_1), \mathbf{V})}{\partial E_1} = 0.
$$
From (4), using $E_2 = E - E_1$ and hence $\partial E_2/\partial E_1 = -1$, we get:
$$
\frac{\partial S(E_1, E_2(E, E_1), \mathbf{V})}{\partial E_1} =
\frac{\partial S_1(E_1, \mathbf{V})}{\partial E_1} +
\frac{\partial S_2(E_2, \mathbf{V})}{\partial E_2}\frac{\partial E_2}{\partial E_1} =
\frac{1}{T_1} - \frac{1}{T_2} = 0
$$
or
$$
T_1 = T_2.
$$
One can prove that these temperatures are equal to the $T$ defined by (1).
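The maximization argument can also be illustrated numerically. A minimal sketch (assuming numpy; the subsystem sizes $N_1$, $N_2$ and total energy are arbitrary illustrative choices), using two monatomic ideal gases with $S_i = \frac{3}{2} N_i \ln E_i$ and $k_B = 1$:

```python
import numpy as np

N1, N2, E = 100, 300, 8.0   # illustrative subsystem sizes and total energy (k_B = 1)

# Scan the possible energy splits and find the one maximizing S = S1 + S2
E1 = np.linspace(1e-3, E - 1e-3, 200001)
S = 1.5 * N1 * np.log(E1) + 1.5 * N2 * np.log(E - E1)
E1_star = E1[np.argmax(S)]

# Temperatures from T_i = (dS_i/dE_i)^(-1) = 2 E_i / (3 N_i)
T1 = 2 * E1_star / (3 * N1)
T2 = 2 * (E - E1_star) / (3 * N2)
print(T1, T2)   # the two temperatures agree at the entropy maximum
```

At the entropy maximum the energy splits in proportion to the particle numbers, and the two temperatures coincide, as derived above.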
The answer to your question is quite interesting: the ideal gas equation of state is very general; it applies whenever the particles don't interact with one another (except by very short range forces to allow collisions). The relation between momentum and energy (called the dispersion relation when you do the analysis using statistical mechanics) can be anything at all and you still get $p V = N k_B T$.
In the Boltzmann factor it is always the energy that appears, so when the relationship between momentum and energy is different (e.g. in a relativistic gas) the distribution over momentum is different. For example one gets
$$
f(p) \propto p^2 e^{-E/k_B T}
$$
where $p = \gamma m v$ and $E = \gamma m c^2$ and $\gamma = 1/\sqrt{1-v^2/c^2}$. This distribution is not the MB distribution but remarkably you still get $p V = N k_B T$. The easiest way to show this is using statistical mechanics via the single particle partition function.
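A numerical cross-check of this claim (a sketch, assuming scipy, in units with $k_B T = m = c = 1$): kinetic theory gives $pV = \frac{N}{3}\langle p \, dE/dp \rangle$ for an arbitrary dispersion $E(p)$, and averaging over the distribution $f(p) \propto p^2 e^{-E/k_B T}$ yields exactly $3 k_B T$, hence $pV = N k_B T$:

```python
import numpy as np
from scipy.integrate import quad

kT = 1.0                                      # units with k_B T = m = c = 1
E  = lambda p: np.sqrt(p**2 + 1.0)            # relativistic dispersion E(p)
dE = lambda p: p / np.sqrt(p**2 + 1.0)        # dE/dp (the particle velocity)
w  = lambda p: p**2 * np.exp(-E(p) / kT)      # unnormalized momentum distribution f(p)

# <p dE/dp> over the relativistic momentum distribution
num, _ = quad(lambda p: p * dE(p) * w(p), 0, np.inf)
den, _ = quad(w, 0, np.inf)
print(num / den)   # 3 k_B T, so p V = N k_B T despite the non-MB distribution
```

Integration by parts shows the result is $3k_BT$ for any increasing $E(p)$, which is the dispersion-independence described above.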
This is an example of a macroscopic phenomenon (the way pressure in a gas relates to volume and temperature) being independent of many of the details of the micro-physics (it could be any dispersion relation, and you could treat the parts of the gas using either classical or quantum physics). This raises some very interesting points about the way micro-physics connects to macro-physics.
Best Answer
Let's start with the physical interpretation. We are considering an ideal gas of particles in equilibrium at some temperature $T$. Let's ask the following question: if the system is in equilibrium, why don't all particles have the same speed? Answer: because the particles interact through collisions. Imagine that one could prepare a system in such a way that each particle of an ideal gas enters a box with a set speed $v_0$. Will all the particles maintain the same speed $v_0$ once the collisions start to occur? No. If the particles never collided, they would maintain their initial speed - but the particles do collide. We know from simple mechanics that collisions lead to a whole range of speeds in different directions, depending on the angle of incidence etc.
Instead of a single speed $v_0$, then, we will have a whole range of speeds. Some of these speeds will be more probable than others. For example, it takes a very special collision for a particle to transfer all of its kinetic energy to the other particle, leaving the initial particle with zero speed after the collision. There is a far wider family of collisions in which the particle's speed diminishes by some fraction but remains nonzero. Overall, we can say that there are fluctuations of the speed values around some most probable value (which may have something to do with $v_0$). Now here comes the important point: just how big these fluctuations are depends on the temperature. For small temperatures the fluctuations are subtle, and for larger temperatures they are large. You can think of it this way: for larger temperatures (and correspondingly larger average speeds) a larger number of speeds is accessible with significant probability. This is mirrored in the behavior of the Maxwell-Boltzmann distribution function (after all, this is why we choose this distribution function in the first place: because it describes the real world!).
Now on to the math. We consider the Maxwell-Boltzmann distribution function, \begin{eqnarray} f(v) = \sqrt{\left(\frac{m}{2 \pi k_BT}\right)^3} 4\pi v^2 e^{- \frac{mv^2}{2k_BT}}~, \end{eqnarray} where $m$ is the mass of the particle, $v$ is its speed, $T$ is the temperature of the system and $k_B$ is the Boltzmann constant. Let us check that the dimension of the distribution function is \begin{eqnarray} \big[f(v) \big] = \sqrt{\left(\frac{\textrm{kg}}{\frac{\textrm{J}}{\textrm{K}} \times \textrm{K}} \right)^3} \frac{\textrm{m}^2}{\textrm{s}^2} =\left[\frac{1}{v}\right]~. \end{eqnarray} This agrees with our interpretation of $f(v)$ as a probability per unit interval of speed. After integrating over speed we get a dimensionless quantity: a probability. All units agree.
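As a small numerical check of the normalization (a sketch, assuming scipy; the values of $m$ and $k_BT$ are arbitrary), the integral of $f(v)$ over all speeds is indeed the dimensionless number 1:

```python
import numpy as np
from scipy.integrate import quad

m, kT = 1.0, 1.5   # arbitrary illustrative values

# Maxwell-Boltzmann speed distribution as written above
f = lambda v: (m / (2*np.pi*kT))**1.5 * 4*np.pi * v**2 * np.exp(-m*v**2 / (2*kT))

norm, _ = quad(f, 0, np.inf)
print(norm)   # 1.0: the total probability
```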
Let us now see how the distribution function behaves for different values of $T$. First, let us find where the peak of the distribution function is. This peak corresponds to the most probable value of $v$. We can easily calculate it with \begin{eqnarray} && \frac{d}{dv} f(v) = \sqrt{\left(\frac{m}{2 \pi k_BT}\right)^3} 4\pi \left(2v e^{- \frac{mv^2}{2k_BT}} + v^2 \left(-\frac{2mv}{2k_BT} \right)e^{- \frac{mv^2}{2k_BT}} \right)= 0 ~~~\Rightarrow \\ && \Rightarrow ~~~v_{\textrm{peak}} = \pm \sqrt{\frac{2k_B T}{m}} ~, \end{eqnarray} where only the positive root is physical, since $v$ is a speed. Clearly, the peak of the distribution function occurs when the argument of the exponent is equal to $-1$. Note that for small $T$ the peak occurs at a small speed, while for large $T$ it occurs at a larger speed. The value of the distribution function at the peak is \begin{eqnarray} f(v_{\textrm{peak}}) = \frac{1}{e} \sqrt{\left(\frac{m}{2 \pi k_BT}\right)^3} 4\pi \frac{2k_B T}{m} = \frac{1}{e} \sqrt{\frac{8m}{ \pi k_BT}} ~. \end{eqnarray} We can see that for small $T$, $f(v_{\textrm{peak}})$ will be large because $T$ is in the denominator; for the same reason $f(v_{\textrm{peak}})$ will be smaller for bigger $T$.
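Both results are easy to confirm numerically (a sketch, assuming scipy; $m$ and $k_BT$ are arbitrary illustrative values):

```python
import numpy as np
from scipy.optimize import minimize_scalar

m, kT = 1.0, 2.0   # arbitrary illustrative values

f = lambda v: (m / (2*np.pi*kT))**1.5 * 4*np.pi * v**2 * np.exp(-m*v**2 / (2*kT))

# Locate the peak by minimizing -f(v) on a bracket containing it
res = minimize_scalar(lambda v: -f(v), bounds=(0.0, 20.0), method='bounded')

print(res.x, np.sqrt(2*kT/m))                   # numerical peak vs sqrt(2 k_B T / m)
print(-res.fun, np.sqrt(8*m/(np.pi*kT))/np.e)   # f at the peak vs (1/e) sqrt(8m / (pi k_B T))
```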
We can now make some comparisons. For example, we can calculate for what range of $v$ the value of the distribution function is more than $\frac{1}{e}f(v_{\textrm{peak}})$ (i.e., which values of $v$ land us at the "top of the hill" of the distribution function). This is not hard to calculate: we set the condition \begin{eqnarray} && f(v) = \frac{1}{e}f(v_{\textrm{peak}}) ~~~\Rightarrow ~~~ v^2 e^{- \frac{mv^2}{2k_BT}} = \frac{1}{e^2} v_{\textrm{peak}}^2 ~. \end{eqnarray} In order to solve this equation let us use the first-order approximation \begin{eqnarray} e^{- \frac{mv^2}{2k_BT}} \approx 1 - \frac{mv^2}{2k_BT}~. \end{eqnarray} Then we want to solve \begin{eqnarray} v^2 \left(1 - \frac{mv^2}{2k_BT} \right) = \frac{1}{e^2} v_{\textrm{peak}}^2~, \end{eqnarray} which is just a quadratic equation in $v^2$ leading to \begin{eqnarray} v^2 = \frac{k_BT}{m} \pm \frac{\sqrt{e^2 - 4}}{e}\frac{k_B T}{m}~. \end{eqnarray} Since \begin{eqnarray} \sqrt{1 - \frac{4}{e^2}} \approx 0.677 \approx \frac{1}{2} \end{eqnarray} (we just want to assess the overall nature of the function, so such rough approximations are acceptable here), we can write \begin{eqnarray} v^2 = \frac{k_BT}{m} \pm \frac{1}{2}\frac{k_B T}{m} = \frac{1}{2}\frac{2k_BT}{m} \pm \frac{1}{4}\frac{2k_B T}{m} = \frac{1}{2}v^2_{\textrm{peak}} \pm \frac{1}{4}v^2_{\textrm{peak}}~. \end{eqnarray} Therefore, for the value of the distribution function to be smaller than the peak value by no more than a factor of $e$, the squared speed has to lie within \begin{eqnarray} v^2 \in \left(\frac{1}{4}v^2_{\textrm{peak}}, \frac{3}{4}v^2_{\textrm{peak}} \right) = \left(\frac{k_B T}{2m}, \frac{3k_B T}{2m} \right)~. \end{eqnarray} Note that this range depends on the temperature $T$! For small temperatures the range is narrower, while for large temperatures it is wider. This means that the slope for larger temperatures must be less steep, since the value of the distribution function must decrease by a constant factor of $e$ over a larger range of speeds.
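The first-order expansion above is deliberately rough, so the edges $\frac{1}{4}v^2_{\textrm{peak}}$ and $\frac{3}{4}v^2_{\textrm{peak}}$ are only indicative. Solving the condition $f(v) = \frac{1}{e}f(v_{\textrm{peak}})$ exactly (a numerical sketch, assuming scipy) confirms the key qualitative point: both edges of the $v^2$ interval scale linearly with $T$, so the "top of the hill" widens with temperature:

```python
import numpy as np
from scipy.optimize import brentq

def e_fold_range(kT, m=1.0):
    """Exact v^2 interval where f(v) >= f(v_peak)/e for the MB distribution."""
    vp = np.sqrt(2 * kT / m)                            # peak speed
    f = lambda v: v**2 * np.exp(-m * v**2 / (2 * kT))   # unnormalized shape of f(v)
    g = lambda v: f(v) - f(vp) / np.e
    lo = brentq(g, 1e-9 * vp, vp)                       # root below the peak
    hi = brentq(g, vp, 10.0 * vp)                       # root above the peak
    return lo**2, hi**2

r1 = e_fold_range(kT=1.0)
r2 = e_fold_range(kT=2.0)
print(r2[0] / r1[0], r2[1] / r1[1])   # both ratios are 2: the range scales with T
```

Doubling $T$ doubles both edges of the interval exactly, since the whole distribution depends on $v$ only through $mv^2/(2k_BT)$.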
The mathematical behavior of the distribution function thus agrees with our qualitative understanding!