The group velocity $v_g$ of a wave packet (the speed at which the maximum of the packet moves) is given by $v_g=\frac{\partial\omega}{\partial k}$. With $E=\hbar\omega$ this becomes $\frac{\partial\omega}{\partial k}=\frac 1 \hbar\frac{\partial E}{\partial k}$, which evaluates to $v_g=\frac{3ta}{2\hbar}=:v_f$ for $k=0$, with $k$ measured from $K$. That is actually the definition of $v_f$: it is the group velocity at $K$, the point in the graphene band structure where the Dirac cone occurs (note that it is a vector, because $k$ has an $x$ and a $y$ component), since $E(K)=E_f$.
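As a rough numerical check of this group velocity, here is a minimal Python sketch. It assumes the standard nearest-neighbour tight-binding dispersion for graphene, with hopping energy $t \approx 2.8$ eV and carbon-carbon distance $a \approx 1.42$ Å; the dispersion and both values are assumptions, not taken from the answer above. It compares the slope of $E(k)$ just off the Dirac point with $3ta/(2\hbar)$.

```python
import math

HBAR_EV_S = 6.582119569e-16  # reduced Planck constant in eV*s


def graphene_energy(kx, ky, t, a):
    """Upper band of the nearest-neighbour tight-binding model (assumed form)."""
    f = (2.0 * math.cos(math.sqrt(3.0) * ky * a)
         + 4.0 * math.cos(math.sqrt(3.0) / 2.0 * ky * a) * math.cos(1.5 * kx * a))
    # Clamp at zero: round-off can make 3 + f slightly negative right at K.
    return t * math.sqrt(max(0.0, 3.0 + f))


def fermi_velocity_numeric(t, a, dq_times_a=1e-3):
    """Slope of E(k) just off the Dirac point K, divided by hbar."""
    k_x = 2.0 * math.pi / (3.0 * a)
    k_y = 2.0 * math.pi / (3.0 * math.sqrt(3.0) * a)
    dq = dq_times_a / a  # small step away from K along k_x
    d_energy = graphene_energy(k_x + dq, k_y, t, a) - graphene_energy(k_x, k_y, t, a)
    return d_energy / (HBAR_EV_S * dq)


t = 2.8       # eV, assumed hopping energy
a = 1.42e-10  # m, assumed carbon-carbon distance

v_numeric = fermi_velocity_numeric(t, a)
v_formula = 3.0 * t * a / (2.0 * HBAR_EV_S)
print(f"numeric slope: {v_numeric:.3e} m/s, 3ta/(2 hbar): {v_formula:.3e} m/s")
```

With these assumed parameters both numbers come out at roughly $9\times10^5$ m/s, the usual order of magnitude quoted for graphene.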

The effective mass from solid state physics is indeed infinite. When one talks about "zero effective mass Dirac fermions" in graphene, the term comes from the massless Dirac equation, which has the same dispersion relation. The solid state physics effective mass doesn't work here, because it requires a parabolic dispersion relation (not a linear one with a cusp). There are two papers about that on arXiv.

### Fermi energy

If you operate at zero temperature ($T = 0$ K) and fill the energy states of a system according to the Pauli exclusion principle, the Fermi energy is the boundary below which all states are full and above which all states are empty. At $T = 0$ this boundary is a sharp line.

For example, say you have a ladder with five steps which you have to “fill” with ten electrons. Due to the Pauli exclusion principle, each step can only take two electrons. Now you fill up the ladder: two electrons on the first step, the next two on the second step, and so forth, until you put the last two electrons on the fifth step. The energy at this step (the fifth step) is your Fermi energy.
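The ladder picture can be written as a few lines of Python. This is only an illustrative sketch: the function name `fill_levels` and the equally spaced step energies are made up for the example.

```python
def fill_levels(level_energies, n_electrons, capacity=2):
    """Fill levels from the bottom, `capacity` electrons per level (Pauli principle).

    Returns (occupations, fermi_energy). Occupations are listed in order of
    increasing energy; fermi_energy is the energy of the highest occupied level.
    """
    occupations = []
    remaining = n_electrons
    fermi_energy = None
    for energy in sorted(level_energies):
        placed = min(capacity, remaining)
        occupations.append(placed)
        remaining -= placed
        if placed > 0:
            fermi_energy = energy
    if remaining > 0:
        raise ValueError("not enough levels for all electrons")
    return occupations, fermi_energy


# Hypothetical ladder: five equally spaced steps (arbitrary energy units).
steps = [1.0, 2.0, 3.0, 4.0, 5.0]
occupations, fermi_energy = fill_levels(steps, n_electrons=10)
print(occupations, fermi_energy)
```

With ten electrons every step holds two, and the Fermi energy is the energy of the fifth step. With only six electrons, the same function returns the energy of the third step.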

Metals have Fermi energies of several electron-volts (eVs). (Cu: 7 eV, Al: 11 eV)
For comparison, the thermal energy at room temperature is about $k_B T \sim $ 0.025 eV.

### Connection to the conductivity

First you must know that only electrons with an energy close to the Fermi energy can participate in the conduction process.
Why?
I mentioned earlier that at $T = 0$, the Fermi energy is a sharp line. At $T > 0$, this sharp line gets “washed out”: the occupation drops smoothly from full to empty over an energy range of a few $k_B T$ around the Fermi energy.

This means that instead of only full and empty states, you now have partially occupied states just above and below the Fermi energy. This in turn means that you are now able to excite electrons into higher energy states, which you need to do if you want to accelerate them in a direction; i.e., if you want to do work on them.

But because the thermal and electric energies are small (thermal $ \sim $ 0.025 eV, electric less than that), you are only able to excite electrons VERY CLOSE to the Fermi energy ($E_F$). So only electrons close to $E_F$ will contribute to the conduction.
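To get a feeling for how narrow this window is, here is a small Python sketch that evaluates the Fermi-Dirac occupation for copper ($E_F \approx 7$ eV, as quoted above) at an assumed temperature of 300 K. The function name and the sample energies are chosen only for illustration.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def fermi_dirac(energy_ev, fermi_energy_ev, temperature_k):
    """Occupation probability of a state at `energy_ev` (temperature > 0)."""
    x = (energy_ev - fermi_energy_ev) / (K_B_EV * temperature_k)
    if x > 700.0:  # exp(x) would overflow; the occupation is zero to machine precision
        return 0.0
    return 1.0 / (math.exp(x) + 1.0)


E_F = 7.0          # eV, Fermi energy of copper as quoted above
T = 300.0          # K, assumed room temperature
k_t = K_B_EV * T   # thermal energy in eV

print(f"k_B T = {k_t:.4f} eV")
for offset in (-0.5, -0.1, 0.0, 0.1, 0.5):
    occupation = fermi_dirac(E_F + offset, E_F, T)
    print(f"E - E_F = {offset:+.1f} eV -> occupation {occupation:.6f}")
```

Half an electron-volt away from $E_F$ the states are already essentially completely full or completely empty, so only a thin shell of states around $E_F$ has room for excitations.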

With this $E_F$ you can associate a velocity, the Fermi velocity:
$$
v_F = \sqrt{2 E_F / m}
$$
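As a quick illustration of this formula, the following sketch converts the Fermi energies quoted above (Cu: 7 eV, Al: 11 eV) into Fermi velocities. It assumes the free-electron mass for $m$, which is a simplification.

```python
import math

E_CHARGE = 1.602176634e-19  # elementary charge in C (also J per eV)
M_E = 9.1093837015e-31      # free-electron mass in kg (assumed for m)


def fermi_velocity(fermi_energy_ev, mass=M_E):
    """v_F = sqrt(2 E_F / m), with E_F given in eV and the result in m/s."""
    return math.sqrt(2.0 * fermi_energy_ev * E_CHARGE / mass)


v_cu = fermi_velocity(7.0)   # copper
v_al = fermi_velocity(11.0)  # aluminum
print(f"Cu: {v_cu:.3e} m/s, Al: {v_al:.3e} m/s")
```

Both come out at a bit more than $10^6$ m/s, i.e. a fraction of a percent of the speed of light.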

Now let's turn to the conductivity:

In the Drude model, the conductivity $\sigma$ is given by
$$
\sigma = \frac{n e^2 \tau}{m}
$$
where $n$ is the number density of electrons (electrons per unit volume), $e$ is the electron charge, $\tau$ is the time between two collisions and $m$ is the mass of an electron.

One can obtain $\tau$ from $\mathscr{l}$, the mean free path between two collisions, given as $\mathscr{l} = v_F \tau$ or inversely, $\tau = \mathscr{l}/v_F$. During this time, the electron is accelerated. A LARGE $v_F$ will therefore result in a SHORT acceleration time and LESS gain of speed for the electron in a given direction.

Finally you can write the conductivity as:
$$
\sigma = \frac{n e^2 \mathscr{l}}{m v_F}
$$

Remember that $v_F$, the Fermi velocity, is directly related to the Fermi energy $E_F$.
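To see that this expression gives sensible numbers, here is a rough sketch for copper. The electron density $n \approx 8.5\times10^{28}\,\text{m}^{-3}$ and the mean free path of about 40 nm are assumed textbook-style values, not taken from this answer, and the free-electron mass is used for $m$.

```python
import math

E_CHARGE = 1.602176634e-19  # elementary charge in C (also J per eV)
M_E = 9.1093837015e-31      # free-electron mass in kg (assumed for m)


def drude_conductivity(n, mean_free_path, v_fermi, mass=M_E):
    """sigma = n e^2 l / (m v_F), in S/m for SI inputs."""
    return n * E_CHARGE**2 * mean_free_path / (mass * v_fermi)


E_F_CU = 7.0      # eV, Fermi energy of copper as quoted above
N_CU = 8.5e28     # electrons per m^3 (assumed value)
MFP_CU = 40e-9    # m, assumed mean free path at room temperature

v_f_cu = math.sqrt(2.0 * E_F_CU * E_CHARGE / M_E)  # Fermi velocity in m/s
tau_cu = MFP_CU / v_f_cu                           # time between collisions in s
sigma_cu = drude_conductivity(N_CU, MFP_CU, v_f_cu)

print(f"tau = {tau_cu:.2e} s, sigma = {sigma_cu:.2e} S/m")
```

With these inputs the result lands in the range of a few $10^7$ S/m, the right order of magnitude for a good metal. The outcome scales linearly with the assumed mean free path, so it should be read as an estimate only.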

As an example, compare the Fermi energies of copper and aluminum quoted above:
Copper has a LOWER Fermi energy (7 eV) than aluminum (11 eV), so it has a LOWER Fermi velocity; hence, for a comparable mean free path, the time between two collisions ($\tau$) is LONGER than that in aluminum. This in turn means that the electron has MORE time to accelerate in a given direction, which helps explain why copper is a better conductor.

## Best Answer

The Fermi velocity is related to the Fermi energy by $\epsilon_F=\frac12m_{\text e}v_F^2$. Actually, the Fermi energy $\epsilon_F$ is the chemical potential of the electron gas at zero temperature (that is, the minimum energy required to add an extra electron to the gas).

The energies of the electrons are distributed according to the Fermi-Dirac distribution, $$ f(\epsilon)=\frac{1}{\mathrm e^{\beta(\epsilon-\epsilon_F)}+1}.$$ At low temperature (large $\beta=1/k_BT$), this distribution is approximately a step $$f(\epsilon)\approx\left\{\begin{array}{ll}1&\text{if $\epsilon<\epsilon_F$}\\ 0&\text{otherwise}\end{array}\right.\qquad(T\ll\epsilon_F/k_B).$$ So at low temperature, the energy distribution is fully controlled by the Fermi energy, and so is the velocity distribution. The mean square velocity depends on the density of states $D(\epsilon)$. If $D(\epsilon)\propto \epsilon^\alpha$ (in 2 dimensions, $\alpha=0$ and in 3 dimensions, $\alpha=1/2$), then $\left\langle v^2\right\rangle=\frac{\alpha+1}{\alpha+2}v_F^2$.
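For completeness, here is a short sketch of where this factor comes from, assuming free electrons with $\epsilon=\frac12mv^2$ and the zero-temperature step occupation: $$\left\langle v^2\right\rangle=\frac2m\left\langle\epsilon\right\rangle=\frac2m\,\frac{\int_0^{\epsilon_F}\epsilon\,D(\epsilon)\,\mathrm d\epsilon}{\int_0^{\epsilon_F}D(\epsilon)\,\mathrm d\epsilon}=\frac2m\,\frac{\epsilon_F^{\alpha+2}/(\alpha+2)}{\epsilon_F^{\alpha+1}/(\alpha+1)}=\frac{\alpha+1}{\alpha+2}\,\frac{2\epsilon_F}{m}=\frac{\alpha+1}{\alpha+2}\,v_F^2.$$ In 3 dimensions ($\alpha=1/2$) this gives $\left\langle v^2\right\rangle=\frac35v_F^2$, and in 2 dimensions ($\alpha=0$) it gives $\left\langle v^2\right\rangle=\frac12v_F^2$.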

At high temperature, $\beta$ is small, so energies $\epsilon\gg\epsilon_F$ are accessible and $f(\epsilon)$ is approximately a Maxwell-Boltzmann distribution $$f(\epsilon)\propto\mathrm e^{-\beta\epsilon}\qquad(T\gg\epsilon_F/k_B).$$

In the second situation, the Fermi energy does not control the distribution, and the mean square velocity is $\left\langle v^2\right\rangle=\frac{\Gamma(\alpha+2)}{\Gamma(\alpha+1)}\frac{2k_BT}m$.
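The Gamma functions arise from the Boltzmann-weighted average of the energy. As a sketch, again assuming $\epsilon=\frac12mv^2$ and $D(\epsilon)\propto\epsilon^\alpha$: $$\left\langle\epsilon\right\rangle=\frac{\int_0^\infty\epsilon^{\alpha+1}\,\mathrm e^{-\beta\epsilon}\,\mathrm d\epsilon}{\int_0^\infty\epsilon^{\alpha}\,\mathrm e^{-\beta\epsilon}\,\mathrm d\epsilon}=\frac{\Gamma(\alpha+2)\,\beta^{-(\alpha+2)}}{\Gamma(\alpha+1)\,\beta^{-(\alpha+1)}}=(\alpha+1)\,k_BT,$$ so that $\left\langle v^2\right\rangle=\frac2m\left\langle\epsilon\right\rangle=(\alpha+1)\frac{2k_BT}{m}$. In 3 dimensions ($\alpha=1/2$) this is the familiar equipartition result $\left\langle v^2\right\rangle=\frac{3k_BT}{m}$.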

So the answer is that $\left\langle v^2\right\rangle$ and $v_F^2$ are related only at low temperature and the factor between them depends on the density of states.