In calculating the electron dispersion you probably obtained the diagonalized Hamiltonian in the momentum space
$$
H=\sum_\mathbf{k}\left[c^{\dagger}_{\mathbf{k}A},c^{\dagger}_{\mathbf{k}B}\right]\left[\begin{array}{cc}0 & \Delta(\mathbf{k})\\ \Delta^{\dagger}(\mathbf{k}) &0\end{array}\right]\left[\begin{array}{c}c_{\mathbf{k}A} \\ c_{\mathbf{k}B}\end{array}\right].
$$
If you you chose your $x$ axis along the zigzag direction (arXiv:1004.3396), the two nonequivalent Dirac valleys are $\mathbf{K}_\kappa=\left(\kappa\frac{4\pi}{3\sqrt{3}a},0\right)$, $\kappa=\pm1$ and $\mathbf{K}_{-1}=\mathbf{K}^{\prime}$, where $a$ is the C-C distance. Then $\Delta(\mathbf{k})=-t\left(1+e^{-i\mathbf{k}\cdot\mathbf{a}_1}+e^{-i\mathbf{k}\cdot\mathbf{a}_2}\right)$, where $t$ is the hopping term, and $\mathbf{a}_1=\left(\sqrt{3}a/2,3a/2\right)$ and $\mathbf{a}_2=\left(-\sqrt{3}a/2,3a/2\right)$ are the lattice vectors.
Taylor expanding $\Delta(\mathbf{k})$ up to linear terms around those two points you obtain
$$
\Delta(\mathbf{k})=\kappa\frac{3ta}{2}q_x-i\frac{3ta}{2}q_y
$$
where $\mathbf{q}$ is the displacement momenta from the $\mathbf{K}_\kappa$ point. Promoting these displacement momenta to operators you obtain the Hamiltonian
$$
H=\hbar v_F\left[\begin{array}{cccc}0 & q_x-iq_y & 0 & 0\\q_x+iq_y & 0 & 0 & 0\\0 & 0 & 0 & -q_x-iq_y\\0 & 0 & -q_x+iq_y & 0\end{array}\right]
$$
where $v_F=\frac{3ta}{2\hbar}$ is the Fermi velocity. This is in $\left[\Psi_{A\mathbf{K}},\Psi_{B\mathbf{K}},\Psi_{A\mathbf{K}^{\prime}},\Psi_{B\mathbf{K}^{\prime}}\right]^T$ basis, if you rearrange your basis as $\left[\Psi_{A\mathbf{K}},\Psi_{B\mathbf{K}},\Psi_{B\mathbf{K}^{\prime}},\Psi_{A\mathbf{K}^{\prime}}\right]^T$ you get the compact form
$$
H=\hbar v_F\tau_z\otimes\boldsymbol{\sigma}\cdot\mathbf{k}
$$
where $\tau_z$ acts in the valley space. This is similar to the Dirac-Weyl equation for relativistic massless particles, where instead of $v_F$ you get the speed of light
$$
H=\pm\hbar c\boldsymbol{\sigma}\cdot\mathbf{k}
$$
where $+$ denotes right-handed antineutrions, and $-$ denotes left-handed neutrions. The differences are that $\boldsymbol{\sigma}=\left(\sigma_x,\sigma_y\right)$ for graphene acts in pseudospin space and $\boldsymbol{\sigma}=\left(\sigma_x,\sigma_y,\sigma_z\right)$ for neutrinos acts in real spin space.
First of all, $E=mc^2$ is rest energy, not kinetic. Relativistic kinetic energy is given by
$$E_K=(\gamma-1)mc^2,$$
where $$\gamma=\frac1{\sqrt{1-v^2/c^2}}.$$
Now, for a particle moving at speed of $10^7 m/s$, we have $\gamma\approx1.00504$. That's pretty close to 1, so non-relativistic approximation of $E_K=\frac12mv^2$ works quite nicely.
In $(3.2(d))$ the expression is given for free electron, so $m$ is its mass as of electron in vacuum.
In $(3.3)$ the expression defines the effective mass of crystal electron in terms of its dispersion relation in given band. In this case, of course, $m^*$ is indeed different from $m$ in general, though for free electron (i.e. crystal with constant potential everywhere) we'd get $m^*=m$.
Now to forces and negative effective mass. Electron in crystal doesn't have the usual parabolic dispersion relation as you know for vacuum electron. In some hypothetic 1D crystal with lattice constant $a=2$ dispersion relation for lowest band could look like
$$E_{hyp}=\sin^2(k).$$
Effective mass in such a band will be
$$m^*_{hyp}=1\left/\frac{d^2E}{dk^2}\right.=\frac12\sec(2k).\tag1$$
You can see that here effective mass at center of Brillouin zone is positive, at borders it's negative, and at $k=\pm\frac\pi4$ it has poles. This means that, in particular, linearly changing potential can't give electron too high energy no matter how long you wait.
Now suppose initially you have electron in the state at the border of Brillouin zone in this band. Its effective mass is negative. Let's do as you say, apply a force $F=m^*a$ to this electron and see what classical-mechanical calculations will lead to. As second Newton law says,
$$\frac{dp}{dt}=F.$$
Electron quasimomentum is related to quasiwavenumber as $p=\hbar k$. Taking units where $\hbar=1$, we have
$$\frac{dk}{dt}=F.$$
Now let's apply $F=m^*a$ to this equation:
$$\frac{dk}{dt}=m^*a.\tag2$$
Substituting $m^*_{hyp}$ from $(1)$, we get:
$$\frac{dk}{dt}=\frac{a}2\sec(2k).\tag3$$
Now, $a$ could mean one of the following: group acceleration or phase acceleration, i.e. derivative of group velocity or phase velocity. What is important is that it depends on quasiwavevector: $a(k)$.
For group velocity substituting $a(k)=\frac{\text{d}}{\text{d}t}v_g(k)$ with $v_g(k)=E_{hyp}'(k)$ leads to tautology, meaning that the equation is correct for group velocity and any $F$, making our analogy with classical mechanics sensible.
Let's now apply a constant force $F$ at our particle, initially being in a state described by some wave packet. The solution to Newton's equation would be
$$k(t)=Ft+k_0.$$
We can see that the particle is moving with ever increasing quasimomentum. But quasimomentum is defined up to translation by an arbitrary vector of reciprocal lattice. Thus, as $k$ crosses the boundary of Brillouin zone, we could subtract $\frac{2\pi}a$ from it, and see that it appears at the opposite side of Brillouin zone, and the group velocity changes sign.
Its energy $E(k)$ also depends on time periodically, and the same is for group velocity and expectation value of position. Thus what we have is oscillatory motion of the our wave packet.
Best Answer
What is really effective mass?
Effective mass emerges as a result of expanding the energy dispersion near its minimum/maximum, where it is correspondingly positive/negative.
Energy spectrum of a crystalline solid consists of energy bands of finite width, described by an energy dispersion relation $\epsilon_n(\mathbf{k})$, where $n$ is the band index and $\hbar\mathbf{k}$ is quasi-momentum - this is not the real momentum of an electron, but a quantum number entering the Bloch theorem.
Let us take for simplicity a one-dimensional band with dispersion $$\epsilon(k) = \Delta\cos(ka).$$ This band has minima at $k=\pm\pi/a$ and maximum at $k=0$, and its width is $2\Delta$. If we expand this energy-quasi-momentum relation at $k=0$, we obtain $$\epsilon(k)\approx\Delta -\frac{\Delta a^2 k^2}{2} = \Delta + \frac{\hbar^2k^2}{2m^*},$$ where the effective mass is defined as $$m^*=-\frac{\hbar^2}{\Delta a^2}.$$ The effective mass is introduced by analogy with the free-electron dispersion relation $$\epsilon(k) = \frac{p^2}{2m*} = \frac{\hbar^2k^2}{2m},$$ and simplifies calculations, when the electrons are indeed close to the band extrema.
If instead we wanted to expand the dispersion relation near its minimum, we could write $k=\pm\pi/a + q$, and obtain $$\epsilon(k) = \Delta\cos(\pm\pi + qa) = -\Delta\cos(qa) \approx\Delta +\frac{\Delta a^2 q^2}{2} = \Delta + \frac{\hbar^2q^2}{2m^*}.$$
For a real semiconductor we are usually interested in the phenomena happening near the maximum of the valence band, which is filled with electrons up to the top, and the bottom of the conduction band, which is empty. Therefore the effective mass in the conduction band is positive, whereas in the valence band it is negative. Since in real materials the energy bands have complicated form, we often have to deal with the effective mass tensor, resulting from the expansion of the three-dimensional energy-momentum relation: $$\epsilon(\mathbf{k}) \approx \epsilon(0) + \frac{1}{2}\sum_{i,j}\frac{\partial^2\epsilon(\mathbf{k})}{\partial k_i\partial k_j}|_{\mathbf{k}=0}k_i k_j = \epsilon(0) + \sum_{i,j}\frac{\hbar^2k_ik_j}{2m_{ij}^*},\\ \frac{1}{m_{ij}^*} = \frac{1}{\hbar^2}\frac{\partial^2\epsilon(\mathbf{k})}{\partial k_i\partial k_j}|_{\mathbf{k}=0} $$ (more precisely, it is the inverse effective mass that has tensor properties.) Moreover, in a real material the bottom of the conduction band and the top of the valence band do not necessarily occur at the same point in k-space.
Effective mass near the edges of the Brillouin zone
Finally, when we go away from the band extremum, the expansion is not valid anymore. However, the effective mass does not diverge at $k=\pm\frac{\pi}{2a}$, since it is not a function of $k$, but the value of the derivative at a particular point(i.e., a band extremum): $$m^*=\hbar^2\left(\frac{d^2E(k)}{dk^2}\right)|_{k=0},$$ it is NOT $$m^*(k)=\hbar^2\left(\frac{d^2E(k)}{dk^2}\right).$$
Holes vs. electrons with negative effective mass
Holes are vacancies in the valence band, obtained by removing a few electrons at its top. All the electrons at the top of the valence band have negative effective mass, so holes are more than just electrons with a negative effective mass. In fact, holes are rather complex many-particle excitation.