Shine a flashlight on a wall. Rotate the flashlight so the illuminated spot moves.
Q: How fast does the spot move?
A: It depends how far away the wall is.
Q: How fast can the spot possibly move?
A: There is no limit. Put the wall far enough away, and the spot can move with any speed.
Q: What is moving across the wall?
A: Nothing. The light that makes up the spot at one instant is unrelated to the light that makes up the spot an instant later.
This is how a wave can appear to be superluminal: we interpret a series of unrelated events as a continuous 'wave'. Group velocity can also be superluminal; even though the individual chunks of energy are going at roughly $c$, the region where they superpose constructively (the 'crest of the wave') goes faster than $c$.
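For the flashlight example, a rough sketch of the numbers may help. It assumes the beam hits the wall head-on (or that the wall is an arc centred on the flashlight), so the spot speed is simply the rotation rate times the distance, and it ignores light-travel delay. The function names are illustrative and do not come from the answer above.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s


def spot_speed(angular_rate, wall_distance):
    """Speed (m/s) of the spot where the beam meets the wall head-on.

    angular_rate: rotation rate of the flashlight, rad/s
    wall_distance: distance from flashlight to wall, m
    """
    return angular_rate * wall_distance


def distance_for_superluminal_spot(angular_rate):
    """Wall distance (m) beyond which the spot speed exceeds c."""
    return C / angular_rate


# One full turn per second: how far away must the wall be?
rate = 2 * math.pi  # rad/s
print(distance_for_superluminal_spot(rate))  # distance in metres
print(spot_speed(rate, 1.0e8) > C)           # wall 100,000 km away
```

Nothing in this calculation moves faster than $c$: the photons still travel from the flashlight to the wall at $c$, and only the location of their arrival sweeps along the wall.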
How do we write mathematically that "information" cannot go faster
than light? And along a similar line of thought, how do we relate
"information" with special relativity?
Since you are looking for an equation (you say "mathematically"), I would undoubtedly choose this:
$$\left[\hat O (x),\, \hat O' (y)\right]=0, \, \mbox{if}\; x-y \; \mbox{is spacelike}$$
where $\hat O$ and $\hat O'$ are the (linear self-adjoint) operators corresponding to two physical observables (in particular, both may be the same observable and therefore the same operator, $\hat O=\hat O'$), and $x, y$ are two points in space-time. This equation summarizes the fact that information cannot travel faster than light because it says that performing one of two measurements separated by a space-like interval cannot affect the outcome statistics of the other. And this is what "information" means, since one encodes information in physical effects. See the question "Definitions: 'locality' vs 'causality'" if you are interested in the different usages of the terms "causality" and "locality", which of them are physically more relevant, or why entanglement does not imply faster-than-light propagation.
The previous formula assumes that the physical laws obey the principles of quantum mechanics and special relativity, and are thus described by quantum field theories. This is the case for the electromagnetic, weak and strong interactions, and likely also for the gravitational interaction in the weak-field limit, in the sense of an effective field theory. These are the fundamental interactions that we know.
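To make the condition "$x-y$ is spacelike" in the commutator relation concrete, here is a minimal sketch that classifies the separation between two events. It assumes flat space-time, events given as `(t, x, y, z)` tuples in SI units, and the $(+,-,-,-)$ sign convention; the function names are hypothetical.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def interval_squared(event_a, event_b):
    """Return (c*dt)^2 - |dx|^2 for two events given as (t, x, y, z)."""
    dt = event_b[0] - event_a[0]
    dx2 = sum((b - a) ** 2 for a, b in zip(event_a[1:], event_b[1:]))
    return (C * dt) ** 2 - dx2


def is_spacelike(event_a, event_b):
    """True if no signal moving at speed <= c can connect the two events."""
    return interval_squared(event_a, event_b) < 0


origin = (0.0, 0.0, 0.0, 0.0)
# 1 m apart but only 1 ns apart in time: light covers about 0.3 m in 1 ns.
print(is_spacelike(origin, (1.0e-9, 1.0, 0.0, 0.0)))
# 1 m apart and 1 s apart in time: light has ample time to connect them.
print(is_spacelike(origin, (1.0, 1.0, 0.0, 0.0)))
```

For pairs of events where `is_spacelike` is true, the formula says the two observables commute.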
Lastly, what is the relationship between Special Relativity and the
fact that the phase velocity of a wave packet can go faster than light
(light speed here being the group velocity)?
Sometimes defining the speed of a wave is tricky. The signal or information velocity is often the group velocity (which is the velocity of a wave packet), even though in some media (see http://en.wikipedia.org/wiki/Signal_velocity) it is not. But the phase velocity (the rate at which the phase of the wave propagates) cannot carry information (see http://en.wikipedia.org/wiki/Phase_velocity) and may be faster than $c$.
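As an illustration of a phase velocity above $c$ alongside a group velocity below $c$, the sketch below uses the dispersion relation $\omega(k) = \sqrt{c^2 k^2 + \omega_0^2}$. This particular relation (the form that holds for a hollow waveguide mode or a cold unmagnetized plasma) is an assumption chosen for the example, as are the numerical values; it is not taken from the question.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s


def omega(k, omega_0):
    """Angular frequency for the assumed dispersion relation."""
    return math.sqrt((C * k) ** 2 + omega_0 ** 2)


def phase_velocity(k, omega_0):
    """v_p = omega / k."""
    return omega(k, omega_0) / k


def group_velocity(k, omega_0):
    """v_g = d(omega)/dk = c^2 * k / omega for this dispersion relation."""
    return C ** 2 * k / omega(k, omega_0)


k = 10.0          # wavenumber, rad/m (illustrative value)
omega_0 = 2.0e9   # cutoff angular frequency, rad/s (illustrative value)

v_p = phase_velocity(k, omega_0)
v_g = group_velocity(k, omega_0)
print(v_p / C)  # greater than 1
print(v_g / C)  # less than 1
```

For this dispersion relation the two velocities satisfy $v_p v_g = c^2$, so the phase velocity exceeds $c$ exactly when the group velocity is below it.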
Is there a reason we cannot consider the frame of reference of a
specific phase in a wave packet?
You may take any inertial frame, provided its speed is lower than $c$. Note that, according to special relativity, accelerating a massive object up to the speed of light $c$ would require an infinite amount of energy.
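The energy statement can be checked numerically with the standard relation $E = \gamma m c^2$, where $\gamma = 1/\sqrt{1 - v^2/c^2}$. The sketch below is only illustrative; the function names and the sample speeds are assumptions.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s


def lorentz_factor(v):
    """gamma = 1 / sqrt(1 - v^2/c^2); defined only for |v| < c."""
    if abs(v) >= C:
        raise ValueError("no inertial frame moves at or above c")
    return 1.0 / math.sqrt(1.0 - (v / C) ** 2)


def total_energy(mass, v):
    """Total relativistic energy (J) of a body of the given rest mass (kg)."""
    return lorentz_factor(v) * mass * C ** 2


# The energy of a 1 kg body grows without bound as v approaches c.
for fraction in (0.6, 0.9, 0.99, 0.9999):
    print(fraction, total_energy(1.0, fraction * C))
```

Because $\gamma$ diverges as $v \to c$, no finite energy input brings a massive object, and hence a rest frame attached to it, up to the phase velocity when that velocity exceeds $c$.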
Edit: SMeznaric points out, and I agree with him, that space-like separated measurements may give correlated results. What is not possible is to use them to send information one has control over, such as the choice of measurement operators.
Best Answer
The simple answer is that the wave packet travels at the group velocity, not the phase velocity, and in vacuum and in ordinary (normally dispersive) media the group velocity is less than or equal to $c$.
You might argue that you aren't using a wave packet. For example, you might argue that you are just turning the light on and waiting for it to get to the point $B$. However, any modulation of the wave intensity, including turning it on and off, will propagate at the group velocity.