I'm pretty sure that we can't do any better with trig identities; it looks like any such expression would have a term vaguely like $\sqrt{1 + r \cos \theta}$, and then the square root ruins the nice intuition.
In lieu of that I'll offer a derivation of the group velocity that uses no fancy math at all! Hopefully that's what you wanted, even if you didn't directly ask for it.
First off, let's figure out what group velocity is. Apparently, group velocity is the velocity of the envelope of a wave. But what is the envelope of a wave? I can create a wave with any initial position and velocity, so it can be an arbitrarily weird shape. There might be no discernible envelope at all.
So let's back up and find some examples. When we talk about the envelope of a wave, we mean a curve drawn around an oscillation that traces out its slowly varying amplitude.
In order to do this, there must be a well defined oscillation to draw the envelope around. That means that our wave must be made up of individual frequencies that are close to one central frequency. To make things convenient, let's write that schematically as
$$\text{wave} = \sum_{k'} \sin(k'x - \omega(k') t) \text{ for a bunch of } k' \approx k$$
However, if we just have one frequency, the wave is just an infinite sinusoid $\sin(kx)$. This doesn't have an envelope, strictly speaking, because it just goes on forever at the same amplitude. We must have waves of other frequencies, which will constructively and destructively interfere with each other, to actually get an envelope.
So we've concluded that the envelope is defined by where a bunch of sinusoids making up our wave constructively or destructively interfere. Their phases are, as a function of space and time,
$$\phi(k') = k'x - \omega(k') t$$
Now let's assume for simplicity that the top of the envelope is at $x = 0$ at time $t = 0$. That means that the waves must constructively interfere there, so all the $\phi(k')$ are about the same.
As time goes on, the envelope will move, but the peak will still be where the phases of the component waves are the same. That means
$$\text{peak of envelope satisfies } \frac{d\phi(k')}{dk'} = 0$$
Performing the differentiation, we have
$$x - \frac{d\omega(k')}{dk'} t = 0$$
Since we said $k' \approx k$, let's drop the primes and rearrange for
$$\frac{x}{t} = \frac{d \omega(k)}{dk}$$
But $x/t$ is exactly the speed of the peak of the envelope, so this is the group velocity.
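The argument above can be checked numerically. The following is a minimal sketch, not part of the original derivation: it assumes an illustrative dispersion relation $\omega(k) = \sqrt{1 + k^2}$, Gaussian weights on the component waves, and hypothetical values for `k0`, `sigma`, and `t`. It uses complex exponentials instead of sines so that the envelope is simply the magnitude of the sum.

```python
import cmath
import math

def omega(k):
    # Illustrative dispersion relation (an assumption for this sketch)
    return math.sqrt(1.0 + k * k)

def envelope(x, t, ks, weights):
    # Magnitude of the complex sum of component waves; the real part
    # of the sum would be the physical wave itself.
    total = sum(w * cmath.exp(1j * (k * x - omega(k) * t))
                for k, w in zip(ks, weights))
    return abs(total)

# A bunch of wavenumbers k' close to a central k0 (hypothetical values)
k0, sigma, n = 2.0, 0.05, 81
ks = [k0 + sigma * (-4.0 + 8.0 * i / (n - 1)) for i in range(n)]
weights = [math.exp(-((k - k0) ** 2) / (2.0 * sigma ** 2)) for k in ks]

# At t = 0 all phases agree at x = 0, so the peak starts there.
# Later, locate the peak by brute-force search on a grid in x.
t = 200.0
xs = [150.0 + 0.05 * i for i in range(1201)]
x_peak = max(xs, key=lambda x: envelope(x, t, ks, weights))

# Compare x/t of the peak with d(omega)/dk and omega/k at k0
h = 1e-6
v_group = (omega(k0 + h) - omega(k0 - h)) / (2.0 * h)
v_phase = omega(k0) / k0

print("peak speed x/t :", x_peak / t)
print("d(omega)/dk    :", v_group)
print("omega/k        :", v_phase)
```

Under these assumptions the peak speed should track $d\omega/dk$ rather than $\omega/k$, which is the point of the stationary-phase condition.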
This is the generic form of a dispersion relation with a low frequency cutoff. It is a good model for the dispersion relation of electromagnetic waves in a plasma, i.e. the ionosphere, where:
$$ \omega^2 = \omega_0^2 + c^2 k^2 $$
Which is the same as your form, though I've put back in $c$ for the nondispersive phase velocity and changed $k_0 \to \omega_0$ to make it a little clearer. For frequencies below the cutoff $\omega_0$ the wave number is imaginary; that is, the waves don't propagate but are evanescent, decaying exponentially. As you point out, the phase velocity
$$ v_\phi^2 = \frac{\omega^2}{k^2} = c^2 + \frac{\omega_0^2}{k^2} $$
can get arbitrarily large in the long wavelength limit. The group velocity satisfies
$$ v_g^2 = \left( \frac{\partial \omega}{\partial k} \right)^2 = \frac{ c^2 }{ 1 + \frac{\omega_0^2}{c^2 k^2} } $$
which vanishes in the long wavelength limit and approaches the nondispersive phase velocity in the short wavelength limit.
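The step from the dispersion relation to the group velocity can be filled in by differentiating $\omega^2 = \omega_0^2 + c^2 k^2$ implicitly with respect to $k$. A short sketch, using only the symbols already defined above:

$$ 2 \omega \frac{\partial \omega}{\partial k} = 2 c^2 k \quad \Rightarrow \quad v_g = \frac{\partial \omega}{\partial k} = \frac{c^2 k}{\omega} = \frac{c^2}{v_\phi} $$

$$ v_g^2 = \frac{c^4 k^2}{\omega_0^2 + c^2 k^2} = \frac{ c^2 }{ 1 + \frac{\omega_0^2}{c^2 k^2} }, \qquad v_\phi \, v_g = c^2 $$

So for this dispersion relation the group velocity falls below $c$ by the same factor by which the phase velocity exceeds it.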
Notice that, for the case of plasmas, you might be even more concerned about the diverging phase velocity, since in that case it exceeds the speed of light. But it is no bother. The phase velocity is a fairly artificial thing: it just relates the relative phase of different parts of our system after they have reached their steady state.
To get a good physical analogy to reason with, consider a line of pendula, each of mass $m$ and length $l$, hanging from the wall a distance $a$ apart and connected to their neighbors by springs with spring constant $K$. This system has a dispersion relation of the form
$$ \omega^2 = \frac{g}{l} + \frac{4K}{m} \sin^2 \frac{ka}{2} $$
which, if we take the long wavelength limit (i.e. make $ka \ll 1$, so put our pendula close together or consider wavelengths large compared to their separation), becomes
$$ \omega^2 = \frac{g}{l} + \frac{K a^2}{m} k^2 $$
which is the same form as the one under consideration. But now we know how to reason about the situation. First, the presence of a low frequency cutoff makes sense. The pendula want to oscillate at their natural frequency of $\omega = \sqrt{g/l}$ and there is nothing we can do to compel them to oscillate any slower. This will be generic in any system that has some kind of "internal oscillation" aside from the coupling.
This is the large wavelength limit, in which the frequency of our wave is just the frequency of our pendula by themselves. Imagine the whole row of pendula all moving back and forth in perfect synchronization. Equivalently, you can imagine making the coupling between the pendula tend towards zero, which will put us in the same regime. Now all of the pendula move back and forth in perfect synchronization, so if you try to follow the motion of a "wave crest", it will appear to move at an arbitrarily fast speed, approaching infinity in the limit that our wavelength goes to infinity and all of the pendula are truly synchronized (they all have the same phase). Additionally, if we consider this as the limit of vanishing coupling, it makes sense that the group velocity should vanish, as the group velocity gives us the velocity at which a disturbance moves through our system. If the pendula's coupling is arbitrarily small, and you jostled one of them, that disturbance would move arbitrarily slowly to the other pendula.
The other end makes intuitive sense as well. As we increase the frequency of the wave, or lower its wavelength, we are more and more probing the physics of an infinite line of masses coupled by springs. The pendula bit doesn't enter into it, so we recover the physics of a nondispersive medium with constant phase and group velocities, which are equal (as long as we don't try it with so small a wavelength that we ruin our long wavelength approximation in the first place).
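Both limits of the pendulum chain can be seen with a few numbers. This is a rough sketch with made-up parameter values (`G`, `L`, `K`, `M`, `A` are hypothetical and chosen only so that a nondispersive regime exists between the cutoff and $ka \sim 1$). The group velocity is obtained by differentiating the exact dispersion relation, which gives $d\omega/dk = (Ka/m)\sin(ka)/\omega$.

```python
import math

# Hypothetical parameters, SI units (assumptions for illustration only)
G = 9.81      # gravitational acceleration
L = 1.0       # pendulum length
K = 1.0e4     # spring constant
M = 0.01      # pendulum mass
A = 0.01      # spacing between pendula

def omega_exact(k):
    # Full dispersion relation of the coupled pendulum chain
    return math.sqrt(G / L + 4.0 * K / M * math.sin(k * A / 2.0) ** 2)

def omega_long_wavelength(k):
    # Approximation for k*A << 1, using sin(x) ~ x
    return math.sqrt(G / L + K * A ** 2 / M * k ** 2)

def v_phase(k):
    return omega_exact(k) / k

def v_group(k):
    # Derivative of omega_exact with respect to k
    return K * A / M * math.sin(k * A) / omega_exact(k)

# Wave speed of the bare spring-mass chain in the long wavelength limit
c_chain = A * math.sqrt(K / M)

for k in (0.001, 0.1, 10.0):
    print(f"k={k:7.3f}  v_phase={v_phase(k):10.4f}  "
          f"v_group={v_group(k):8.4f}  c={c_chain:.1f}")
```

With these values, the smallest `k` sits near the cutoff (huge phase velocity, tiny group velocity), while `k = 10.0` lies in the regime where both speeds are close to `c_chain`.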
With a continuous wave you cannot transmit a signal. For a signal to be transmitted, you need a modulation of the wave, e.g. amplitude modulation. For example, to transmit acoustic frequencies (speech), you modulate the high frequency electromagnetic carrier wave (on the order of MHz for medium wave transmitters) with the acoustic frequencies (up to 20 kHz). This modulation adds frequency components called sidebands (up to 20 kHz above and below the carrier) to the transmitted wave. The group velocity of a wave describes the velocity with which such a modulation of the carrier amplitude, which transmits the signal, propagates. In free space, the group velocity of an EM wave is identical to the phase velocity $c$ because the dispersion is linear, $\omega=c k$. Thus a pulse-shaped modulation also propagates with unchanged form. On transmission lines, there can be significant nonlinear dispersion, i.e. the phase velocity $v_{ph}= \frac {\omega}{k}$ is not the same for different frequencies and is, in general, different from the group velocity $v_{gr}=\frac {\partial \omega}{\partial k}$. This leads to a loss of shape of a pulse-like modulation of the carrier wave. However, the propagation speed of such a pulse modulation can still be obtained from the group velocity.
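To make the sideband picture concrete, here is a small sketch that checks numerically that an amplitude-modulated carrier is the same thing as the carrier plus two sidebands at nearby frequencies. The carrier frequency, tone frequency, and modulation depth are hypothetical values chosen for illustration, and a single audio tone stands in for speech.

```python
import math

F_CARRIER = 1.0e6   # Hz, hypothetical medium-wave carrier
F_TONE = 1.0e3      # Hz, hypothetical single audio tone
DEPTH = 0.5         # modulation depth (assumed)

def am_signal(t):
    # Carrier whose amplitude is modulated by the audio tone
    carrier = math.cos(2.0 * math.pi * F_CARRIER * t)
    return (1.0 + DEPTH * math.cos(2.0 * math.pi * F_TONE * t)) * carrier

def carrier_plus_sidebands(t):
    # Same signal written as three pure frequencies:
    # the carrier and one sideband on each side of it
    w_c = 2.0 * math.pi * F_CARRIER
    w_m = 2.0 * math.pi * F_TONE
    upper = math.cos((w_c + w_m) * t)
    lower = math.cos((w_c - w_m) * t)
    return math.cos(w_c * t) + 0.5 * DEPTH * (upper + lower)

# Sample a short stretch of time and compare the two forms
times = [i * 1.0e-8 for i in range(1000)]
max_diff = max(abs(am_signal(t) - carrier_plus_sidebands(t)) for t in times)
print("largest difference between the two forms:", max_diff)
```

The modulation is therefore carried by a small group of frequencies around the carrier, and it is the group velocity evaluated at the carrier that sets how fast that modulation travels.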
That the group velocity is opposite to the phase velocity happens only in systems with special nonlinear dispersion relations.