Your following statement is almost accurate:
“I am aware of the argument saying time reversal invariant perturbation can only couple edge states pairwise, …”
The above statement would be more accurate if you replace “couple” by “annihilate.” The use of the former versus the latter has a physically distinct interpretation. If I start with a diagonal Hamiltonian (say) $H_{0}$, expressed as a matrix in the basis of an arbitrary number of (even) gapless edge modes, then the electrons in these modes do not scatter into one another if the matrix is diagonal. In other words, the modes are decoupled. Using your example of three pairs of edge states, and assuming a Dirac-like dispersion for them, we can write their energy-momentum dispersion as $$E_{n,\pm}(k)=\pm\hbar v_{F,n}|k| \; ,$$ where $n=1,2,3$ labels the pairs and $\pm$ identifies the Kramers partner. Assuming $v_{F,n}= v_{F}$ for all $n$, the Hamiltonian can be expressed in the matrix form (in the basis you defined above) as $$H_{0}(k) = \hbar v_{F}|k|\left(\begin{array}{cccccc}
1 & 0 & 0 & 0 & 0 & 0\\
0 & 1 & 0 & 0 & 0 & 0\\
0 & 0 & 1 & 0 & 0 & 0\\
0 & 0 & 0 & -1 & 0 & 0\\
0 & 0 & 0 & 0 & -1 & 0\\
0 & 0 & 0 & 0 & 0 & -1
\end{array}\right)$$
Now, if we introduce a perturbation $V$ (say from your example), and use a traffic metaphor, then electrons in the same “lane” (color-coded as red or blue) can travel in either direction. As you correctly pointed out, electrons in the $| 1, 1 \rangle$ and $| 2, 1 \rangle$ (red) lanes can take a “U-turn” by switching to the $| 3, 2 \rangle$ lane. In other words, non-magnetic impurities can backscatter right-moving electrons and vice-versa. An equivalent argument can be made for the blue lanes. These are, by construction, time-reversal symmetric processes. Here’s a quick sanity check: with no magnetic impurities (and assuming $s_{z}$ conservation) there will be no spin flips. Hence decoupling the red and blue lanes implicitly assumes time reversal symmetry. The full Hamiltonian now becomes $$ H_{1}(k) = H_{0}(k) + V \; .$$
The reconstructed edge dispersion can be obtained by diagonalizing the above Hamiltonian independently for each $k$. For analytic simplicity let’s focus on $k=0$. Besides, we are interested in the gaplessness of the edge states anyway. Note: $$\begin{align} H_{1}(0) &= H_{0}(0) + V \\ &= V \end{align} \; .$$ Diagonalizing $V$ in a straightforward manner gives doubly degenerate (i.e. total six) eigenvalues $\varepsilon = 0, \; \sqrt{2}a, \; - \sqrt{2}a$. Note that two pairs of edge states have acquired a $2\sqrt{2}a$ band gap. This can be viewed as a momentum space annihilation of a pair of edge states.
From the above example, it would seem as if only two pairs of states absorbed the scattering while leaving the third one intact. From the explicit expression for $V$, however, it is clear that scattering is occurring in all three pairs. That’s because we’re looking at the $V$ matrix in the original basis; i.e. before the perturbation was introduced. As you may already know, in band theory it is customary to label bands in a basis diagonal in $k$-space. Hence, in the new basis, scattering occurs between only two Kramers pair (gapped) bands while leaving the third (gapless) one intact. Another way to see it is this: we can view (contrary to band theory custom) the edge state bandstructure in the new basis before the perturbation is introduced. In other words, we can redefine the edge states as linear combinations of the old basis using the eigenvector components from the $V$ diagonalization. Furthermore, linear combination of a gapless basis will again be gapless (now with different $ v_{F,n}$’s). In this basis scattering will only occur between two pairs of edge states (for $|E| < |\sqrt{2}a|$) upon slowly turning on $V$.
In the Hasan-Kane paper, the authors are discussing a general theory of band topological insulators. Hence Fig. 3 could potentially represent a slice (in momentum space) of the bands $E(k_{x},k_{y},k_{z})$ along a line connecting two Time-Reversal Invariant Momentum (TRIM) points $\Gamma_{a}$ and $\Gamma_{b}$. For the case of the quantum spin Hall effect, $\Gamma_{a} = 0$ and $\Gamma_{b} = \pi$. Taking a mirror image with respect to $\Gamma_{a}=0$ and plotting in the domain $k\in[-\pi,\pi)$ you can see the Dirac cones at $k=0$ and $k=\pi$ clearly. For the Bernevig-Hughes-Zhang (BHZ) model, the 1D Dirac cone appears at $\Gamma_{a}=0$ for $0 < M/B < 4$ and at $\Gamma_{b}=\pi$ for $4 < M/B < 8$ but not both simultaneously. Please see details in the reference:
Shijun Mao, Yoshio Kuramoto, Ken-Ichiro Imura, and Ai Yamakage. “Analytic theory of edge modes in topological insulators.” Journal of the Physical Society of Japan 79, no. 12 (2010). (arXiv)
Note: they use $\Delta/B$ instead of $M/B$. To get multiple Dirac points, as in Fig. 3, we need a mathematically more complex model than the BHZ.
Best Answer
The answer of David Aasen is correct, but let me add some comments which connect to your question of the relation of between the $\mathbb Z_2$ invariant $\nu$ and the first Chern-Number $C_1$.
Such a relation does not exist unless you require some extra symmetry than the generic symmetries usually required in the classification of topological insulators (such as time-reversal invariance in this case). Say the Hamiltonian is invariant under spin rotations along the $z$-axis (so a $U(1)$ subgroup of $SU(2)$ in left invariant), then the Hamiltonian can be block-diagonalized as
$H = \begin{pmatrix} H_\uparrow & \\ & H_\downarrow \end{pmatrix}, $
where the indices refer to spin-up and down degrees of freedom. Due to time reversal symmetry we have that $H_\downarrow(k) = H^*_\uparrow(-k)$. The system now consist of two copies of Quantum Hall effects with counter propagating edge states of opposite spin. As Davis Aasen says, the chern number is zero $C_1 = C_1^\uparrow + C_1^\downarrow = 0$. The difference however, the "spin Chern number", $C_1^\uparrow - C_1^\downarrow = 2C_{spin}$ can be non-zero and can be calculated by the Chern-numbers of the spin up/down sectors. As long as $S_z$ is preserved the spin Chern-number can be any integer $C_{spin}\in\mathbb Z$.
But if we add off-diagonal elements, and thus break the rotation symmetry along $z$, the invariant breaks down to $\nu = C_{spin}\,\text{mod}\,2\in\mathbb Z_2$ (as was shown by Kane and Mele). So topological trivial/non-trivial phases are characterized by even and odd spin-Chern numbers $C_{spin}$, not the original Chern number $C_1$. This however only makes sense when you have this extra symmetry.