[Math] Finding the information sets of a linear code.

coding-theorylinear algebra

I'm trying to get a better understanding of linear codes, so I decided to work on problems from various textbooks. I'm having trouble understanding how to do this problem, and I was wondering if anyone can lead me in the right direction.

Problem The matrix $G = [I_{4} | A]$, where
$$ G = \left[ \begin{array}{cccc|ccc} 1 & 0 & 0 & 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1\end{array} \right] $$

is a genrator matrix in standard form for a $[7,4]$ binary code, denoted by $\mathcal{H}_3$. The parity check matrix for $\mathcal{H}_3$ is
$$H = [A^{T} | I_{3}] = \left[ \begin{array}{cccc|ccc} 0 & 1 & 1 & 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 & 0 & 0 & 1 \end{array} \right]. $$

Find at least four information sets in $\mathcal{H}_3$. Find at least one set of four coordinates that do not form an information set. $\blacksquare$

The book defines an information set as follows: Given a $[n,k]$ linear code $\mathcal{C}$, a generator matrix for $\mathcal{C}$ is any $k \times n$ matrix $G$ whose rows form a basis for $\mathcal{C}$. For any set of $k$ independent columns of $G$, the corresponding set of coordinates forms an information set for $\mathcal{C}$.

Any help would be greatly appreciated since I've been staring at this for quite some time. Thanks!

Best Answer

As I interpret your book's definition of an "information set", the "set of coordinates" that corresponds to a set of columns is literally the index of those columns in the matrix $G$. So if $$G=\left[\begin{array}{cccc|ccc} 1 & 0 & 0 & 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 \end{array}\right],$$ the "set of coordinates" corresponding to "the first, second, fourth, and seventh columns" is literally just (1,2,4,7). Since these columns are not linearly independent, this would be an example of a set of four coordinates which do not form an information set.

That $G$ has four independent columns is what allows it to uniquely encode a message of length four. If you take any four-digit message $m$ and multiply $mG$ then the resulting 7-digit codeword can be translated uniquely back to the original message; if $G$ had fewer than $4$ independent columns, this multiplication would yield the same result for more than one 4-digit message, and in a sense you would lose information. You could never be sure you decoded certain messages correctly.

Another way to think of the situation is that because $G$ has four independent columns, it has rank 4 and thus when you put it in row-reduced echelon form, it will be a standard generator matrix (the $G$ in this example already is a standard generator matrix). If it wasn't, though, and yet it still had four independent columns, you could convert it easily. Now, if you look at what happens when you multiply $mG$, the result is the message $m$ itself with three additional digits tacked onto the end. If the columns of the identity matrix were at other coordinates, the codeword would have the message $m$ rearranged as those columns are rearranged, with additional digits inserted where the other columns are.

I hope this rambling was useful rather than even-more confusing. This document has a pretty good explanation of information sets as well (see p. 39). (I'm not sure who to credit, but the document originally came from Jonathan Hall's MSU webpage)

Related Question