As I interpret your book's definition of an "information set", the "set of coordinates" that corresponds to a set of columns is literally the index of those columns in the matrix $G$. So if $$G=\left[\begin{array}{cccc|ccc}
1 & 0 & 0 & 0 & 0 & 1 & 1 \\
0 & 1 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 1 & 0 & 1 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 1 \end{array}\right],$$
the "set of coordinates" corresponding to "the first, second, fourth, and seventh columns" is literally just $(1,2,4,7)$. Since the seventh column is the sum of the first, second, and fourth, these columns are not linearly independent, so this is an example of a set of four coordinates that do not form an information set.
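To make the dependence concrete, here is a small Python sketch (mine, not from the answer) that computes ranks over $\mathbb{F}_2$; `gf2_rank` is an illustrative helper name:

```python
import numpy as np

def gf2_rank(M):
    """Rank of a binary matrix over GF(2) via Gaussian elimination mod 2."""
    M = np.array(M, dtype=int) % 2
    rank = 0
    for col in range(M.shape[1]):
        # Find a pivot row with a 1 in this column, at or below `rank`.
        pivot = next((r for r in range(rank, M.shape[0]) if M[r, col]), None)
        if pivot is None:
            continue
        M[[rank, pivot]] = M[[pivot, rank]]          # swap pivot into place
        for r in range(M.shape[0]):
            if r != rank and M[r, col]:
                M[r] = (M[r] + M[rank]) % 2          # eliminate mod 2
        rank += 1
    return rank

G = np.array([[1,0,0,0,0,1,1],
              [0,1,0,0,1,0,1],
              [0,0,1,0,1,1,0],
              [0,0,0,1,1,1,1]])

# Columns 1, 2, 4, 7 (0-indexed 0, 1, 3, 6): rank 3 < 4, so dependent.
print(gf2_rank(G[:, [0, 1, 3, 6]]))  # 3 -> not an information set
# Columns 1, 2, 3, 4 form the identity: rank 4 -> an information set.
print(gf2_rank(G[:, [0, 1, 2, 3]]))  # 4
```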
That $G$ has four linearly independent columns (equivalently, rank $4$) is what allows it to encode a message of length four uniquely. If you take any four-digit message $m$ and compute $mG$, the resulting 7-digit codeword can be translated uniquely back to the original message; if $G$ had fewer than $4$ independent columns, this multiplication would yield the same result for more than one 4-digit message, and in that sense you would lose information: you could never be sure you had decoded certain messages correctly.
Another way to think of the situation is that because $G$ has four independent columns, it has rank $4$, and thus when you put it in row-reduced echelon form it becomes a standard generator matrix of the form $[I\mid P]$, possibly after a column permutation (the $G$ in this example already is a standard generator matrix). If it weren't, but still had four independent columns, you could convert it easily. Now, if you look at what happens when you compute $mG$, the result is the message $m$ itself with three additional parity digits tacked onto the end. If the columns of the identity matrix sat at other coordinates, the codeword would contain the digits of $m$ at those coordinates, with parity digits inserted where the other columns are.
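As a quick illustration of both points (a Python sketch, assuming the $G$ above), every 4-bit message encodes to a distinct codeword, and each codeword begins with the message itself:

```python
import itertools
import numpy as np

G = np.array([[1,0,0,0,0,1,1],
              [0,1,0,0,1,0,1],
              [0,0,1,0,1,1,0],
              [0,0,0,1,1,1,1]])

# Encode every 4-bit message. Because G = [I | P], each codeword starts
# with the message itself, and because rank(G) = 4 all codewords differ.
codewords = {}
for m in itertools.product([0, 1], repeat=4):
    c = tuple(np.array(m) @ G % 2)
    assert c[:4] == m          # systematic: the message appears verbatim
    codewords[c] = m

print(len(codewords))  # 16 distinct codewords -> encoding is invertible
```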
I hope this rambling was useful rather than even more confusing. This document has a pretty good explanation of information sets as well (see p. 39). (I'm not sure whom to credit, but the document originally came from Jonathan Hall's MSU webpage.)
This question uses many names in a nonstandard way (at least as far as the
coding theory literature in English is concerned), and this confuses the issues.
The codes in question are single-error-correcting codes but are not
Hamming codes in general. By Hamming code, one usually means a $[2^m-1,2^m-1-m]$
linear code with minimum distance $3$. It is a single-error-correcting
code, but not all single-error-correcting codes are Hamming codes. Some might
be shortened Hamming codes (set some data bits to $0$ and don't transmit
them at all) but even this is not strictly necessary.
So, what are these "rectangular" codes? They are punctured product codes,
where by a product code $\mathcal C_1\times \mathcal C_2$ is meant a
rectangular $k_2\times k_1$ array of data bits bordered by parity bits
such that each row is a codeword in $\mathcal C_1$ and each column is
a codeword in $\mathcal C_2$ where, of course, $\dim(\mathcal C_i) = k_i$.
The minimum distance of $\mathcal C_1\times \mathcal C_2$ is the product
of the minimum distances of $\mathcal C_1$ and $\mathcal C_2$.
If $\mathcal C_1$ and $\mathcal C_2$ are $[k_i+1,k_i]$ single parity check
codes, then the minimum distance of $\mathcal C_1\times \mathcal C_2$ is $4$.
But, the codes used by the OP are punctured product codes since the
lower right-hand corner
of the array (the parity of the parity bits!) has been deleted, and the
minimum distance reduces by $1$.
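The full (unpunctured) array can be sketched as follows, assuming two single-parity-check component codes; `spc_product_encode` is an illustrative helper name, and the corner (parity of parities) is still included here:

```python
import numpy as np

def spc_product_encode(data):
    """Encode a k2 x k1 binary data array in the product of two
    single-parity-check codes: append a parity bit to each row and to
    each column, including the parity-of-parity corner."""
    data = np.array(data) % 2
    k2, k1 = data.shape
    out = np.zeros((k2 + 1, k1 + 1), dtype=int)
    out[:k2, :k1] = data
    out[:k2, k1] = data.sum(axis=1) % 2       # row parity bits
    out[k2, :] = out[:k2, :].sum(axis=0) % 2  # column parities (and corner)
    return out

word = spc_product_encode([[1, 0], [1, 1]])
print(word)
# Every row and every column of the result has even parity:
assert not (word.sum(axis=1) % 2).any() and not (word.sum(axis=0) % 2).any()
```

Deleting the bottom-right entry of `word` gives the punctured code the OP is using.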
Codewords in a product code can be regarded as a single vector of
length $n_1n_2$ (just concatenate the rows into a single codeword)
and so $\mathcal C_1\times \mathcal C_2$ is an $[n_1n_2,k_1k_2, d_1d_2]$
linear code. The codes called Hamming codes by the OP's instructor
are thus $[n_1n_2-1,k_1k_2, 3]$ single-error-correcting codes, but
they are not Hamming codes. Note for example that with $k_1=k_2=2$,
the construction gives an $[8,4,3]$ code (punctured from a $[9,4,4]$
single-error-correcting, double-error-detecting code),
and this is not the same as
the $[7,4,3]$ Hamming code that all of us know and love.
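These parameters can be checked by brute force; the following Python sketch (mine, not from the answer) enumerates the 16 codewords of the $[9,4,4]$ product code and its puncture:

```python
import itertools
import numpy as np

def encode(data):
    """3x3 product-of-SPC codeword from a 2x2 data array, as a length-9 vector."""
    out = np.zeros((3, 3), dtype=int)
    out[:2, :2] = data
    out[:2, 2] = out[:2, :2].sum(axis=1) % 2   # row parities
    out[2, :] = out[:2, :].sum(axis=0) % 2     # column parities, corner last
    return out.ravel()

full = [encode(np.reshape(m, (2, 2))) for m in itertools.product([0, 1], repeat=4)]

# For a linear code, minimum distance = minimum weight of a nonzero codeword.
d_full = min(c.sum() for c in full if c.sum() > 0)
# Puncture: delete the last coordinate (the parity-of-parity corner).
d_punct = min(c[:-1].sum() for c in full if c.sum() > 0)
print(d_full, d_punct)  # 4 3
```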
Of course, error correction is easy with this code. If
one row parity-check sum fails as does one column parity-check sum,
the error is in the data bit in that row and column. If only a
row sum fails or only a column sum fails, the parity bit in the
failed sum was received incorrectly.
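The decoding rule can be sketched as follows (a Python illustration for the full, unpunctured array, where every single bit error trips exactly one row check and one column check; `decode_single_error` is an illustrative name):

```python
import numpy as np

def decode_single_error(word):
    """Correct at most one bit error in a received product-of-SPC array
    (full product code, parity-of-parity corner included)."""
    word = np.array(word) % 2
    row_fail = np.flatnonzero(word.sum(axis=1) % 2)  # rows with odd parity
    col_fail = np.flatnonzero(word.sum(axis=0) % 2)  # columns with odd parity
    if len(row_fail) == 1 and len(col_fail) == 1:
        word[row_fail[0], col_fail[0]] ^= 1          # flip bit at the crossing
    return word

# A valid codeword with one data bit flipped:
sent = np.array([[1, 0, 1],
                 [1, 1, 0],
                 [0, 1, 1]])
received = sent.copy()
received[0, 1] ^= 1
print(np.array_equal(decode_single_error(received), sent))  # True
```

In the punctured code the corner is absent, so a corner-row or corner-column parity bit error trips only one check, which is exactly the second case described above.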
-------
Sheesh! OK, $$\begin{align}c_1&=u_1\\c_2&=u_2\\c_3&=u_1+u_2\\c_4&=u_3\\c_5&=u_4\\c_6&=u_3+u_4\\c_7&=u_1+u_3\\c_8&=u_2+u_4\end{align}$$ giving $$\left[\begin{matrix}c_1\\c_2\\c_3\\c_4\\c_5\\c_6\\c_7\\c_8\end{matrix}\right]=\left[\begin{matrix}1&0&0&0\\0&1&0&0\\1&1&0&0\\0&0&1&0\\0&0&0&1\\0&0&1&1\\1&0&1&0\\0&1&0&1\end{matrix}\right]\left[\begin{matrix}u_1\\u_2\\u_3\\u_4\end{matrix}\right]$$ Can you figure out $\mathbf G$ now? It will not be of the form $[I\mid P]$.
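For a sanity check (a sketch under the row-vector convention $c = u\mathbf G$, so $\mathbf G$ is the transpose of the $8\times 4$ matrix assembled directly from the defining equations above):

```python
import numpy as np

# One row per equation: c1=u1, c2=u2, c3=u1+u2, c4=u3,
# c5=u4, c6=u3+u4, c7=u1+u3, c8=u2+u4.
A = np.array([[1,0,0,0],
              [0,1,0,0],
              [1,1,0,0],
              [0,0,1,0],
              [0,0,0,1],
              [0,0,1,1],
              [1,0,1,0],
              [0,1,0,1]])
G = A.T   # generator matrix for the row-vector convention c = u G

u = np.array([1, 0, 1, 1])
c = u @ G % 2
print(c)  # [1 0 1 1 1 0 0 1]
```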
Best Answer
It's sufficient to find a generating matrix.
A linear code is usually defined as a subspace of $F^n$ for some field $F$ (since you're talking about bits, you can take $F = \mathbb{F}_2 = \{0,1\}$). The code $C$ generated by a generating matrix $G$ is the span of the rows of $G$. The span of a set of vectors in $F^n$ is a subspace of $F^n$, so $C$ is a linear code.
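As a concrete illustration (a Python sketch, using the $4\times 7$ generator matrix from the first answer as an example), one can enumerate the span of the rows and verify the subspace properties directly:

```python
import itertools
import numpy as np

G = np.array([[1,0,0,0,0,1,1],
              [0,1,0,0,1,0,1],
              [0,0,1,0,1,1,0],
              [0,0,0,1,1,1,1]])

# The code C is the set of all GF(2) linear combinations of the rows of G.
C = {tuple(np.array(m) @ G % 2) for m in itertools.product([0, 1], repeat=4)}

# Subspace checks: contains 0 and is closed under addition. Over GF(2),
# closure under addition already implies closure under scalar multiples.
assert (0,) * 7 in C
assert all(tuple((np.array(a) + b) % 2) in C for a in C for b in C)
print(len(C))  # 16 = 2^4 codewords, a 4-dimensional subspace of F_2^7
```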