Error-correcting code for hexadecimal symbol

coding-theory

Hexadecimal (4-bits) symbols (data word) are transmitted through an error-prone channel. The expected number of symbols is known.

The hamming code is not suitable in this scenario because the corruption is always done on all 4 bits that make up the symbol.

The Reed-Solomon code RS(18, 10) shortened from an RS(255,223) is currently used but the symbol is 8 bits and not 4 bits.

Which error correction code would be appropriate?

Best Answer

The choice most likely depends not only on the expected number of symbol errors but rather on the distribution of those errors. You may need to run a simulation to reach a conclusion (or a recommendation to your boss, whether relevant). The goal is undoubtedly to meet some quality-of-service goal that you didn't specify (and may need feedback from the said boss to determine anyway).

8-bit symbols: Here there is no need to look further than shortened Reed-Solomon codes. Relevant features:

  • The codes are MDS, meaning that you need exactly two check symbols to correct a single erroneous symbol.
  • You will pack two 4-bit symbols into a single 8-bit symbol, effectively meaning that the number of hexadecimals that go into a single block is twice the number of code symbols.
  • But, for the purposes of error correction it is immaterial whether only one 4-bit part or both 4-bit parts of an 8-bit symbol are in error. It is always a single symbol error when working with 8-bit symbols. This means that to correct a single hexadecimal error, you need to use up 16 check bits (two symbols worth). And to correct $t$ hexadecimals in error you need up to $16t$ check bits.
  • But, occasionally two hexadecimals in error occupy the same 8-bit symbol of the code, and that counts as a single error for the purposes of error correction. Therefore the number of hexadecimals in error is insufficient to determine whether a given error pattern can be corrected (I did tell that you need to run a sim!).
  • With 8-bit Reed-Solomon codes your block length won't be constrained really. You have room for up to 255 payload and check symbols. That is, up to 510 hexadecimals. You made it sound like this is plenty :-)

4-bit symbols: Here I want to suggest a few alternatives. Basically because you left it unclear how long your data blocks are, and whether you can split it into several blocks of an error-correcting code on demand or not.

The goto code is still the Reed-Solomon code (suggested in Dilip Sarwate's comment):

  • The code is MDS, meaning that you need exactly two check symbols to correct a single symbol error.
  • So to correct a single hexadecimal in error, you need two hexadecimal check symbols. Or, to correct $t$ erroneous hexadecimal symbols you need $8t$ check bits. Sounds better than the $16t$ check bits above, but...
  • The block length is limited to $16$ hexadecimals, total. That is, a single block of a code correcting a single hexadecimal error can contain at most 14 hexadecimals worth of payload data, if you want the code to handle the possibility of two erroneous hexadecimals, you are limited to 12 hexadecimals per block et cetera.

The point I want to emphasize is that this Reed-Solomon code will constrain the block length severely. That may or may not be a problem, because you can split a longer block of data into several code blocks. But, for the purposes of meeting a QoS requirement you may (?) then need to estimate the probability of ALL those blocks being able to correct the eventual errors. Therefore I also proffer a few alternatives, just in case their parameters work better in your application (run that sim!). Those alternatives are so called algebraic geometry codes. The amount of math that goes into their theory is significantly more than what is needed for Reed-Solomon codes, but I still suspect you could locate code for using those codes by searching... The advantage they offer over Reed-Solomon codes is that they have longer data blocks. That sometimes helps, but the catch is that they are not MDS-codes. Meaning that to correct $t$ erroneous hexadecimals they need more than $2t$ check symbols. There is an extra parameter called the genus, $g$. When using a code based on a curve of genus $g$, the code needs $2t+g$ check symbols to correct $t$ errors. That is, $8t+4g$ check bits. A few alternatives:

  • We get $g=1$ for example from the elliptic curve $y^2+y=x^3+x+1$ over the field $GF(16)$. The advantage the resulting codes have over Reed-Solomon is that now there is room for $24$ hexadecimals. So if you want to protect a group of $21$ hexadecimals against a single error, they will fit into a single block of this code, but you do need those $2t+g=2\cdot1+1=3$ check symbols instead of $2$ check symbols that sufficed with a Reed-Solomon code. Remember that if you split those $21$ hexadecimals into two blocks of a Reed-Solomon code, BOTH those blocks will need two check symbols, because you won't know in advance which half will be hit with the error.
  • For longer blocks of hexadecimals you can use other curves. A fairly well-studied case is the Hermitian curve over $GF(16)$ defined by the equation $y^4+y=x^5$. This curve has genus $g=6$, so $4\cdot(2t+6)=8t+24$ check bits to correct $t$ erroneous hexadecimals. Their advantage is that this time the block length goes up to $64$ hexadecimals. Playing a similar number game as above, we see that we can protect a group of $40$ hexadecimals against nine errors with a singled block of a Hermitian code of $64$ hex symbols. We could split those $40$ symbols into four Reed-Solomon blocks carrying $10$ symbols each. The Reed-Solomon codes then protect those blocks against $3$ errors per subblock. It depends on the distribution of the errors, whether it is more likely that the number of errors seen by the block of $64$ exceeds nine, or whether any single one of the subblocks sees more than three errors.

I hope the example number games I described above give you an idea, what you need to know to decide on a code. It is difficult to tell what is best for your application. It may well be that you don't need to optimize it too much, in which case the alternatives don't offer significant enough gains over the familiar 8-bit Reed-Solomon code. If your channel creates errors lumped together rather than scattered hexadecimal errors, you hopefully got the idea that codes with longer blocks may then save the day. An alternative often used in telcomm is to use an interleaver to split errors lumped together between various blocks of the good ole Reed-Solomon code. It seems to me that your application has so short packages that interleavers may not help. Over to you.


A further point:

It is known that a curve of genus $g$ can have at most $n=q+2g\sqrt{q}$ points over the field of $q$ elements (save for one point we need to leave out). Here $q=16$, so the blocks you get with a curve of genus $g$ can contain up $16+8g$ hexadecimal symbols, and needs to use $2t+g$ check symbols to correct $t$ errors. You see that the two curves above with $g=1$ and $g=6$ reach the maximum $16+8g$. Unfortunately that maximal length cannot be achieved for all possible values of $g$. If you need a value of $g$ other $0$ (Reed-Solomon), $1$ (elliptic) or $6$ (Hermitian), do ask, but I won't know the answer right away :-/

Related Question