I think the analogy you describe cannot be made precise with our current technology. For example, the word "functor" doesn't seem to have made an appearance yet in this context.
If you have a code, there are methods to construct lattices using it, but some of the constructions (like the Leech lattice) require special properties of the code. If you have a lattice, there are methods to construct vertex operator algebras using it (e.g., the lattice VOA), but some of the constructions, like orbifolds, depend on properties of the lattice, like the existence of automorphisms of certain orders. In the case of the moonshine module, we need the $-1$ automorphism of Leech, which isn't particularly special for lattices. Conjecturally (see work of Dong, Mason, and Montague 1994-95), we could use any fixed-point free automorphism of Leech to get an isomorphic VOA, and that is somewhat more special.
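To make "code → lattice" concrete, here is a minimal sketch of Construction A (a standard method, not specific to the moonshine story): lifting the $[8,4,4]$ extended Hamming code to $\mathbb{Z}^8$ yields, after rescaling by $1/\sqrt{2}$, the $E_8$ lattice, and one can count its 240 minimal vectors directly. The generator matrix below is the usual one for this code, written as $RM(1,3)$.

```python
from itertools import product

# Generator matrix of the [8,4,4] extended Hamming code (= RM(1,3)):
# rows are the all-ones vector and the three coordinate functions on F_2^3.
G = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
]

# All 2^4 codewords.
codewords = set()
for bits in product([0, 1], repeat=4):
    w = tuple(sum(b * g for b, g in zip(bits, col)) % 2 for col in zip(*G))
    codewords.add(w)

# Construction A: L = {v in Z^8 : v mod 2 is a codeword}.
# Lattice vectors of squared norm 4 (the minimal nonzero norm here) are
#  - the vectors +-2 e_i (which reduce to the zero codeword), and
#  - the +-1 patterns supported on a weight-4 codeword.
minimal = 0
for c in codewords:
    wt = sum(c)
    if wt == 0:
        minimal += 2 * 8      # the 16 vectors +-2 e_i
    elif wt == 4:
        minimal += 2 ** 4     # 16 sign choices on the support
print(minimal)  # 240 = number of roots of E8 (after rescaling by 1/sqrt(2))
```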
One class of "baby examples" arises when you take the root lattice of a simple (or more generally, reductive) algebraic group. This lattice has an action of the Weyl group. The lattice vertex operator algebra built from it has an action of the corresponding Kac-Moody Lie algebra (more generally, I think the centrally extended loop group acts). This is one of the more natural ways to construct $E_8$ from its lattice.
I'm afraid I'm not qualified to describe good baby examples of the transition from codes to lattices.
Your list certainly has many nice topics.
1) Yup. This would be nice to have. In practical applications we can get rid of the error floor by concatenating a decent LDPC code with a good high-rate algebraic code, such as a BCH code, that can then correct the residual errors (the one application I know about is the second-generation standard for European digital video broadcast, aka digi-TV, where the code length is 64800 or 16200 bits). What makes this challenging is that designing a good LDPC code requires familiarity with some tools from stochastics (lost me at that point), but those tools don't say anything about the minimum Hamming distance.
Many a standard (IIRC, in addition to European DVB, also MediaFLO, a US standard for something similar) uses families of LDPC codes designed around a specific decoding circuitry.
This is more or less necessary, because otherwise the problem of routing the messages generated by the belief propagation algorithm becomes prohibitive. An exception to this rule is the Chinese video broadcast standard. At least the parts of that standard that I have seen describe the LDPC codes in such a way that no structure is apparent. They may be protecting their intellectual property :-)
So a breakthrough in this area would probably also have to keep this in mind in order to end up in future applications.
Hopefully more knowledgeable people can comment. I do expect something to happen here in the years to come, but the existing LDPC codes already work quite well.
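For readers who have not seen LDPC-style decoding, here is a toy bit-flipping decoder (a simpler relative of belief propagation) run on the parity-check matrix of the (7,4) Hamming code. Real LDPC codes are thousands of bits long with much sparser matrices, so treat this purely as an illustration of the bit/check interplay; the matrix and decoder are my own toy example, not from any standard.

```python
# Toy bit-flipping decoder.  H is the parity-check matrix of the (7,4)
# Hamming code, standing in for a sparse LDPC matrix.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def syndrome(word):
    return [sum(h * b for h, b in zip(row, word)) % 2 for row in H]

def bit_flip_decode(word, max_iters=10):
    word = list(word)
    for _ in range(max_iters):
        s = syndrome(word)
        if not any(s):
            return word            # all checks satisfied
        # count unsatisfied checks touching each bit, flip the worst offender
        votes = [sum(s[i] * H[i][j] for i in range(len(H))) for j in range(7)]
        word[votes.index(max(votes))] ^= 1
    return word

codeword = [0] * 7                 # the zero word is always a codeword
received = codeword[:]
received[4] ^= 1                   # introduce a single bit error
print(bit_flip_decode(received) == codeword)  # True
```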
2) This was a relatively hot topic a few years ago. I am a bit hesitant to call it coding theory - calling it multiantenna signal constellation design might be more fitting, but whatever :-).
By using basic facts of global class field theory my graduate students managed to "improve" upon the Golden code (by Oggier et al). I put "improve" in quotes, because the improvement is somewhat theoretical. A more precise way of stating their result is that if you carve a given number of multiantenna signals from their lattice (representing a maximal order of a division algebra), you are less likely to make an error at the receiving end than if you carve your signal set from one of the codes proposed by Oggier et al. However, that's not the end of the story. If you combine that multiantenna constellation with, for example, an LDPC code, our construction loses its theoretical advantage, because an LDPC decoder wants to have reliability information about individual bits. When you pack several bits' worth of information into the selection of a single multiantenna signal, our method creates more dependencies among those reliability figures, and that makes things worse in the end. Anyway, the math in my students' construction is fun, and they all graduated, so...
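For concreteness, here is a numerical sketch of the Golden code itself, written from memory of the Belfiore-Rekaya-Viterbo form (the symbol range and the spot check below are my own choices). It illustrates the non-vanishing determinant property on which this whole lattice comparison hinges: the determinant of a nonzero codeword stays bounded away from zero.

```python
import itertools
import math

# The Golden code: 2x2 space-time codewords built from Q(sqrt(5)).
# Information symbols a, b, c, d would be QAM points in practice; here we
# spot-check over small real integers.
SQ5 = math.sqrt(5)
theta = (1 + SQ5) / 2              # golden ratio, hence the name
thetab = (1 - SQ5) / 2             # its algebraic conjugate
alpha = 1 + 1j * (1 - theta)
alphab = 1 + 1j * (1 - thetab)

def golden_codeword(a, b, c, d):
    """2x2 codeword for four information symbols."""
    return [
        [alpha * (a + b * theta) / SQ5,        alpha * (c + d * theta) / SQ5],
        [1j * alphab * (c + d * thetab) / SQ5, alphab * (a + b * thetab) / SQ5],
    ]

def det2(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

# Non-vanishing determinant: |det X| >= 1/sqrt(5) for all nonzero integer
# symbol vectors.  Spot-check a small range.
mins = min(
    abs(det2(golden_codeword(a, b, c, d)))
    for a, b, c, d in itertools.product(range(-2, 3), repeat=4)
    if (a, b, c, d) != (0, 0, 0, 0)
)
print(mins)   # approximately 1/sqrt(5) ~ 0.4472
```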
As the number of antennas increases, the computational complexity grows really badly.
Some codes suffer more from this than others.
A relatively recent idea (due to B.S. Rajan and his students; couldn't find a proper reference, sorry) is to use representations of Clifford algebras with a view to reducing this complexity. That is a promising idea.
All of the above constructions depend on the receiver knowing the channel state.
From some point on you need to allocate too large a fraction of the bandwidth to pilot symbols to make that assumption true. So another thread in this area has been to use differential modulation (= use the preceding signal as a pilot for the next) or Grassmannian codes (= the signal is, to an extent, its own pilot). A lot of fun math is going on there, but I don't know whether these will stay.
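A single-antenna toy version of the differential idea (my own illustration, not from any standard): encode the data in the ratio of consecutive transmitted signals, so that an unknown but slowly varying channel coefficient cancels at the receiver.

```python
import cmath
import math
import random

# Differential PSK: each data symbol is the *ratio* of consecutive signals,
# so the receiver never needs to estimate the channel coefficient h.
M = 4  # QPSK alphabet
alphabet = [cmath.exp(2j * math.pi * k / M) for k in range(M)]

def diff_encode(data):
    s = [1 + 0j]                       # known reference symbol
    for v in data:
        s.append(s[-1] * alphabet[v])  # rotate by the data symbol
    return s

def diff_decode(received):
    out = []
    for prev, cur in zip(received, received[1:]):
        ratio = cur / prev             # the channel coefficient cancels here
        out.append(min(range(M), key=lambda k: abs(ratio - alphabet[k])))
    return out

random.seed(0)
data = [random.randrange(M) for _ in range(20)]
h = 0.7 * cmath.exp(1j * 1.234)        # unknown flat channel
received = [h * s for s in diff_encode(data)]
print(diff_decode(received) == data)   # True (noiseless case)
```

In noisy channels one would multiply by the conjugate of the previous sample rather than divide, but the cancellation principle is the same.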
Another theoretically interesting thread within this topic is: "How will the rules change, when two or more independent users transmit simultaneously?" A beginning graduate student in our group has come up with some number theoretic constructions. As a new tool he needed some facts from Diophantine approximation. The information theory in that thread is, I'm sad to say, over my head.
I am willing to also bet that this question on MO derives its motivation from this problem area :-)
This question hit too close. It is not entirely clear that I managed to be objective.
The Golden code (Oggier et al) is in a hyperWLAN standard. I don't know how widely that part of the standard is used. Multiantenna coding in cellular applications goes largely by different rules. This is because there is a feedback channel there, so the transmitter also has an idea of the channel state and can take advantage of it. The math becomes easier then (or so I've been told). This is not possible in a broadcast application, because there may be millions of receivers, and knowing their channel states is A) impossible, and B) useless, because you can't optimize the transmission for all of them simultaneously.
3) The Polar codes were a big surprise to me. I can't comment on them for lack of familiarity. Leave this for someone else to answer.
4) The Golay codes have been around for a long time. They are a rich source of algebraic and combinatorial miracles - a lot of fun! The codes are way too short to be useful in transmitting bulk data, but they do make an appearance in other applications. In their book (SPLAG) Conway & Sloane study them in great detail. Probably the most investigated error-correcting codes of all time!
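As a small taste of those miracles, here is a sketch that builds the $[23,12]$ Golay code from one of the two standard degree-11 factors of $x^{23}-1$ over $\mathbb{F}_2$, extends it by an overall parity bit, and recovers by brute force the famous minimum distance 8 and the 759 codewords of weight 8.

```python
from collections import Counter

# The binary Golay code: the [23,12,7] cyclic code generated by a degree-11
# factor of x^23 - 1 over F_2, extended by a parity bit to [24,12,8].
GEN = 0b110001110101   # g(x) = x^11 + x^10 + x^6 + x^5 + x^4 + x^2 + 1

def gf2_poly_mul(a, b):
    """Carry-free (XOR) polynomial multiplication over F_2."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

weights = Counter()
for msg in range(1, 1 << 12):              # all 4095 nonzero messages
    w = bin(gf2_poly_mul(msg, GEN)).count("1")
    weights[w + (w & 1)] += 1              # parity bit makes the weight even
print(min(weights))   # 8
print(weights[8])     # 759
```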
5) I want to add network coding as a hot topic. It has certainly received a lot of attention lately. It is not clear how deep the math they end up using will be. Sometimes it looks like it is just Grassmannians over a finite field.
The commenters got it right. The automorphism group of a binary code is the group of permutations of coordinates that stabilize the code. If instead of the binary alphabet we use a ternary, quaternary, ... alphabet, then there is some variation, in that some sources allow signed permutations of coordinates. After all, these are also automorphisms that preserve the Hamming distance (or Lee distance) between a pair of inputs. Whichever way gives you a more interesting group is the way to go!
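A tiny sanity check of the definition (my own toy example): the [7,4] Hamming code in its cyclic form is stabilized by the cyclic shift of coordinates, while a generic transposition need not stabilize it.

```python
from itertools import product

# The [7,4] Hamming code in cyclic form, generated by g(x) = x^3 + x + 1;
# generator matrix rows are the shifts of the coefficient vector of g.
G = [
    [1, 1, 0, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 0, 0],
    [0, 0, 1, 1, 0, 1, 0],
    [0, 0, 0, 1, 1, 0, 1],
]
code = set()
for bits in product([0, 1], repeat=4):
    code.add(tuple(sum(b * g for b, g in zip(bits, col)) % 2 for col in zip(*G)))

def shift(c):                       # cyclic shift of the coordinates
    return c[-1:] + c[:-1]

print(all(shift(c) in code for c in code))   # True: the shift is an automorphism

def swap01(c):                      # transposition of the first two coordinates
    return (c[1], c[0]) + c[2:]

print(all(swap01(c) in code for c in code))  # False: not an automorphism
```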
In addition to the celebrated Golay codes (binary and ternary) some other families of codes have a useful group of automorphisms. I will list the Reed-Muller codes. These codes have length $n=2^m$, and the code $RM(r,m)$ has dimension $$ k=\sum_{i=0}^r{m\choose i}.$$ Their coordinate positions can naturally be put into a bijective correspondence with the vectors $v$ of the $m$-dimensional space $F_2^m$ over the field of two elements in such a way that all the affine linear transformations $f_{A,u}:v\mapsto Av+u$ for all $A\in GL_m(F_2), u,v\in F_2^m$ become automorphisms of the codes. The Hamming codes belong to the hierarchy of Reed-Muller codes --- the extended Hamming code of length $2^m$ is the code $RM(m-2,m)$. It is known (see MacWilliams & Sloane) that this is the full automorphism group of the code $RM(r,m)$, when $r\lt m-1$. The codes $RM(m,m)$ (resp. $RM(m-1,m)$) consist of all the binary words of length $2^m$ (resp. all the binary words of even weight), and both these codes are stable under all permutations of the $2^m$ bit positions.
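This correspondence can be checked by machine on the smallest interesting case, $RM(1,3)$, realized as evaluations of affine functions on $F_2^3$ (the particular shear $A$ and shift $u$ below are arbitrary choices of mine):

```python
from itertools import product

# RM(1,3): coordinates indexed by the 8 points of F_2^3, codewords given by
# evaluating affine functions c0 + c1*x1 + c2*x2 + c3*x3 at those points.
m = 3
points = list(product([0, 1], repeat=m))

code = set()
for c0, c1, c2, c3 in product([0, 1], repeat=4):
    code.add(tuple((c0 + c1 * p[0] + c2 * p[1] + c3 * p[2]) % 2 for p in points))
print(len(code))   # 16 = 2^4, since dim RM(1,3) = C(3,0) + C(3,1) = 4

# An affine map v -> Av + u with invertible A permutes the points of F_2^3,
# hence the coordinates.  Here A is a shear and u = (1, 0, 1).
def affine(p):
    v = ((p[0] + p[1]) % 2, p[1], p[2])
    return tuple((x + y) % 2 for x, y in zip(v, (1, 0, 1)))

perm = [points.index(affine(p)) for p in points]
print(all(tuple(c[i] for i in perm) in code for c in code))  # True
```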
Also the extended BCH codes have a useful group of automorphisms. In this case the bit positions can be put into a bijective correspondence with the elements $x$ of the finite field $F=GF(2^m)$. The codes can be defined by means of power sum equations, and it is easy to see that the affine linear mappings $x\mapsto ax+b$, $a\in F^*$, $b\in F$, are all automorphisms. Together with the Frobenius automorphisms, $x\mapsto x^{2^i}$, those form the entire group of automorphisms in many a case, but unfortunately I'm not up to speed on exactly when that happens. This is a much smaller group of automorphisms in comparison to that of the Reed-Muller codes. Yet it is doubly transitive, and many an algebraic proof has been simplified by this fact.
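As a sketch of the power-sum description, take the smallest case that fits here: over $GF(8)$ the extended BCH code in question is just the extended Hamming code of length 8, and all $7\cdot 8 = 56$ affine maps can be verified directly.

```python
from itertools import product

# Power-sum description: coordinates indexed by the elements x of F = GF(8);
# a word (c_x) is a codeword iff  sum_x c_x = 0  (in F_2)  and
# sum_x c_x * x = 0  (in F).  This is the extended Hamming code of length 8.

def gf8_mul(a, b):
    """Multiplication in GF(8) = F_2[t]/(t^3 + t + 1), elements as bitmasks."""
    r = 0
    for _ in range(3):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= 0b1011            # reduce by t^3 + t + 1
    return r

F = list(range(8))                 # field elements 0..7; addition is XOR

def in_code(c):
    s0, s1 = 0, 0
    for x, bit in zip(F, c):
        if bit:
            s0 ^= 1                # binary parity check
            s1 ^= x                # power-sum check: sum of c_x * x in GF(8)
    return s0 == 0 and s1 == 0

code = {c for c in product([0, 1], repeat=8) if in_code(c)}
print(len(code))                   # 16, so the code is [8,4]

# Every affine map x -> a*x + b with a != 0 permutes the coordinates;
# check that each one stabilizes the code.
ok = all(
    all(tuple(c[gf8_mul(a, x) ^ b] for x in F) in code for c in code)
    for a in range(1, 8) for b in range(8)
)
print(ok)                          # True: all 56 affine maps are automorphisms
```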