[Math] How to efficiently represent and manipulate polynomials in software

computational-algebra, computer-algebra-systems, polynomials

I've started to work on a package (written in MATLAB for now) that, among other things, must be able to represent and manipulate (compare, add, multiply, differentiate, etc.) polynomials in several variables. Up until now my approach has been the following:

  • I have a class each object of which represents a polynomial.
  • In each polynomial object, I store the symbols of the variables that appear in the polynomial (say $x$ and $y$) and the number of components each of these variables has (say $3$ and $2$ if $x\in\mathbb{R}^3$ and $y\in\mathbb{R}^2$).
  • I also have a fixed way (some function $r$) of generating all multi-indices of a given dimension and order. For example, I tell this function that the dimension of the underlying space is $2$ and the order of the monomials I'm interested in is $3$, and it returns

$$r(2,3)=\left(\begin{bmatrix}3\\ 0\end{bmatrix},\begin{bmatrix}2\\ 1\end{bmatrix},\begin{bmatrix}1\\ 2\end{bmatrix},\begin{bmatrix}0\\ 3\end{bmatrix}\right)$$

  • Using $r$ I can systematically construct, on the fly, the monomial basis for the space of polynomials of some degree in some variables (just identify the monomial $x_1^3$ with $[3,0]^T$, etc.). Furthermore, once I fix how I order the variables (for example, first $x$ then $y$) and the number of components each has, the order in which $r$ produces the basis vectors is fixed.

  • I use the order in which $r$ returns the vectors to represent the polynomial of interest as its vector of coefficients with respect to the monomial basis, in which the $i^{th}$ entry of the vector is the coefficient of the $i^{th}$ monomial returned by $r$ (a sketch of one possible implementation of $r$ is given after the example below).

For example, to represent

$$q(x)=2x_1^2+x_1x_2-3x_2+1$$

$r$ gives us

$$r(2,0)=\left(\begin{bmatrix}0\\ 0\end{bmatrix}\right),\quad r(2,1) =\left(\begin{bmatrix}1\\ 0\end{bmatrix},\begin{bmatrix}0\\ 1\end{bmatrix}\right),\quad r(2,2) =\left(\begin{bmatrix}2\\ 0\end{bmatrix},\begin{bmatrix}1\\ 1\end{bmatrix},\begin{bmatrix}0\\ 2\end{bmatrix}\right).$$

and thus I end up representing $q$ as

q.symbols = x
q.varnumber = 2
q.coefs = [1,0,-3,2,1,0]
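
To make this concrete, here is a minimal MATLAB sketch of one possible implementation of the function $r$ and of the resulting representation of $q$; the function name multiIndices and the particular recursion are illustrative choices of mine, not necessarily what the package actually does.

function M = multiIndices(n, d)
%MULTIINDICES All multi-indices in n variables with total degree exactly d,
% one per row, first entry decreasing -- one way to implement the function r
% described above (illustrative sketch only).
    if n == 1
        M = d;
        return
    end
    M = zeros(0, n);
    for e = d:-1:0
        tails = multiIndices(n - 1, d - e);           % recurse on the remaining variables
        M = [M; repmat(e, size(tails, 1), 1), tails]; %#ok<AGROW>
    end
end

Used on the example above, this reproduces the basis and the coefficient vector:

% Exponents of the monomial basis up to degree 2 in 2 variables, and the
% coefficient vector of q(x) = 2*x1^2 + x1*x2 - 3*x2 + 1
E = [multiIndices(2, 0); multiIndices(2, 1); multiIndices(2, 2)];
% E = [0 0; 1 0; 0 1; 2 0; 1 1; 0 2]
coefs = [1, 0, -3, 2, 1, 0];   % i-th entry = coefficient of the i-th row of E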

However, this seems to me a poor way of doing things. In particular, I end up having to do a lot of searching through the results returned by $r$. For example, if I'd like to combine a polynomial $p$ in $x$ and another $q$ in $y$ (say, add them together), I end up doing something along the lines of

  • Use $r$ to generate a basis of monomials of sufficiently high degree (in the case of addition, of degree $\max\{\deg(p),\deg(q)\}$) in both $x\in\mathbb{R}^3$ and $y\in\mathbb{R}^2$.

  • Search and compare the appropriate components of the ($5$-dimensional) basis vectors for $p+q$ with the ($3$-dimensional) basis vectors of $p$ and, separately, the ($2$-dimensional) ones of $q$. Once I identify which basis vectors of $p+q$ correspond to which of $p$ and which of $q$, I know which coefficients to add together and where to store them. (In this particular example no coefficients actually get added together, because $p$ and $q$ are in different variables, but in general this is not the case.) A sketch of this matching step is given below.
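
To make the matching step concrete, here is a rough MATLAB sketch of the addition once both polynomials have been brought onto the same ordered variable set (i.e. after padding their exponent matrices with zero columns for the variables they do not contain). The (exponent matrix, coefficient vector) representation and the name addPolys are illustrative assumptions of mine, not the package's actual interface.

function [E, c] = addPolys(E1, c1, E2, c2)
%ADDPOLYS Add two polynomials over the same ordered variable set.
% E1, E2: matrices whose rows are multi-indices (each row appearing once);
% c1, c2: column vectors of the corresponding coefficients.  Sketch only.
    E = unique([E1; E2], 'rows');        % union of the monomials of p and q
    c = zeros(size(E, 1), 1);
    [~, i1] = ismember(E1, E, 'rows');   % where each term of p lands in E
    [~, i2] = ismember(E2, E, 'rows');   % where each term of q lands in E
    c(i1) = c(i1) + c1;
    c(i2) = c(i2) + c2;
    keep = c ~= 0;                       % optionally drop terms that cancelled
    E = E(keep, :);
    c = c(keep);
end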

Now, a lot of these issues would be resolved if there were a bijection, cheap to compute and to invert, between $\mathbb{N}$ and the set of monomials in several variables; for example, one given by an analytic formula (however, I couldn't think of one, and I'm unsure there is one). I've also looked online and seen mentions of different possible ways to order monomials (lexicographic, graded lexicographic, weighted, etc.), with different properties.
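
For scale, the number of monomials that any such indexing scheme has to cover is given by the usual stars-and-bars counts,

$$\#\{\alpha\in\mathbb{N}^n : |\alpha| = d\} = \binom{n+d-1}{d},\qquad \#\{\alpha\in\mathbb{N}^n : |\alpha| \le d\} = \binom{n+d}{d},$$

and partial sums of such binomial coefficients are what make graded orders cheap to rank and unrank (see the sketch under Edit 2 below).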

However, it is unclear to me which one to use and why (I also know zilch about computational algebra). So my questions are:

  • Any suggestions on how to improve the above, mostly to reduce the computational cost, though memory considerations would also be welcome (I'm open to completely changing the way I've been representing polynomials).
  • How is this usually done in computational algebra packages?
  • And/or any suggestions for references where I can find answers to these questions.

NB: I mostly care about manipulating arbitrary real polynomials on $\mathbb{R}^n$, and not ones lying in some module, ideal, etc. Thus setting things up so that I can efficiently compute minimal generating sets of these, Gröbner bases, etc., is probably not very important for me (I may be wrong, in which case please correct me). I'm also not too concerned about implementing algorithms for polynomial division and factorisation.

Edit: Thank you all for the replies; they've helped improve the implementation a lot. In case anyone's curious, a (still very rough, but functioning) version of the code can be found here.

Edit 2: In case anyone's interested, I'm currently storing a polynomial as an array of its non-zero coefficients together with their ranks in the graded lexicographic order (the array is sorted by that order). To multiply and add, I go back and forth between the ranks and the monomials using a recursive function I cooked up.
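
For anyone curious, here is a sketch of one such recursive ranking function in MATLAB, following the ordering described in the question (monomials listed by increasing total degree and, within each degree, in the order produced by $r$). It is only an illustration of the idea, not necessarily the function used in the linked code.

function k = grlexRank(a)
%GRLEXRANK 1-based rank of the multi-index a (row vector of non-negative
% integer exponents) in the graded ordering used above.  Sketch only.
    n = numel(a);
    d = sum(a);
    if d > 0
        k = nchoosek(n + d - 1, n);  % number of monomials of total degree < d
    else
        k = 0;
    end
    k = k + posInDegree(a, d);
end

function p = posInDegree(a, d)
% 1-based position of a among all multi-indices of length numel(a) and
% total degree d, listed with the first entry decreasing (as r does).
    n = numel(a);
    if n == 1
        p = 1;
        return
    end
    p = 0;
    for e = d:-1:a(1)+1                         % blocks with a larger first exponent come earlier
        p = p + nchoosek(n - 2 + d - e, n - 2); % # multi-indices of degree d-e in n-1 variables
    end
    p = p + posInDegree(a(2:end), d - a(1));
end

For example, grlexRank([0 0]) is 1, grlexRank([0 1]) is 3 and grlexRank([1 2]) is 9, matching the enumeration above; inverting the map (unranking) follows the same recursion in reverse.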

Best Answer

Note: this answer turned out a bit long (TL;DR); if you just want a direct suggestion (instead of a survey), skip to the last paragraph.

Since the typical use case has now been made clear (in the last comment), I suggest you go over this 2007 presentation by M. Gastineau, especially slides 8-15. It summarizes the available representation options and also what is used by several CAS. You should then skim the sequel of that presentation, especially slides 28-30, assuming multiplication is a common use case for you; even if it isn't, that presentation is still worth a look because it mentions the representations used by several other CAS packages that are not covered in the first one.

Putting these together, there's a fair bit of info: e.g. GIAC and MAGMA use hash maps, GINAC and Mathematica 5 use trees, Maple 10 uses DAGs, Maxima uses a recursive list, Singular uses sorted monomial lists (which work surprisingly well), etc. [I guess there's no implementation info on newer versions of commercial products.] I suspect some of the more obscure representations, like those used by TRIP (which is Gastineau's own CAS kernel), despite being fast, may be hard to code... Pari/GP appears fast too, based on the opening slide of the first presentation, but it doesn't say what representation it uses. Looking at its FAQ, it appears that multivariate polynomials are faked using univariate ones, so perhaps that's why the info wasn't there... and I'm guessing that trick may not be satisfactory in your application. (SAGE, which has been discussed by MvG in his answer, also uses hash maps [Python dicts].)
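
For what it's worth, the hash-map idea is easy to prototype even in MATLAB using containers.Map, with exponent vectors serialized into character keys. The snippet below is only meant to illustrate the idea; it is not how any of the systems above actually implement it.

% Hash-map representation sketch: keys are exponent vectors serialized to
% char, values are the corresponding coefficients.
p = containers.Map('KeyType', 'char', 'ValueType', 'double');
key = @(expnt) mat2str(expnt);     % e.g. [2 0] -> '[2 0]'

p(key([2 0])) = 2;                 % 2*x1^2
p(key([1 1])) = 1;                 % x1*x2
p(key([0 1])) = -3;                % -3*x2
p(key([0 0])) = 1;                 % +1

% Adding a term touches only the matching bucket:
k = key([1 1]);
if isKey(p, k)
    p(k) = p(k) + 5;               % expected O(1) coefficient update
else
    p(k) = 5;
end

The price of this convenience is that the terms carry no particular order, so operations that want sorted output (printing, or the sorted-list schemes discussed below) need an extra sort over the keys.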

I've looked at Singular's sources a little because, judging from the idea behind it, the implementation promised to be relatively simple. Alas, the source code of Singular is rather messy and unevenly commented. One thing I could gather, though, is that they use bucket sort to keep their monomial lists in order.

Since the benchmarks in those slides are ~8 years old, I would not take them at face value for the current versions of the systems mentioned. The bottom line is that there are many representation options and you need to experiment to find out what works best for you. If you're implementing this in a high-level language like Python, some of the low-level optimizations that rely on structure layout to achieve cache locality might not work unless you use something like NumPy arrays.

There's a more recent (2010) paper on the (new) implementation of "sparse" multivariate polynomials in Maple 14 by M. Monagan and R. Pearce. Interestingly, it revives an old idea of packing multiple exponents into a single machine word; beyond that, it is just a packed, lexicographically sorted monomial array, so the implementation is rather simple. And it claims to have beaten TRIP. The paper also benchmarks the other usual suspects, but in newer versions (e.g. Mathematica 7) and on more operations than just multiplication. The reason I put "sparse" in quotes is that the polynomials used in this paper aren't all that sparse; they are in fact similar to those used in the 2007 TRIP presentation, which doesn't call such polynomials sparse, e.g. this paper uses $(1+x+y+z)^{20}+1$, which has ~1700 terms. So I would suggest using this Maple 14 representation, as it seems to be close to (if not) the current state of the art as far as performance goes, and it is reasonably simple to implement.
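
In case the packing trick sounds abstract, here is a tiny MATLAB illustration of the idea; the field width and layout (total degree in the top bits, then one fixed-width field per exponent) are my own illustrative choices and not Maple's actual encoding.

% Exponent-packing sketch (illustrative layout only).  With the total degree
% in the most significant field and the exponents following in order, a
% single integer comparison of the packed words reproduces a graded
% lexicographic comparison of the monomials.
BITS = 8;   % bits per field; assumes every exponent and the total degree are < 2^BITS
packExp = @(a) bitshift(uint64(sum(a)), BITS*numel(a)) + ...
               sum(bitshift(uint64(a), BITS*(numel(a)-1:-1:0)));

m1 = packExp([2 0 1]);   % x1^2*x3
m2 = packExp([1 1 1]);   % x1*x2*x3
m1 > m2                  % true: same total degree, and [2 0 1] beats [1 1 1] lexicographically

Sorting the packed words then sorts the terms in graded lexicographic order, and detecting equal monomials during multiplication becomes a single integer comparison, which is much of the appeal of this representation.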