I would like to know how exactly the equations of motion in the Lorenz
gauge removes the second degree of freedom.
In the Lorenz 'gauge', we have
$$\Box A^{\mu} = \mu_0j^{\mu}$$
If $A^{\mu}$ is a solution, then so is $A^{\mu} + N\epsilon^{\mu}e^{-ik\cdot x}$ if
$$\Box (N\epsilon^{\mu}e^{-ik\cdot x}) = 0$$
Consistency with the Lorenz condition
$$\partial_{\mu}A^{\mu}=0$$
requires
$$k \cdot \epsilon = 0 $$
and consistency with the equation of motion requires that
$$k^2 = k \cdot k = 0$$
But this means that if a polarization four-vector $\epsilon$ satisfies the condition
$$k \cdot \epsilon = 0$$
then $\epsilon' = \epsilon + \alpha k$ also satisfies this condition
$$k \cdot \epsilon' = k \cdot (\epsilon + \alpha k) = k \cdot \epsilon + \alpha k^2 = 0$$
This means we can choose an $\epsilon^{\mu}$ such that $\epsilon^0 = 0$ and then the Lorenz condition implies that the wave and polarization 3-vectors are orthogonal
$$\vec k \cdot \vec\epsilon = 0$$
so there are only two independent polarization vectors (for freely propagating wave solutions).
To summarize, the Lorenz condition implies that the wave and polarization four-vectors are (Minkowski) orthogonal leaving three polarization degrees of freedom.
The equation of motion implies that the wave four-vector is null. Since null-vectors are self-orthogonal ($k^2 = 0$), we are left with two physical polarization degrees of freedom.
If you don't impose power-counting renormalizability, there are a host of other possibilities, since higher order derivatives or higher order interactions can be introduced. For example, terms $(Tr(F^2)^m)^n$ and are gauge invariant but for $m>1$ or $n>1$ not renormalizable.
If you impose power-counting renormalizability, uniqueness is fairly straightforward up to trivial field transformations. To see this one first looks at monomials - products of fields and their derivatives. By renormalizability, the total degree is not allowed to be bigger than 4. Each partial derivative $d_j=\partial_j$ counts as degree 1, each Bose fields $A_j$ as degree 1, and each Fermion field $\psi_j$ as degree 3/2. Moreover, Fermions must appear an even number of times to yield a scalar Lagrangian. This leads to a quite short list of possibilities: Up to 4 $A$s and $d$s, or $\psi\psi, d\psi\psi, A\psi\psi$, all with all possible indices. The general renormalizable local Lagrangian density is a linear combination of these, at fixed $x$. Now impose Poincare invariance and gauge invariance, and the only linear combinations left are the ones that one sees everywhere. For a single Yang-Mills field and nothing else (i.e., your question in the narrow sense)
the only freedom left is rescaling the fields, which eliminates an arbitrary factor in front of the trace. In the presence of fermionic fields there is the additional freedom to take linear combinations of Fermionic fields as new fields, which may be used to reduce the associated bilinear forms to weighted sums of squares.
If one drops gauge invariance, there are lots of other possible Langangian densities, for example a mass term, products of it with the terms described, and even more.
Note that proving the renormalizability of nonabelian gauge theories with broken symmetries was a highly nontrivial achievement (around hundred pages of published argument) worthy of a nobel prize for Veltman nd 't Hooft. Thus it is unreasonable to explain in an answer the reasons for where precisely the borderline is bewteen renormalizable and nonrenormalizable.
The answer to your question, ''Maybe I can put my question in simpler terms: is there any room for modifications in the Standard Model without introducing new fields? can we add new interactions among the gauge bosons (W,Z,…) and/or the matter fields without spoiling unitarity, covariance, or renormalisability? (at the perturbative level at least; here I dont care about θ terms, etc.)'' related to the bounty (which will disappear in a few hours)
is no, essentially by an extension of the above reasoning (including the 100 pages of proof of renormalizability).
Best Answer
$SU(N)$ has $N^2 -1$ gluons, one corresponding to each generator (this is called the adjoint representation). That "quantum number" (which could be called an index for the adjoint rep) could be tensored together with the Lorentz representation for a massless vector particle, which should have 2 polarizations (just like QED). So the theory will have $2(N^2 -1)$ physical degrees of freedom.
I don't understand why you say that the gauge bosons should be massive -- if they were, then the gauge symmetry would be "broken".