Note that, for example,
\begin{align}
[A_\mu,\partial_\nu]f&=A_\mu\partial_\nu f-\partial_\nu(A_\mu f)\\
&=A_\mu\partial_\nu f-\partial_\nu(A_\mu)f-A_\mu\partial_\nu f\\
&=-f\partial_\nu A_\mu\,.
\end{align}
So you don't get terms like $A_\mu\partial_\nu$.
If you don't impose power-counting renormalizability, there are a host of other possibilities, since higher order derivatives or higher order interactions can be introduced. For example, terms $(Tr(F^2)^m)^n$ and are gauge invariant but for $m>1$ or $n>1$ not renormalizable.
If you impose power-counting renormalizability, uniqueness is fairly straightforward up to trivial field transformations. To see this one first looks at monomials - products of fields and their derivatives. By renormalizability, the total degree is not allowed to be bigger than 4. Each partial derivative $d_j=\partial_j$ counts as degree 1, each Bose fields $A_j$ as degree 1, and each Fermion field $\psi_j$ as degree 3/2. Moreover, Fermions must appear an even number of times to yield a scalar Lagrangian. This leads to a quite short list of possibilities: Up to 4 $A$s and $d$s, or $\psi\psi, d\psi\psi, A\psi\psi$, all with all possible indices. The general renormalizable local Lagrangian density is a linear combination of these, at fixed $x$. Now impose Poincare invariance and gauge invariance, and the only linear combinations left are the ones that one sees everywhere. For a single Yang-Mills field and nothing else (i.e., your question in the narrow sense)
the only freedom left is rescaling the fields, which eliminates an arbitrary factor in front of the trace. In the presence of fermionic fields there is the additional freedom to take linear combinations of Fermionic fields as new fields, which may be used to reduce the associated bilinear forms to weighted sums of squares.
If one drops gauge invariance, there are lots of other possible Langangian densities, for example a mass term, products of it with the terms described, and even more.
Note that proving the renormalizability of nonabelian gauge theories with broken symmetries was a highly nontrivial achievement (around hundred pages of published argument) worthy of a nobel prize for Veltman nd 't Hooft. Thus it is unreasonable to explain in an answer the reasons for where precisely the borderline is bewteen renormalizable and nonrenormalizable.
The answer to your question, ''Maybe I can put my question in simpler terms: is there any room for modifications in the Standard Model without introducing new fields? can we add new interactions among the gauge bosons (W,Z,…) and/or the matter fields without spoiling unitarity, covariance, or renormalisability? (at the perturbative level at least; here I dont care about θ terms, etc.)'' related to the bounty (which will disappear in a few hours)
is no, essentially by an extension of the above reasoning (including the 100 pages of proof of renormalizability).
Best Answer
Classically, the gauge field strength is a curvature of a connection, the same way that the Riemann tensor is. Since $F^a_{\mu\nu}$ lives in the adjoint representation of the gauge group, you can define a 4-index object very analogous to the Riemann tensor:
$$\mathcal{F}^a{}_{b\mu\nu} \equiv F^c_{\mu\nu} f_c{}^a{}_b,$$
where $f_c{}^a{}_b$ are the structure constants defined via
$$[t_c, t_b] = f_c{}^a{}_b \, t_a,$$
modulo factors of $i$ if you care to insert them. In any case, the object $\mathcal{F}^a{}_{b\mu\nu}$ contains information about parallel transport around infinitesimal loops, in the same sense that the Riemann tensor does. But the vector being translated is not a tangent vector; instead it is a vector in gauge space (or "internal" space), whose components are defined via expansion in the generators $t_a$, as in $V = V^a \, t_a$.
So, $\mathcal{F}^a{}_{b\mu\nu}$ describes the change in $V = V^a \, t_a$ as it is parallel-transported (via the covariant derivative $D_\mu \equiv \partial_\mu + A_\mu$, again modulo factors of $i$, etc.) around a small parallelogram in the directions $\partial_\mu, \partial_\nu$.