Yang-Mills theory is a non-abelian gauge theory, and it is used a lot in QCD calculations.
But what are the distinctions between Yang-Mills theory and QCD?
And what are the distinctions between supersymmetric Yang-Mills theories and SUSY QCD?
gauge-theory quantum-chromodynamics quantum-field-theory supersymmetry yang-mills
A. The action of $N=4$ SYM (Super Yang-Mills theory) in $d=4$ is the simple dimensional reduction of the 9+1-dimensional SYM, the highest-dimensional SYM that exists. The latter is $$S = \int d^{10} x\mbox{ Tr } \left( -\frac{1}{4} F_{\mu\nu}F^{\mu \nu} + \overline{\psi}D_\mu \gamma^\mu \psi \right)$$ where $D$ is the covariant derivative and $\psi$ is a real chiral (Majorana-Weyl) spinor in 9+1 dimensions which has 16 real components, leading to 8 fermionic on-shell degrees of freedom. The dimensional reduction reduces $d^{10}x$ to $d^4 x$ but it also renames the 6 "compactified" spatial components of $A_\mu$ as six scalars $\Phi_I$ in $d=4$. The derivatives in the corresponding 6 directions are set to zero.
If one looks at what fields and interactions we get in $d=4$ - it's straightforward to write the action - it's one gauge field, four Weyl fermions, and six real scalars. All those fields transform in the adjoint representation of the gauge group; the most popular choice is $SU(N)$. They have the standard kinetic terms; standard cubic couplings of the gauge field to all the other fields; the usual quartic Yang-Mills self-interaction of the gauge field; a quartic interaction of the gauge field and the scalars; Yukawa cubic couplings for the 6 scalars that arise from the gauge interactions of the fermions in 9+1 dimensions; and a quartic potential for the scalars which is equal to the squared commutator. Everything has to be traced over the gauge group. All these interactions are related and determined by supersymmetry - by 16 real supercharges. The individual vertices of Feynman diagrams are self-evident but the true physics behind all of them is related by symmetries.
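To make the reduction concrete, here is a sketch of the bosonic terms only. The conventions are assumptions not fixed by the text above: a mostly-minus metric, Hermitian adjoint-valued fields, and an explicit coupling $g$ with $F_{MN} = \partial_M A_N - \partial_N A_M - ig[A_M, A_N]$. Splitting the ten-dimensional index as $M = (\mu, I)$, writing $A_I = \Phi_I$ and setting $\partial_I = 0$ gives $$F_{\mu I} = D_\mu \Phi_I, \qquad F_{IJ} = -ig\,[\Phi_I, \Phi_J],$$ so that $$\mbox{Tr}\left(-\frac{1}{4}F_{MN}F^{MN}\right) = \mbox{Tr}\left(-\frac{1}{4}F_{\mu\nu}F^{\mu\nu} + \frac{1}{2}D_\mu\Phi_I D^\mu\Phi_I + \frac{g^2}{4}[\Phi_I,\Phi_J][\Phi_I,\Phi_J]\right).$$ The last term is the squared-commutator potential: since the commutator of two Hermitian matrices is anti-Hermitian, $V = -\frac{g^2}{4}\mbox{Tr}\,[\Phi_I,\Phi_J]^2 \geq 0$. The counting also matches: a massless vector in $d=10$ has $10-2=8$ physical polarizations, which split into $2$ for the $d=4$ vector plus $6$ scalars. Other sign and normalization conventions change the individual factors but not the structure.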
B. The simplest explanation of the S-duality is to represent the gauge theory as the low-energy limit of the dynamics of a stack of D3-branes in type IIB string theory. The S-duality group - actually it's an $SL(2,Z)$ group because it may also act on the $\theta$-angle (RR-axion) - is directly inherited from the same S-duality group of type IIB string theory. In particular, the $g\to 1/g$ map may be interpreted in the F-theory description of type IIB as the exchange of the 11th and 12th (infinitesimal) dimensions of F-theory. One may also get the gauge theory as the compactification of the $d=6$ (2,0) superconformal field theory on a tiny two-torus, and $SL(2,Z)$ acts in the obvious way - again, the exchange of the two radii is the $g\to 1/g$ map. It's a non-perturbative duality so there's no simple perturbative "field redefinition" that would prove it at the classical level. However, one may make many consistency checks that the duality seems viable - e.g. one may find the magnetic monopole solutions that turn into light elementary excitations at strong coupling.
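The $SL(2,Z)$ action can be illustrated numerically. This is a minimal sketch that assumes the common normalization $\tau = \theta/2\pi + 4\pi i/g^2$ (not stated above), under which the "$g\to 1/g$" map reads $g \to 4\pi/g$ at $\theta=0$. The function names are made up for this illustration.

```python
import math

def sl2z_action(tau, a, b, c, d):
    """Act on the complexified coupling: tau -> (a*tau + b) / (c*tau + d)."""
    if a * d - b * c != 1:
        raise ValueError("integers (a, b, c, d) must have determinant 1")
    return (a * tau + b) / (c * tau + d)

def tau_from_coupling(g, theta=0.0):
    # Assumed convention: tau = theta / (2 pi) + 4 pi i / g^2
    return complex(theta / (2 * math.pi), 4 * math.pi / g**2)

def coupling_from_tau(tau):
    # Invert Im(tau) = 4 pi / g^2
    return math.sqrt(4 * math.pi / tau.imag)

g = 0.5                                   # weak coupling
tau = tau_from_coupling(g)

tau_S = sl2z_action(tau, 0, -1, 1, 0)     # S: tau -> -1/tau
g_dual = coupling_from_tau(tau_S)         # strong coupling, equals 4 pi / g

tau_T = sl2z_action(tau, 1, 1, 0, 1)      # T: tau -> tau + 1, i.e. theta -> theta + 2 pi

print(g, g_dual, 4 * math.pi / g)
```

The S generator exchanges weak and strong coupling, while T only shifts the $\theta$-angle by $2\pi$ and leaves $g$ unchanged.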
C. The $N=4$ $d=4$ vector multiplet contains all the physical fields in the theory and I have already written what they are: a vector field with 2 physical polarizations, 6 real scalars, and 8 fermionic degrees of freedom from 4 Weyl fermions, carrying the $SU(4) \approx SO(6)$ R-symmetry group. It's not too helpful to use superspace for $N=4$ theories, unless one wants to break it to $N=1$ or $N=2$: the supersymmetry is simply too large.
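The matching of bosonic and fermionic degrees of freedom quoted above can be checked with a few lines of arithmetic. This is only a bookkeeping sketch; the helper names are invented for the illustration.

```python
def on_shell_vector(d):
    """Physical polarizations of a massless vector field in d dimensions."""
    return d - 2

def on_shell_fermion(real_components):
    """The Dirac equation halves the real spinor components on-shell."""
    return real_components // 2

# d = 10: one vector and one Majorana-Weyl spinor with 16 real components
bosons_10d = on_shell_vector(10)
fermions_10d = on_shell_fermion(16)

# d = 4: one vector, six real scalars, four Weyl fermions
# (a Weyl fermion has 2 complex = 4 real components)
bosons_4d = on_shell_vector(4) + 6
fermions_4d = 4 * on_shell_fermion(4)

# R-symmetry: SU(4) and SO(6) have the same dimension
dim_su4 = 4**2 - 1
dim_so6 = 6 * 5 // 2

print(bosons_10d, fermions_10d, bosons_4d, fermions_4d, dim_su4, dim_so6)
```

All four counts come out equal to 8, as supersymmetry requires, and both R-symmetry algebras have dimension 15.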
I agree with Jeff that those things can be found in first (SUSY) chapters of any modern introductory textbook or other literature on advanced quantum field theory or string theory so in this sense, this question is a theft of the time of other users of this server.
By the way, I also want to mention that the $N=4$ theory is arguably the "most important" or "simplest" $d=4$ non-gravitational theory, by the modern criteria, and the action above is far from the only one - and maybe even from the most elementary - way to describe this theory. This theory, by the AdS/CFT correspondence, is also dual i.e. exactly equivalent to type IIB string theory on $AdS_5\times S^5$. The $N=4$ SYM also has the "dual superconformal symmetry" that, together with the original superconformal symmetry, generates an infinite-dimensional "Yangian symmetry". Twistor techniques are particularly useful for the computation of scattering amplitudes in this $N=4$ SYM and many of the twistor researchers think that the twistor formulae are more fundamental and elementary ways to describe physics of the SYM than the perturbative action above.
Very briefly, a classical theory is a gauge theory if its field variables $\varphi^i(\vec{x},t)$ have a non-trivial local gauge transformation that leaves the action $S[\varphi]$ gauge invariant. Usually, a gauge transformation is demanded to be a continuous transformation.
[Gauge theory is a huge subject, and I only have time to give some explanation here, and defer a more complete answer to, e.g., the book "Quantization of Gauge Systems" by M. Henneaux and C. Teitelboim. By the word local is meant that the gauge transformations at different space-time points are free to be performed independently, without affecting each other (as opposed to a global transformation). By the word non-trivial is meant that the gauge transformation does not vanish identically on-shell. Note that an infinitesimal gauge transformation does not have to be of the form
$$\delta_{\varepsilon}A_{\mu}(\vec{x},t) = D_{\mu}\varepsilon(\vec{x},t),$$
nor does it have to involve a $A_{\mu}$ field. More generally, an infinitesimal gauge transformation is of the form
$$\delta_{\varepsilon}\varphi^i(x) = \int d^d y \ R^i{}_a (x,y)\varepsilon^a(y),$$
where $R^i{}_a (x,y)$ are Lagrangian gauge generators, which form a gauge algebra, which, in turn, may be open and reducible, and $\varepsilon^a$ are infinitesimal gauge parameters. Besides gauge transformations that are continuously connected to the identity transformation, there may be so-called large gauge transformations, which are not connected continuously to the identity transformation, and the action may not always be invariant under those. Ultimately, physicists want to quantize the classical gauge theories using, e.g., Batalin-Vilkovisky formalism, but let's leave quantization for a separate question. Various subtleties arise at the quantum level as, e.g., pointed out in the comments below. Moreover, some quantum theories do not have classical counterparts.]
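As a concrete instance of the general formula for $\delta_{\varepsilon}\varphi^i$, here is a sketch for Yang-Mills theory. It assumes the convention $D_\mu\varepsilon^a = \partial_\mu\varepsilon^a + g f^{abc}A^b_\mu\varepsilon^c$; signs and factors of $g$ vary between textbooks. The field index $i$ becomes the pair $(a,\mu)$, and the gauge generator is local, i.e. proportional to a delta function: $$R^{a}{}_{\mu\, c}(x,y) = \left(\delta^{ac}\,\partial_\mu + g f^{abc}A^b_\mu(x)\right)\delta^d(x-y),$$ where the derivative acts on $x$. Integrating against $\varepsilon^c(y)$ then gives $$\delta_{\varepsilon}A^a_\mu(x) = \int d^d y \ R^{a}{}_{\mu\, c}(x,y)\,\varepsilon^c(y) = D_\mu\varepsilon^a(x).$$ For an abelian group the structure constants $f^{abc}$ vanish and this reduces to $\delta_{\varepsilon}A_\mu = \partial_\mu\varepsilon$.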
Yang-Mills theory is just one example out of many of a gauge theory, although the most important one. To name a few other examples: Chern-Simons theory and BF theory are gauge theories. Gravity can be viewed as a gauge theory.
Yang-Mills theory without matter is called pure Yang-Mills theory.
Best Answer
From the beginning of the Wikipedia page on Yang-Mills theory (have you read it?):
"Yang–Mills theory is a gauge theory based on the SU(N) group ...
... In early 1954, Chen Ning Yang and Robert Mills extended the concept of gauge theory for abelian groups, e.g. quantum electrodynamics, to nonabelian groups to provide ...
... This prompted a significant restart of Yang–Mills theory studies that proved successful in the formulation of both electroweak unification and quantum chromodynamics (QCD). The electroweak interaction is described by SU(2)xU(1) group while QCD is a SU(3) Yang-Mills theory."
Yang-Mills theories are a class of (classical) field theories and might be viewed as a generalization of the electromagnetic field theory. What differs between the Yang-Mills theories is the gauge group under consideration; the point is that there are several possible ones.
You can quantize the electromagnetic field theory and you "obtain" quantum electrodynamics. You can also quantize Yang-Mills theories, and in this way you obtain some other specific quantum field theories. One "uses" Yang-Mills theory in the calculations of the different parts of the Standard Model etc. because the underlying structures are such non-abelian field theories. Notice that when physicists say "that's a Yang-Mills theory" they usually mean the quantized version already.
For example, QCD is a (quantized) SU(3) Yang-Mills theory coupled to certain fermions (the quarks). The fermions in the Lagrangian are coupled to the bosons via the current term "$j^\mu A_\mu$". The specific (Lie) group structure (SU(3) in the QCD case) is reflected in particular in the number of gluons (eight) and so on. Like many other physical features, this is determined by group representation theory.
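The statement that the group fixes the number of gauge bosons can be illustrated by building a basis of generators explicitly. The following is a sketch that uses the generalized Gell-Mann matrices as one possible basis of su(N); the function names are made up for this illustration and only the standard library is used.

```python
def su_n_generators(n):
    """Generalized Gell-Mann matrices: traceless Hermitian n x n matrices."""
    def zeros():
        return [[0j] * n for _ in range(n)]

    gens = []
    # Off-diagonal generators: one symmetric and one antisymmetric per pair j < k
    for j in range(n):
        for k in range(j + 1, n):
            sym = zeros()
            sym[j][k] = sym[k][j] = 1 + 0j
            gens.append(sym)
            asym = zeros()
            asym[j][k], asym[k][j] = -1j, 1j
            gens.append(asym)
    # Diagonal generators: n - 1 traceless combinations
    for l in range(1, n):
        diag = zeros()
        norm = (2.0 / (l * (l + 1))) ** 0.5
        for m in range(l):
            diag[m][m] = norm
        diag[l][l] = -l * norm
        gens.append(diag)
    return gens

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[ab[i][j] - ba[i][j] for j in range(n)] for i in range(n)]

gluons = len(su_n_generators(3))     # SU(3): 3^2 - 1 = 8 gluons
weak_bosons = len(su_n_generators(2))  # SU(2): 2^2 - 1 = 3 gauge bosons

# Non-abelian: the first two generators do not commute
sigma = su_n_generators(2)
comm = commutator(sigma[0], sigma[1])  # equals 2i * sigma_3

print(gluons, weak_bosons, comm)
```

Each generator corresponds to one gauge boson, so SU(N) gives $N^2-1$ of them; the non-vanishing commutator is what makes the theory non-abelian and produces the gauge-boson self-interactions that are absent in electrodynamics.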
Supersymmetric theories are theories with more features than the usual Yang-Mills theory, which a priori is mostly about the bosonic fields (photons, W$^{\pm}$/Z-bosons, gluons,...). Supersymmetry relates fermions and bosons.