As Lubos Motl mentions in a comment, for all practical purposes, OP's sought-for eq. (1) is proved via Wick's Theorem.
It is interesting to try to generalize Wick's Theorem and to try to minimize the number of assumptions that goes into it. Here we will outline one possible approach.
I) Assume that a family $(\hat{A}_i)_{i\in I}$ of operators $\hat{A}_i\in{\cal A}$ lives in a (super) operator algebra ${\cal A}$
with (super) commutator $[\cdot,\cdot]$, and
with center $Z({\cal A})$.
Here
the index $i\in I$ runs over an index set $I$ (it could be continuous), and
the index $i$ contains information, such as, e.g., position $x$, time instant $t$, annihilation/creation label, type of field, etc., of the operator $\hat{A}_i$.
II) Assume that
$$
\forall i,j\in I~: \qquad [\hat{A}_i,\hat{A}_j] ~\in~Z({\cal A}).
$$
III) Assume that there are given two ordering prescriptions, say $T$ and $::$. Here $T$ and $::$ could in principle denote any two ordering prescriptions, e.g. time order, normal order, radial order, Weyl order$^1$, etc. This means that the index set $I$ is endowed with two strict total orders, say, $<$ and $\prec$, respectively, such that
The $T$ symbol is (graded) multilinear wrt. supernumbers.
$ T(\hat{A}_{\pi(i_1)}\ldots\hat{A}_{\pi(i_n)})~=~(-1)^{\sigma_{\pi}} T(\hat{A}_{i_1}\ldots\hat{A}_{i_n} )$ is (graded) symmetric, where $\pi\in S_n$ is a permutation of $n$ elements, and $(-1)^{\sigma_{\pi}}$ is a Koszul sign factor.$^2$
$ T(\hat{A}_{i_1}\ldots\hat{A}_{i_n} )~=~\hat{A}_{i_1}\ldots\hat{A}_{i_n}$ if $i_1 > \ldots > i_n$.
In the special case where some of the $ i_1 , \ldots , i_n$ are equal$^3$ (wrt. the order <), then one should symmetrize in appropriate (graded) sense over the corresponding subsets. For instance,
$$ T(\hat{A}_{i_1}\ldots\hat{A}_{i_n} )~=~\hat{A}_{i_1}\ldots\hat{A}_{i_{k-1}}\frac{\hat{A}_{i_k}\hat{A}_{i_{k+1}}+(-1)^{|\hat{A}{i_k}||\hat{A}{i_{k+1}}|}\hat{A}_{i_{k+1}}\hat{A}_{i_k}}{2}\hat{A}_{i_{k+2}}\ldots\hat{A}_{i_n}$$ if $i_1 > \ldots > i_k=i_{k+1}> \ldots > i_n$.
[Similar conditions 1-4 should hold for the second ordering $(::,\prec)$.]
IV) It then follows from assumptions I-III that the (generalized) contractions
$$
\hat{C}_{ij}~=~T(\hat{A}_i\hat{A}_j)~-~:\hat{A}_i\hat{A}_j:~\in~Z({\cal A})
$$
belong to the center $Z({\cal A})$. The contractions are graded symmetric
$$
\hat{C}_{ij}~=~(-1)^{|\hat{A}_i||\hat{A}_j|} \hat{C}_{ji}.
$$
V) Assume furthermore that the contractions $\hat{C}_{ij}$ do not depend on the operators $\hat{A}_k$, i.e.
$$ \frac{\partial \hat{C}_{ij}}{\partial \hat{A}_k}~=~0 $$
in order to simplify combinatoric arguments below.
VI) It is now a straightforward exercise to establish the corresponding Wick's Theorem
$$
T(f(\hat{A})) ~=~ \exp\left(\frac{1}{2}\sum_{i,j\in I}\hat{C}_{ij}\frac{\partial}{\partial\hat{A}_j}\frac{\partial}{\partial\hat{A}_i} \right):f(\hat{A}):,
$$
meaning a rule for how to re-express one ordering prescription $T(f(\hat{A}))$
[where $f$ is a sufficiently nice function of the $(\hat{A}_i)_{i\in I}$ family] in terms of the other ordering prescription $::$ and (multiple) contractions $\hat{C}_{ij}$. And vice-versa with the roles of the two orderings $T$ and $::$ interchanged:
$$
:f(\hat{A}): ~=~ \exp\left(-\frac{1}{2}\sum_{i,j\in I}\hat{C}_{ij}\frac{\partial}{\partial\hat{A}_j}\frac{\partial}{\partial\hat{A}_i} \right)T(f(\hat{A})).
$$
Such Wick's Theorems can now be applied successively to establish nested Wick's Theorems, such as, e.g.,$^4$
$$
T(:f(\hat{A})::g(\hat{A}):) ~=~ \left. \exp\left(\sum_{i,j\in I}\hat{C}_{ij}\frac{\partial}{\partial\hat{A}_j}\frac{\partial}{\partial\hat{B}_i} \right) :f(\hat{A}) g(\hat{B}): \right|_{\hat{B}=\hat{A}}.
$$
These Wick's Theorems may be extended to a larger class of operators than just the $(\hat{A}_i)_{i\in I}$ family through (graded) multilinearity.
VII) Let us now assume that the operators $\hat{A}_i$ are Bosonic for simplicity.
A particular consequence of a nested Wick's Theorem is the following version
$$T(:\hat{A}^2_i::\hat{A}^2_j:) ~=~ 2\hat{C}_{ij}^2 + 4 \hat{C}_{ij}:\hat{A}_i\hat{A}_j: + :\hat{A}^2_i\hat{A}^2_j:$$
of OP's sought-for eq. (1). Finally, let us mention that Wick's Theorem, radial order, OPE, etc., are also discussed in this and this Phys.SE posts.
--
Footnotes:
$^1$ Example: The Weyl/symmetric ordering satisfies
$$W(f(\hat{A})) ~=~\left. \exp\left(\sum_{i\in I}\hat{A}_i \frac{\partial}{\partial a_i} \right) f(a) \right|_{a=0}. $$
For more details, see e.g. my Phys.SE answer here.
$^2$ The Koszul sign convention produces a minus sign every time two Grassmann-odd objects are permuted. In this answer $|\hat{A}_i|=0,1 \pmod 2$ denotes the Grassmann-parity of $\hat{A}_i$.
$^3$ Being equal wrt. an order is in general an equivalence relation, and it is often a weaker condition than being equal as elements of $I$.
$^4$ A nested Wick's Theorem (between radial order and normal order) is briefly stated in eq. (2.2.10) on p. 39 in J. Polchinski, String Theory, Vol. 1. Beware that radial order is often only implicitly written in CFT texts. By the way, a side-effect/peculiarity of nested ordering symbols are discussed in this Phys.SE post.
Best Answer
You can always use Wick's theorem when you're describing expectation values with respect to free field states (that is, non-interacting, like a single Slater determinant state). For example, in Hubbard-Stratonovich or (generally) Variational Mean-Field Theory (like Bogliubov-deGennes) you take the interaction terms and decouple them into a model that is quadratic in fermion operators, but in a background of classical fields.
In a path-integral sense then, what I'm saying is that Wick's theorem is a statement about expectation values with respect to Gaussian distributions. Since we generally can't calculate in anything other than Gaussian distributions this is handy. Consider the action
$S=ax^2+bx^4$
and we want to integrate $e^{-S}$ over all $x$. Well, this is clearly the expectation value of $e^{-bx^4}=1-bxxxx+\mathcal{O}(b^2)$ with respect to a Gaussian distribution, and Wick's trick tells me that (since $x$ is just a scalar there's no antisymmetry...): $$\langle xxxx\rangle=\langle xx\rangle+\langle xx\rangle + \langle xx\rangle$$ so to any order in $b$ we can calculate the integral and we only have to know $\langle xx\rangle$ and do a little bit of combinatorics.
The generalization of this to operators follows basically the same logic and is surprisingly straightforward (... for path integrals! Standard books always present Wick's theorem in an operator formulation that presumably doesn't even make sense at finite temperatures and I hate it.)