I will follow the notes by A. Ferraro et al.
A state $\rho$ of a system with $n$ degrees of freedom is said to be Gaussian if its Wigner function can be written as
$$W[\rho](\boldsymbol{\alpha})
=
\frac{\exp\left( -\frac{1}{2}(\boldsymbol{\alpha} - \bar{\boldsymbol{\alpha}})^T
\boldsymbol{\sigma}^{-1}_\alpha
(\boldsymbol{\alpha} - \bar{\boldsymbol{\alpha}})
\right)
}{(2\pi)^n\sqrt{\text{Det}[\boldsymbol{\sigma}_\alpha ]}},$$
where $\boldsymbol{\alpha}$ and $\bar{\boldsymbol{\alpha}}$ are vectors containing all the $2n$ quadratures of the system and their average values, respectively, and $\boldsymbol{\alpha}$ is the covariance matrix, whose elements are defined as
$$[\boldsymbol{\sigma}]_{kl}
:= \frac{1}{2}
\langle \{R_k,R_l \} \rangle
-
\langle R_k\rangle \langle R_l\rangle,
$$
where $\{\cdot,\cdot\}$ is the anticommutator, and $R_k$ is the $k-th$ element of the vector $\boldsymbol{R}= (q_1,p_1,\ldots,q_n,p_n)^T$ with the $q$s and $p$s being the position and momentum-like operators.
A very important result is that it turns out that Gaussian states can be fully characterised by their covariace matrix plus the vector of expectation values of the quadratures, $\bar{\boldsymbol{\alpha}}$. If your system only has one mode (one boson), then you only need a symmetric $2\times2$ matrix and two real numbers ($q$ and $p$) to describe it! This means a total of five parameters.
As you point out, we can write any Gaussian state as
$$
\rho = D(\bar{\alpha})S(\xi)\rho_{th}S^\dagger(\xi)D^\dagger(\bar{\alpha})
$$
where here $\bar{\alpha}\equiv\frac{1}{\sqrt{2}}(\bar{x}+i\bar{p})$ and $\xi = r e^{i\varphi}$. If your thermal state has mean photon number $N$, then it suffices to know $N$, $r$ and $\varphi$ to compute the covariance matrix. Its elements are given by
$$
\sigma_{11} =\frac{2N+1}{2} \left(\cosh (2r)+\sinh (2r)\cos (\varphi)\right)
$$
$$\sigma_{22} =\frac{2N+1}{2}\left( \cosh(2r)-\sinh(2r)\cos(\varphi) \right)$$
$$\sigma_{12} =\sigma_{21}=-\frac{2N+1}{2}\sinh(2r)\sin(\varphi).$$
You can see that the main properties of the state are captured by the covariance matrix, because the displacement $\bar{\alpha}$ can always be disregarded by local operations (it is a phase-space translation). In other words, you can always put it to zero.
Answering your question, note that a Gaussian state is not simply a Gaussian distribution. You need more parameters than simply the variance and the mean (as you would to define a classical Gaussian probability distribution). These are in general five real values, but the essential ones are the ones entering the covariance matrix, as explained before.
As for the commutator, I am not aware of any closed formula. But I do know that displacing and then squeezing produces a state which has the same squeezing as the zqueezed-displaced state, but a different displacement.
I’m willing to bet your commutator $[\hat a^\dagger \hat a^\dagger,\hat a\hat a]$ is wrong.
This is because
\begin{align}
\hat K_+=\frac{1}{2} \hat a^\dagger \hat a^\dagger \, ,\qquad
\hat K_-=\frac{1}{2}\hat a\hat a\, ,\qquad
\hat K_0=\frac{1}{4}(\hat a^\dagger \hat a +\hat a\hat a^\dagger) \tag{1}
\end{align}
close under commutation on the algebra $\mathfrak{su}(1,1)$ so that $[\hat K_+,\hat K_-]$ is a multiple of $\hat K_0$, which is not proportional to $\hat n+1$ but rather to $2\hat n+1$.
A useful reference for the realization (1) and for this type of stuff is the book by Perelomov:
Perelomov A. Generalized Coherent States and Their Applications, Springer.
As a bonus the squeezing transformation is basically an $SU(1,1)$ transformation, with matrix elements given in explicit form in
Ui H. Clebsch-Gordan formulas of the SU (1, 1) group. Progress of Theoretical Physics. 1970 Sep 1;44(3):689-702.
Best Answer
Define operators $$ {\hat K}_- = {\hat a} {\hat b}\;,\;\;\; {\hat K}_+ = {\hat a}^\dagger {\hat b}^\dagger\;, \;\;\; {\hat K}_0 = \frac{1}{2} \Big({\hat a}{\hat a}^\dagger + {\hat b}{\hat b}^\dagger - {\hat I}\Big) $$ and notice that they define a $SU(1,1)$ algebra $$ [{\hat K}_-\;, \;{\hat K}_+] = 2 {\hat K}_0\;, \;\;\; [{\hat K}_0\;,\;{\hat K}_\pm] = \pm {\hat K}_\pm $$ Then the following decomposition holds: $$ e^{ \alpha_0 {\hat K}_0 + \alpha_+ {\hat K}_+ + \alpha_- {\hat K}_-}= e^{\gamma_+ {\hat K}_+} e^{\ln\gamma_0 {\hat K}_0} e^{\gamma_- {\hat K}_-} $$ where $$ \gamma_0 = \Big( \cosh \theta - \frac{\alpha_0}{2\theta} \sinh \theta \Big)^{-2}\;,\;\;\; \gamma_\pm = \frac{2\alpha_\pm \sinh\theta}{2\theta \cosh\theta - \alpha_0 \sinh\theta} $$ for $$ \theta^2 = \frac{\alpha_0^2}{4} - \alpha_+\alpha_- $$
Source: Klimov & Chumakov, "A Group-Theoretical Approach To Quantum Optics", Appendix 11.3.3.
For the case at hand we obviously have $\alpha_0=0$, $\alpha_- = - \alpha_+ = r/2$ and so $$ \theta = r/2\;,\;\;\; \gamma_0 = \Big( \cosh \frac{r}{2}\Big)^{-2}\;, \;\;\; \gamma_\pm = \mp\tanh\frac{r}{2} $$ hence $$ e^{(r/2)( {\hat a} {\hat b} - {\hat a}^\dagger {\hat b}^\dagger )} = e^{-\tanh(r/2) {\hat a}^\dagger {\hat b}^\dagger } e^{ - \ln\cosh (r/2) ({\hat a}{\hat a}^\dagger + {\hat b}{\hat b}^\dagger - {\hat I}) } e^{ \tanh(r/2) {\hat a} {\hat b}} $$ The rest is up to you.