poLCA and mclust both perform model-based cluster analysis based on finite mixture models. However, poLCA is designed for Latent Class Analysis (LCA), the name for a particular class of mixture models suited to categorical (polytomous) data. Conversely, mclust estimates Gaussian mixtures, so it is suitable for quantitative variables.
You should choose between the two classes of models by analyzing the nature and structure of your variables.
Note that with LCA you are considering the variables as qualitative, that is, the information about the ordering of the modalities is ignored.
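As a minimal sketch of the two routes (the data frames `df_numeric` and `df_categorical` and the `Y` variables are hypothetical placeholders):
```r
library(mclust)
library(poLCA)

# Quantitative variables: Gaussian mixture via mclust.
fit_gauss <- Mclust(df_numeric)

# Categorical variables: LCA via poLCA. The package expects each
# manifest variable coded as integers 1, 2, ..., m.
fit_lca <- poLCA(cbind(Y1, Y2, Y3) ~ 1, data = df_categorical, nclass = 3)
```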
As regards poLCA, you have too many unique values in each variable for the model to be identifiable. The number of independent parameters grows with the number of modalities (what you called unique values) of each variable and must be lower than the number of distinct configurations of the variables (in your case, distinct observed 5-tuples of outcomes among the units, which is $\leq 200$). In particular, if $m_a$, $m_b$, $m_c$ are the numbers of modalities in a three-variable model with $k$ latent classes, then the number of independent parameters is:
$$
(k-1)+ k\cdot[(m_a-1)+(m_b-1)+(m_c-1)]
$$
So, yes: if you want to use LCA, you need to aggregate the modalities in order to reduce the number of parameters.
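For instance, a quick check of this count (a sketch; the modality counts below are made-up values):
```r
# Number of independent parameters in an LCA with k latent classes
# and categorical variables with the given numbers of modalities.
lca_npar <- function(k, modalities) {
  (k - 1) + k * sum(modalities - 1)
}

lca_npar(3, rep(10, 5))   # 3 classes, five 10-modality variables: 137
# Already close to the <= 200 distinct observed configurations,
# and the count exceeds that bound as soon as k or the modalities grow.
```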
By the way, to run poLCA multiple times from different starting values, you can simply use the nrep option.
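For example (a sketch, with `df` and the `Y` variables as hypothetical placeholders):
```r
# nrep re-estimates the model from 10 random sets of starting values
# and keeps the solution with the highest log-likelihood, guarding
# against local maxima of the EM algorithm.
fit <- poLCA(cbind(Y1, Y2, Y3, Y4, Y5) ~ 1, data = df,
             nclass = 3, nrep = 10)
```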
The Welch-Satterthwaite d.f. can be shown to be a scaled weighted harmonic mean of the two degrees of freedom, with weights proportional to the squares of the estimated variances of the corresponding sample means.
The original expression reads:
$$\nu_{_W} = \frac{\left(\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}\right)^2}{\frac{s_1^4}{n_1^2\nu_1}+\frac{s_2^4}{n_2^2\nu_2}}$$
Note that $r_i=s_i^2/n_i$ is the estimated variance of the $i^\text{th}$ sample mean, i.e. the square of the $i^\text{th}$ standard error of the mean. Let $r=r_1/r_2$ (the ratio of the estimated variances of the sample means), so
\begin{align}
\nu_{_W} &= \frac{\left(r_1+r_2\right)^2}{\frac{r_1^2}{\nu_1}+\frac{r_2^2}{\nu_2}} \\
&=\frac{\left(r_1+r_2\right)^2}{r_1^2+r_2^2}\cdot\frac{r_1^2+r_2^2}{\frac{r_1^2}{\nu_1}+\frac{r_2^2}{\nu_2}} \\
&=\frac{\left(r+1\right)^2}{r^2+1}\cdot\frac{r_1^2+r_2^2}{\frac{r_1^2}{\nu_1}+\frac{r_2^2}{\nu_2}}
\end{align}
The first factor is $1+\text{sech}(\log(r))$, which increases from $1$ at $r=0$ to $2$ at $r=1$ and then decreases to $1$ at $r=\infty$; it's symmetric in $\log r$.
The second factor is a weighted harmonic mean
$$H(\underline{x})=\frac{\sum_{i=1}^n w_i }{ \sum_{i=1}^n \frac{w_i}{x_i}}$$
of the d.f. $\nu_1,\nu_2$, with weights $w_i=r_i^2$ attached to the two d.f.
Which is to say: when $r_1/r_2$ is very large, $\nu_{_W}$ converges to $\nu_1$; when $r_1/r_2$ is very close to $0$, it converges to $\nu_2$. When $r_1=r_2$ you get twice the harmonic mean of the d.f., and when $r_1/\nu_1=r_2/\nu_2$ (with equal sample sizes, this means $s_1^2=s_2^2$) you get the usual equal-variance t-test d.f. $\nu_1+\nu_2$, which is also the maximum possible value for $\nu_{_W}$ (given the sample sizes).
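A quick numerical illustration of these limits (a sketch with made-up values):
```r
welch_df <- function(s1, n1, s2, n2) {
  r1 <- s1^2 / n1
  r2 <- s2^2 / n2
  (r1 + r2)^2 / (r1^2 / (n1 - 1) + r2^2 / (n2 - 1))
}

welch_df(100, 10,   1, 20)  # ~ 9  = nu_1  (r_1/r_2 very large)
welch_df(  1, 10, 100, 20)  # ~ 19 = nu_2  (r_1/r_2 very small)
welch_df(  2, 15,   2, 15)  # 28 = n1 + n2 - 2, the maximum here
```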
--
With an equal-variance t-test, if the assumptions hold, the square of the denominator is a constant times a chi-square random variate.
The square of the denominator of the Welch t-test isn't (a constant times) a chi-square; however, it's often not too bad an approximation. A relevant discussion can be found here.
A more textbook-style derivation can be found here.
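To get a feel for how good the approximation can be, here is a small simulation sketch (the sample sizes and SDs are arbitrary choices on my part):
```r
set.seed(1)
n1 <- 8; n2 <- 12; sd1 <- 1; sd2 <- 3

# Simulate the squared Welch denominator S = s1^2/n1 + s2^2/n2.
S <- replicate(1e5, var(rnorm(n1, sd = sd1)) / n1 +
                    var(rnorm(n2, sd = sd2)) / n2)

# Moment-match S to a scaled chi-square: if S ~ c * chisq(nu),
# then nu = 2 * mean(S)^2 / var(S).
nu <- 2 * mean(S)^2 / var(S)
qqplot(qchisq(ppoints(500), df = nu) * mean(S) / nu,
       quantile(S, ppoints(500)),
       xlab = "scaled chi-square quantiles", ylab = "simulated S")
abline(0, 1)   # points near the line: the approximation is not too bad
```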
--
Degrees of freedom are non-integer in a number of contexts. Indeed, in a few circumstances you can establish that the degrees of freedom needed to fit the data under some particular model must lie between some value $k$ and $k+1$.
We usually think of degrees of freedom as the number of free parameters, but there are situations where the parameters are not completely free and they can then be difficult to count. This can happen when smoothing / regularizing, for example.
Locally weighted regression / kernel methods and smoothing splines are examples of such a situation: the total number of free parameters is not something you can readily count by adding up predictors, so a more general notion of degrees of freedom is needed.
In Generalized Additive Models, on which gam is partly based, Hastie and Tibshirani (1990) [1] (and indeed numerous other references) consider models where we can write $\hat y = Ay$; the degrees of freedom is then sometimes taken to be $\operatorname{tr}(A)$ (they also discuss $\operatorname{tr}(AA^T)$ and $\operatorname{tr}(2A-AA^T)$). The first is consistent with the more usual approach where both work (e.g. in regression, where in normal situations $\operatorname{tr}(A)$ will be the column dimension of $X$), and when $A$ is symmetric and idempotent, all three of those formulas are the same. [I don't have this reference handy to check enough of the details; an alternative by the same authors (plus Friedman) that's easy to get hold of is The Elements of Statistical Learning [2]; see for example equation 5.16, which defines the effective degrees of freedom of a smoothing spline as $\operatorname{tr}(A)$ (in my notation).]
More generally still, Ye (1998) [3] defined generalized degrees of freedom as $\sum_i \frac{\partial \hat y_i}{\partial y_i}$, which is the sum of the sensitivities of fitted values to their corresponding observations. In turn, this is consistent with $\operatorname{tr}(A)$ where that definition works. To use Ye's definition you need only be able to compute $\hat y$ and to perturb the data by some small amount (in order to compute $\frac{\partial \hat y_i}{\partial y_i}$ numerically). This makes it very broadly applicable.
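A rough numerical sketch of that idea via finite differences (an assumption on my part: Ye's own estimator uses random perturbations, but for a deterministic fit a simple one-sided difference illustrates the definition):
```r
gdf <- function(fit_fun, y, eps = 1e-4) {
  yhat <- fit_fun(y)
  # Sum of d yhat_i / d y_i, each estimated by perturbing y_i.
  sum(sapply(seq_along(y), function(i) {
    y2 <- y
    y2[i] <- y2[i] + eps
    (fit_fun(y2)[i] - yhat[i]) / eps
  }))
}

# Sanity check on OLS, where the answer should be ncol(X) = 2:
X <- cbind(1, rnorm(30))
y <- as.vector(X %*% c(1, 2) + rnorm(30))
gdf(function(y) lm.fit(X, y)$fitted.values, y)   # ~ 2
```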
For models like those fitted by gam, those various measures are generally not integer. (I highly recommend reading these references' discussion of this issue, though the story can get rather more complicated in some situations; see, for example, [4].)
[1] Hastie, T. and Tibshirani, R. (1990), Generalized Additive Models, London: Chapman and Hall.
[2] Hastie, T., Tibshirani, R. and Friedman, J. (2009), The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed., Springer-Verlag. https://statweb.stanford.edu/~tibs/ElemStatLearn/
[3] Ye, J. (1998), "On Measuring and Correcting the Effects of Data Mining and Model Selection", Journal of the American Statistical Association, Vol. 93, No. 441, pp. 120-131.
[4] Janson, L., Fithian, W., and Hastie, T. (2013), "Effective Degrees of Freedom: A Flawed Metaphor", https://arxiv.org/abs/1312.7851