For the case of Gaussian Mixture Models, you can easily extend it and get a Dirichlet Process GMM (DP-GMM, or infinite GMM). It was first proposed in Rasmussen's paper. I like this paper very much because it introduces the concept very nicely.
The idea is to go non-parametric and let the data decide the number of clusters it is most comfortable with.
Imagine you decide that you have $K$ components. If $z_n$ is the cluster assignment of point $n$, you can put a prior over the assignments with mixing weights $\boldsymbol{\pi} = [\pi_1, \pi_2, \dots, \pi_K]$. If you put yet another prior over $\boldsymbol{\pi}$, you can avoid specifying the weights yourself. We use a Multinomial distribution for the assignments and a Dirichlet prior over $\boldsymbol{\pi}$. You would have:
$$
z_n \mid \boldsymbol{\pi} \sim \text{Multinomial}(\boldsymbol{\pi})\\
\boldsymbol{\pi} \mid \alpha \sim \text{Dirichlet}(\alpha)
$$
Similarly, you also put a prior on the latent parameters $\boldsymbol{\theta}_i$ of each component $i$:
$$
\boldsymbol{\theta}_i \sim G_0
$$
And you also have the likelihood, which gives the probability of the observed data point $x_n$ given that it belongs to component $i$:
\begin{align}
p(x_n \mid \boldsymbol{\theta}_i)
\end{align}
Up to here, this is a GMM with a fixed number of components or clusters.
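The finite generative model above can be sketched by forward sampling in a few lines of NumPy. This is only an illustrative sketch: the values of $K$, $\alpha$, the choice of $G_0$ (a wide Gaussian over component means), and the unit-variance Gaussian likelihood are my own assumptions, not taken from the original paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative settings (my assumptions): K components, N points,
# symmetric Dirichlet concentration alpha.
K, N, alpha = 3, 500, 1.0

# pi | alpha ~ Dirichlet(alpha, ..., alpha)
pi = rng.dirichlet(alpha * np.ones(K))

# theta_i ~ G_0: here G_0 draws each component mean from N(0, 5^2)
mu = rng.normal(0.0, 5.0, size=K)

# z_n | pi ~ Multinomial(pi), then x_n | z_n ~ N(mu_{z_n}, 1)
z = rng.choice(K, size=N, p=pi)
x = rng.normal(mu[z], 1.0)
```

Inference would go the other way: given $x$, you would infer $\boldsymbol{\pi}$, the $\boldsymbol{\theta}_i$ and the assignments $z$, e.g. with Gibbs sampling.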
Now, the trick is to let $K$ go to infinity [1]. If you do so, you end up with a Dirichlet Process prior, and you will be able to infer the number of clusters, the parameters of each cluster, and the cluster assignments.
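One concrete way to see this $K \to \infty$ limit is the Chinese Restaurant Process, which samples the cluster assignments directly without fixing $K$ in advance: each point joins an existing cluster with probability proportional to its size, or opens a new one with probability proportional to $\alpha$. A minimal sketch (the function name and parameter choices are illustrative, not from the source):

```python
import numpy as np

def crp_assignments(n_points, alpha, rng):
    """Sample cluster assignments from a Chinese Restaurant Process,
    the K -> infinity limit of the finite Dirichlet-Multinomial prior."""
    assignments = [0]  # the first point starts the first cluster
    counts = [1]       # number of points in each cluster so far
    for n in range(1, n_points):
        # join existing cluster k with prob counts[k] / (n + alpha),
        # open a new cluster with prob alpha / (n + alpha)
        probs = np.array(counts + [alpha]) / (n + alpha)
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)  # a new cluster is born
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments

rng = np.random.default_rng(1)
z = crp_assignments(100, alpha=1.0, rng=rng)
n_clusters = len(set(z))  # the number of clusters the data "decided" on
```

Running this repeatedly gives a varying number of clusters; larger $\alpha$ tends to produce more of them.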
The idea of the Dirichlet Process as a basis for non-parametric models is quite popular.
I think I've seen it applied to HMMs, but I'm not sure about that.
[1] I wrote out the maths explicitly here, but it's only a draft; I would like to add some images and examples.
PS: See also this popular post: http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/
Best Answer
I like the article entitled Observations on the Use of Growth Mixture Models in Psychological Research. It is perhaps not as theoretical as you would like, but it is very enlightening. I think it is written in the context of longitudinal research, within the psychological realm, but a lot can be learned from it.
Edit: Actually, upon second reading there is quite a bit of theoretical/philosophical discussion in that paper! Seems relevant.
Edit: I would also like to add another paper to this answer, entitled What's a taxon? Meehl argues that there are true clusters in nature and provides a salient example: there are gophers, and there are chipmunks, but there are no gophmunks. This does a good job of defining a taxonic group, and highlights that such taxa may also be common in humans. A great deal of research has sought to answer such questions using latent cluster analyses and the like.