LDA is a probabilistic graphical model of a document-generating process, as explained by Blei et al. in JMLR 2003 (for more intuition on the generative process see http://videolectures.net/mlss09uk_blei_tm/). The main idea is that if we knew the parameters, we could generate documents; that is the sense in which we have modeled our documents. The problem is that we do not know the parameters, so we use Bayes' rule to invert the generative process: we model the uncertainty in the parameters given the data. In other words, we need to infer the unknowns given the data.
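As a toy illustration of that generative story (the sizes and Dirichlet hyperparameters below are made-up values for the sketch, not Blei et al.'s): draw per-topic word distributions once, then for each document draw topic proportions, and for each word draw a topic and then a word from that topic.

```python
import numpy as np

rng = np.random.default_rng(0)
n_topics, vocab_size, doc_len = 3, 20, 50
alpha, eta = 0.5, 0.1  # toy Dirichlet hyperparameters

# per-topic word distributions (the "topics")
topics = rng.dirichlet(np.full(vocab_size, eta), size=n_topics)

def generate_document():
    # draw this document's topic proportions
    theta = rng.dirichlet(np.full(n_topics, alpha))
    words = []
    for _ in range(doc_len):
        z = rng.choice(n_topics, p=theta)        # pick a topic for this word
        w = rng.choice(vocab_size, p=topics[z])  # pick a word from that topic
        words.append(w)
    return words

doc = generate_document()
```

Inference is this process run backwards: given only `doc`, recover the uncertainty over `theta` and `topics`.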
In both cases the model is the same; what differs is how we do the inference. The method used by Blei et al. is called variational inference, and the one used by Griffiths is sampling-based inference. Both are approximate inference methods, one in the MCMC class (Griffiths) and the other in the variational class (Blei).
In variational inference, say we have a complex multimodal distribution on which we cannot do inference directly. What we want is to approximate that complex multimodal distribution with a simpler distribution (called Q in the literature; see (3)). We do this by choosing a simpler family of distributions, either by explicitly picking a parametric family, as in LDA by Blei 2003, or by only deciding the factorized form, as in Bishop's normal-distribution example in (5) (there the parametric form is not chosen explicitly, which is why this is also called free-form optimization).
Another important difference from Gibbs sampling is that the simpler distribution(1) locks on to one of the modes(2) of the complex distribution we could not handle, whereas in Gibbs sampling the sampler can visit all of the modes.
Compared to variational inference, Gibbs sampling is easier to describe. In Gibbs sampling, we write the statistic we are interested in as an integral (probabilities can also be expressed as integrals). We then form a Monte Carlo approximation to that integral using samples from the distribution. Here, too, it may be difficult to sample from the complex distribution directly, so we turn to a familiar distribution from which we are capable of sampling. There are a lot of hacks and improvements on top of this basic description.
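The Monte Carlo idea in the paragraph above fits in a few lines (a generic illustration of approximating an integral by a sample average, not the LDA collapsed Gibbs sampler itself):

```python
import numpy as np

rng = np.random.default_rng(0)

# The integral E[x^2] = integral of x^2 * N(x; 0, 1) dx equals exactly 1;
# the Monte Carlo approximation replaces the integral with a sample average.
samples = rng.normal(0.0, 1.0, size=100_000)
estimate = (samples ** 2).mean()  # close to 1, with error shrinking as 1/sqrt(n)
```

Gibbs sampling is one way to get the samples when the joint distribution is hard to sample from directly but each conditional is easy.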
For more details, see (3) and (4).
(2) This comes from the zero-forcing nature of the reverse KL divergence used as the cost. Understanding zero forcing is subtle and nice. Suppose you want to lock a unimodal distribution onto one of the modes of the complex multimodal distribution. Now we need some imagination. What happens with zero forcing is that wherever the complex distribution is (near) zero, the simpler distribution is forced to be (near) zero as well; and because the simpler distribution is usually chosen to be unimodal, it has no choice but to slip into one of the modes (the zero-forcing effect). If you think in terms of the unnormalized distribution, because we are interested in the parameters, it is a neat picture: the unimodal distribution slips into one of the modes of the multimodal one.
(1)(which is unimodal in the case of mean field approximation)
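The mode-locking described in (1) and (2) can be checked numerically. The sketch below is my own toy construction (not from the references): fit a single Gaussian q to a bimodal target p by brute-force minimization of the reverse KL, KL(q || p), on a grid. The winning q sits on one mode instead of spanning both.

```python
import numpy as np

x = np.linspace(-10, 10, 2001)
dx = x[1] - x[0]

def normal(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# "complex" target p: a bimodal mixture with modes at -3 and +3
p = 0.5 * normal(x, -3, 0.7) + 0.5 * normal(x, 3, 0.7)

def reverse_kl(mu, sigma):
    # KL(q || p) on the grid; zero forcing: q pays heavily wherever p is tiny
    q = normal(x, mu, sigma)
    mask = q > 1e-12
    return np.sum(q[mask] * np.log(q[mask] / p[mask])) * dx

# brute-force search over the simpler (unimodal Gaussian) family
candidates = [(m, s) for m in np.linspace(-6, 6, 121)
                     for s in np.linspace(0.3, 5, 48)]
best_mu, best_sigma = min(candidates, key=lambda t: reverse_kl(*t))
# best q locks onto one mode: mu near -3 or +3, sigma near 0.7
```

A wide q covering both modes would place mass where p is nearly zero (between the modes) and be punished by the log ratio; that is exactly the zero-forcing effect.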
(3)
Machine Learning: A Probabilistic Perspective, Kevin Murphy
chapter 21 Variational Inference
chapter 22 More Variational Inference
chapter 23 Monte Carlo Inference
chapter 24 Markov Chain Monte Carlo Inference
(4)
Graphical Models, Exponential Families, and Variational Inference: Martin Wainwright and Michael Jordan
(5) Pattern Recognition and Machine Learning : Bishop
Latent Class Analysis is in fact a Finite Mixture Model (see here). The main difference between FMMs and other clustering algorithms is that FMMs offer a "model-based clustering" approach: clusters are derived using a probabilistic model that describes the distribution of your data. So instead of finding clusters with some arbitrarily chosen distance measure, you use a model of your data's distribution, and based on this model you assess the probabilities that particular cases are members of particular latent classes. You could say it is a top-down approach (you start by describing the distribution of your data), while other clustering algorithms are rather bottom-up approaches (you find similarities between cases).
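A minimal sketch of what "model-based" means here, with invented data (a two-class 1-D Gaussian mixture fit by EM in plain NumPy, not any of the packages cited below): the E-step's responsibilities are exactly the membership probabilities referred to above.

```python
import numpy as np

rng = np.random.default_rng(1)
# two latent classes: N(0, 1) and N(6, 1), 200 cases each
data = np.concatenate([rng.normal(0, 1, 200), rng.normal(6, 1, 200)])

def em_two_classes(x, n_iter=100):
    mu = np.array([x.min(), x.max()])  # crude initialization
    sigma = np.array([1.0, 1.0])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: posterior probability of each case's class membership
        dens = (pi * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2)
                / (sigma * np.sqrt(2 * np.pi)))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate the model from those probabilities
        nk = resp.sum(axis=0)
        pi = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return pi, mu, sigma, resp

pi, mu, sigma, resp = em_two_classes(data)
# resp[i] gives the probability that case i belongs to each latent class
```

No distance measure appears anywhere: cluster membership is a posterior probability under the fitted model.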
Because you use a statistical model for your data, model selection and goodness-of-fit assessment are possible, contrary to distance-based clustering. Also, if you assume that some process or "latent structure" underlies the structure of your data, then FMMs seem to be an appropriate choice, since they enable you to model that latent structure (rather than just looking for similarities).
Another difference is that FMMs are more flexible than clustering. Clustering algorithms just do clustering, while there are FMM- and LCA-based models that
- enable you to do confirmatory, between-groups analysis,
- combine Item Response Theory (and other) models with LCA,
- include covariates to predict individuals' latent class membership,
- include within-cluster regression models (latent-class regression),
- enable you to model changes over time in the structure of your data, etc.
For more examples see:
Hagenaars, J. A., & McCutcheon, A. L. (2009). Applied Latent Class
Analysis. Cambridge University Press.
and the documentation of flexmix and poLCA packages in R, including the following papers:
Linzer, D. A., & Lewis, J. B. (2011). poLCA: An R package for
polytomous variable latent class analysis. Journal of Statistical
Software, 42(10), 1-29.
Leisch, F. (2004). FlexMix: A general framework for finite mixture
models and latent class regression in R. Journal of Statistical
Software, 11(8), 1-18.
Grün, B., & Leisch, F. (2008). FlexMix version 2: finite mixtures with
concomitant variables and varying and constant parameters. Journal of
Statistical Software, 28(4), 1-35.
See Tueller (2010), Tueller and Lubke (2010), and [Ruscio et al.'s book][3] for complete detail on what is summarized below. Taxometric procedures generally work by computing simple statistics on subsets of sorted data: MAMBAC uses the mean, MAXCOV uses the covariance, and MAXEIG uses the eigenvalue. Latent class analysis is a special case of the general latent variable mixture model (LVMM). The LVMM specifies a model for the data which may include latent classes, latent factors, or both. Parameters of the model are obtained using maximum likelihood or Bayesian estimation. Refer to the literature above for complete detail.
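A rough sketch of the MAXCOV idea (my own simplified rendition on simulated data, not Meehl's exact procedure): sort cases on one indicator and compute the covariance of two other indicators within successive windows. For taxonic data the curve peaks where the two classes mix, because within a pure class the indicators are uncorrelated.

```python
import numpy as np

def maxcov_curve(x, y, z, n_windows=10):
    order = np.argsort(z)  # sort cases on the "input" indicator
    xs, ys = x[order], y[order]
    windows = np.array_split(np.arange(len(z)), n_windows)
    # covariance of the two "output" indicators within each window of z
    return np.array([np.cov(xs[idx], ys[idx])[0, 1] for idx in windows])

# simulated taxonic data: indicators correlate only through class membership
rng = np.random.default_rng(3)
cls = rng.integers(0, 2, 2000)
x, y, z = (cls * 3.0 + rng.normal(size=2000) for _ in range(3))
curve = maxcov_curve(x, y, z)
# near-zero covariance in the class-pure end windows, a peak in the mixed middle
```

For dimensional (single-factor) data, by contrast, the curve stays roughly flat, which is what the procedures exploit.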
More important than the mathematical underpinnings (which are beyond the scope of this forum) are the hypotheses that can be tested under each approach. Taxometric procedures test the hypotheses
H1: Two classes explain all (or most) of the observed correlation among a set of indicators.
H0: One (or more) continuous underlying dimension(s) explain all of the observed correlation among a set of indicators.
Usually the CCFI is used to ascertain which hypothesis to reject/retain; see [John Ruscio's book on the topic][4]. Taxometric procedures can test only these two hypotheses and no others.
Used alone, latent class analysis cannot test the taxometric alternative hypothesis, H0 above. However, latent class analysis can test the following alternative hypotheses:
H1a: Two classes explain all of the observed correlation among a set of indicators.
H1b: Three classes explain all of the observed correlation among a set of indicators.
...
H1k: k classes explain all of the observed correlation among a set of indicators.
To test H0 from above in a latent variable framework, fit a single-factor confirmatory factor analysis (CFA) model to the data (call this H0cfa; it differs from H0 in that H0 only tests a hypothesis of fit under the taxometric framework and does not produce parameter estimates, as fitting a CFA model does). To compare H0cfa to H1a, H1b, ..., H1k, use the Bayesian Information Criterion (BIC), as in [Nylund et al. (2007)][5].
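A hedged sketch of the BIC comparison on the latent-class side (a 1-D Gaussian mixture fit by EM in plain NumPy on simulated two-class data; real LCA uses categorical indicators and software such as poLCA or Mplus):

```python
import numpy as np

rng = np.random.default_rng(2)
# simulated data with two latent classes
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 300)])

def fit_mixture(x, k, n_iter=200):
    """EM for a k-component 1-D Gaussian mixture; returns final log-likelihood."""
    mu = np.quantile(x, np.linspace(0.1, 0.9, k))
    sigma = np.full(k, x.std())
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        dens = (pi * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2)
                / (sigma * np.sqrt(2 * np.pi)))
        resp = dens / dens.sum(axis=1, keepdims=True)       # E-step
        nk = resp.sum(axis=0)                               # M-step
        pi = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-6
    dens = (pi * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2)
            / (sigma * np.sqrt(2 * np.pi)))
    return np.log(dens.sum(axis=1)).sum()

def bic(x, k):
    n_params = 3 * k - 1  # k means, k standard deviations, k - 1 free weights
    return n_params * np.log(len(x)) - 2 * fit_mixture(x, k)

scores = {k: bic(x, k) for k in (1, 2, 3)}
best_k = min(scores, key=scores.get)  # BIC typically recovers the 2-class model
```

The same compare-by-BIC logic extends to including H0cfa in the candidate set, as described above.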
To summarize thus far, taxometric procedures can compare two- vs. one-class solutions, while latent class + CFA can test one vs. two or more class solutions. Taxometric procedures thus test a subset of the hypotheses tested by latent class + CFA model comparisons.
All of the hypotheses presented thus far are extremes at two ends of a spectrum. The more general hypothesis is that some number of latent classes and some number of latent dimensions (or latent factors) best explain the data. The approaches described above reject this outright, which is a very strong assumption. Put differently, a latent class model, and a taxometric procedure that leads to a conclusion of taxonic structure (rather than dimensional), assume no within-class individual differences besides random error. In your context, this is equivalent to saying that within the chronic pain class there is no systematic variation in the tendency to develop chronic pain, only random chance.
The weakness of this assumption is better illustrated with an example from psychopathology. Say you have a set of indicators for depression, and your taxometric and/or latent class models lead you to conclude there is a depressed class and a non-depressed class. These models implicitly assume no variance in the severity of depression within each class (beyond random error or noise). In other words, you are depressed or you are not, and among the depressed everyone is equally depressed (beyond variation in error-prone observed variables). So we would only need one treatment for depression at one dose level! It is easy to see that this assumption is absurd for depression, and it is often just as limited in most other research contexts.
To avoid making this assumption, use a factor mixture modeling approach following the papers of [Lubke and Muthen and Lubke and Neale][6].