Latent Class Analysis – Difference Between Latent Class Analysis and Mixture Models


I have been trying to look into latent class analysis and don't exactly understand what it is. Is it basically expectation maximization, followed by analyzing the classes it forms? The resources on the internet have been pretty confusing, and I have not been able to get a clear picture of what latent class analysis really is.

Best Answer

Latent class analysis (LCA) is a discrete finite mixture model. A finite mixture model is a model-based clustering approach that treats the distribution of the data $f$ as a mixture of $K$ distributions $f_k$, each appearing with mixing proportion $\pi_k$ (with $\pi_k \ge 0$ and $\sum_k \pi_k = 1$),

$$ f(x, \vartheta) = \sum^K_{k=1} \pi_k \, f_k(x, \vartheta_k) $$

where the class assignments (clusters) are unknown and are learned from the data. In the case of LCA, the observed variables are discrete (categorical), so the aim is to cluster the discrete data into $K$ latent classes, each characterized by a different conditional probability distribution. For two discrete variables $A$ and $B$, and a latent variable $X$ for the class assignment, the distribution may be defined as

$$ P(A=i, B=j) = \sum_{k=1}^K \, \overbrace{P(X=k)}^{\pi_k} \, \overbrace{P(A=i, B=j|X=k)}^{f_k} $$

where it is usually assumed that the variables are conditionally independent given the class, $P(A=i, B=j|X=k) = P(A=i|X=k)\,P(B=j|X=k)$ (the "local independence" assumption), which simplifies the computations. What may be confusing is that the LCA literature quite commonly uses rather peculiar notation, where:

$$ P(A=i, B=j) = \sum_{k=1}^K \, P(X=k) \, P(A=i|X=k)\, P(B=j|X=k) $$

can be written as something like the following, or a variant of it:

$$ \pi_{ij} = \sum_{k=1}^K \, \pi^X_k \, \pi^{\bar A X}_{ki} \, \pi^{\bar B X}_{kj} $$
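To make the notation concrete, here is a minimal plain-Python sketch that evaluates $\pi_{ij}$ for a toy model with $K = 2$ classes, $A$ with three categories and $B$ with two. The parameter values and the helper names `joint_table` and `posterior` are hypothetical, chosen only for illustration; `pi_X`, `pi_AX` and `pi_BX` hold $\pi^X_k$, $\pi^{\bar A X}_{ki}$ and $\pi^{\bar B X}_{kj}$. The second function applies Bayes' rule, $P(X=k \mid A=i, B=j) = \pi^X_k \, \pi^{\bar A X}_{ki} \, \pi^{\bar B X}_{kj} / \pi_{ij}$, which is how class memberships are read off once the parameters are known.

```python
# Hypothetical toy parameters (made up for illustration, not estimated from data).
pi_X = [0.6, 0.4]            # pi^X_k = P(X = k)
pi_AX = [[0.7, 0.2, 0.1],    # pi^{bar A X}_{ki} = P(A = i | X = k)
         [0.1, 0.3, 0.6]]
pi_BX = [[0.9, 0.1],         # pi^{bar B X}_{kj} = P(B = j | X = k)
         [0.2, 0.8]]


def joint_table(pi_X, pi_AX, pi_BX):
    """pi_ij = sum_k pi^X_k * pi^{AX}_{ki} * pi^{BX}_{kj}, assuming local independence."""
    n_a, n_b = len(pi_AX[0]), len(pi_BX[0])
    return [[sum(pi_X[k] * pi_AX[k][i] * pi_BX[k][j] for k in range(len(pi_X)))
             for j in range(n_b)]
            for i in range(n_a)]


def posterior(pi_X, pi_AX, pi_BX, i, j):
    """P(X = k | A = i, B = j) by Bayes' rule: class-k term divided by pi_ij."""
    weights = [pi_X[k] * pi_AX[k][i] * pi_BX[k][j] for k in range(len(pi_X))]
    total = sum(weights)
    return [w / total for w in weights]


table = joint_table(pi_X, pi_AX, pi_BX)          # table[i][j] = P(A = i, B = j)
post = posterior(pi_X, pi_AX, pi_BX, i=0, j=0)   # class probabilities for the cell (0, 0)
```

The table sums to 1, and `table[0][0]` equals $0.6 \cdot 0.7 \cdot 0.9 + 0.4 \cdot 0.1 \cdot 0.2 = 0.386$. This is not $P(A=0)\,P(B=0) = 0.46 \cdot 0.62 \approx 0.285$: the variables are independent within each class but associated in the population, and that association is what the latent classes explain.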

For learning more, there is a nice introduction with examples in the documentation of the poLCA R package (Linzer and Lewis, 2011), and a brief tutorial by Vermunt and Magidson (2003). There is a wide variety of latent class models; you can find an extensive review in Hagenaars and McCutcheon (2009).
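Regarding the expectation maximization part of the question: EM is the usual way to estimate LCA parameters by maximum likelihood (poLCA uses it), but it is the estimation procedure, not the model itself. Below is a minimal from-scratch sketch of EM for the local-independence model, generalized to any number of categorical variables. `lca_em` is a hypothetical helper written for this illustration, not poLCA's implementation, and the simulated "true" parameters are made up. The E-step computes posterior class memberships with Bayes' rule; the M-step re-estimates $\pi^X_k$ and the conditional probabilities as responsibility-weighted relative frequencies.

```python
import math
import random


def lca_em(data, n_levels, K, n_iter=200, seed=0):
    """Hypothetical minimal EM for a latent class model (local independence).

    data     : list of tuples; data[n][j] is the observed 0-based category of variable j
    n_levels : n_levels[j] is the number of categories of variable j
    K        : number of latent classes
    Returns (pi, theta, resp, trace):
      pi[k]          = P(X = k)
      theta[k][j][c] = P(variable j = c | X = k)
      resp[n][k]     = P(X = k | data[n]) from the final E-step
      trace          = log-likelihood at each E-step
    """
    rng = random.Random(seed)
    J = len(n_levels)
    pi = [1.0 / K] * K
    # Random starting values; each conditional distribution is normalized to sum to 1.
    theta = []
    for _ in range(K):
        per_var = []
        for L in n_levels:
            w = [rng.random() + 0.1 for _ in range(L)]
            s = sum(w)
            per_var.append([v / s for v in w])
        theta.append(per_var)

    trace, resp = [], []
    for _ in range(n_iter):
        # E-step: pi_k * prod_j P(x_j | X = k), normalized over k (Bayes' rule).
        resp, loglik = [], 0.0
        for row in data:
            w = [pi[k] * math.prod(theta[k][j][row[j]] for j in range(J))
                 for k in range(K)]
            total = sum(w)
            loglik += math.log(total)
            resp.append([v / total for v in w])
        trace.append(loglik)

        # M-step: class sizes and weighted relative frequencies of each category.
        n_k = [sum(r[k] for r in resp) for k in range(K)]
        pi = [nk / len(data) for nk in n_k]
        theta = [[[sum(r[k] for r, row in zip(resp, data) if row[j] == c) / n_k[k]
                   for c in range(n_levels[j])]
                  for j in range(J)]
                 for k in range(K)]
    return pi, theta, resp, trace


# Usage: simulate three binary items from a made-up 2-class model, then refit it.
rng = random.Random(1)
true_pi = [0.6, 0.4]
true_p = [[0.9, 0.8, 0.85], [0.2, 0.1, 0.3]]  # P(item j = 1 | X = k), illustrative values
data = []
for _ in range(500):
    k = 0 if rng.random() < true_pi[0] else 1
    data.append(tuple(int(rng.random() < true_p[k][j]) for j in range(3)))

pi, theta, resp, trace = lca_em(data, n_levels=[2, 2, 2], K=2)
```

EM never decreases the log-likelihood, which the `trace` list lets you check. It can stop at a local maximum and the class labels are arbitrary, so in practice it is run from several random starts and the best fit is kept. The classes are then interpreted through `theta`, and each observation can be assigned to its most probable class using `resp`.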

Hagenaars, J.A., and McCutcheon, A.L. (2009). Applied Latent Class Analysis. Cambridge University Press.

Vermunt, J.K., and Magidson, J. (2003). Latent class models for classification. Computational Statistics & Data Analysis, 41(3), 531-537.

Linzer, D.A., and Lewis, J.B. (2011). poLCA: An R package for polytomous variable latent class analysis. Journal of Statistical Software, 42(10), 1-29.