Latent Class Analysis is in fact a Finite Mixture Model (see here). The main difference between FMMs and other clustering algorithms is that FMMs offer a "model-based clustering" approach: they derive clusters using a probabilistic model that describes the distribution of your data. So instead of finding clusters with some arbitrarily chosen distance measure, you use a model of the data distribution and, based on this model, assess the probabilities that certain cases are members of certain latent classes. You could say that it is a top-down approach (you start by describing the distribution of your data), while other clustering algorithms are rather bottom-up approaches (you find similarities between cases).
Because you use a statistical model for your data, model selection and assessing goodness of fit are possible, contrary to clustering. Also, if you assume there is some process or "latent structure" underlying the structure of your data, then FMMs seem an appropriate choice, since they enable you to model that latent structure (rather than just looking for similarities).
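To make the "model-based" idea concrete, here is a minimal sketch of the EM algorithm for a two-component univariate Gaussian mixture in base R (simulated data; all numbers are illustrative). The point is that each case ends up with a membership probability rather than a hard cluster label:

```r
# Simulate data from two Gaussian components
set.seed(1)
x <- c(rnorm(100, mean = 0, sd = 1), rnorm(100, mean = 4, sd = 1))

# Initial guesses for mixing proportion, means, and sds
p  <- 0.5
mu <- c(-1, 5)
s  <- c(1, 1)

for (it in 1:200) {
  # E-step: posterior probability that each case belongs to class 1
  d1 <- p * dnorm(x, mu[1], s[1])
  d2 <- (1 - p) * dnorm(x, mu[2], s[2])
  r  <- d1 / (d1 + d2)
  # M-step: update parameters using the soft memberships
  p  <- mean(r)
  mu <- c(sum(r * x) / sum(r), sum((1 - r) * x) / sum(1 - r))
  s  <- c(sqrt(sum(r * (x - mu[1])^2) / sum(r)),
          sqrt(sum((1 - r) * (x - mu[2])^2) / sum(1 - r)))
}
round(c(p = p, mu1 = mu[1], mu2 = mu[2]), 2)  # r holds membership probabilities
```

Because the fit maximizes a likelihood, you can compare models with different numbers of components via AIC/BIC, which is exactly the model-selection advantage mentioned above.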
Another difference is that FMMs are more flexible than clustering. Clustering algorithms just do clustering, while there are FMM- and LCA-based models that
- enable you to do confirmatory, between-groups analysis,
- combine Item Response Theory (and other) models with LCA,
- include covariates to predict individuals' latent class membership,
- and/or even include within-cluster regression models, as in latent-class regression,
- enable you to model changes over time in the structure of your data, etc.
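As an illustration of the covariate point, here is a hedged sketch using the poLCA package (assumed to be installed; data and variable names are simulated and purely illustrative): a covariate on the right-hand side of the formula predicts latent class membership.

```r
library(poLCA)

# Simulate: class membership depends on "age"; three binary indicators
# depend on the latent class (poLCA wants categories coded 1, 2, ...)
set.seed(1)
n   <- 500
age <- rnorm(n)
cls <- rbinom(n, 1, plogis(age))
pr  <- ifelse(cls == 1, 0.8, 0.2)
df  <- data.frame(age,
                  y1 = rbinom(n, 1, pr) + 1,
                  y2 = rbinom(n, 1, pr) + 1,
                  y3 = rbinom(n, 1, pr) + 1)

# 2-class LCA with age as a concomitant variable for class membership
fit <- poLCA(cbind(y1, y2, y3) ~ age, data = df, nclass = 2, verbose = FALSE)
fit$P  # estimated class shares
```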
For more examples see:
Hagenaars J.A. & McCutcheon, A.L. (2009). Applied Latent Class
Analysis. Cambridge University Press.
and the documentation of the flexmix and poLCA packages in R, including the following papers:
Linzer, D. A., & Lewis, J. B. (2011). poLCA: An R package for
polytomous variable latent class analysis. Journal of Statistical
Software, 42(10), 1-29.
Leisch, F. (2004). Flexmix: A general framework for finite mixture
models and latent class regression in R. Journal of Statistical
Software, 11(8), 1-18.
Grün, B., & Leisch, F. (2008). FlexMix version 2: finite mixtures with
concomitant variables and varying and constant parameters. Journal of
Statistical Software, 28(4), 1-35.
flexmix would do the job, but (as far as I remember) only if you model binary (Yes/No) or pairwise (A vs. B) choices. (Last time I checked, the authors were working on an extension to multinomial (MNL) choices.)
However, latent class logit (LCL) models are relatively easy to code, as they consist of a discrete mixture of standard MNL models (so if you know how to code an MNL model, you should be able to write your own LCL code).
Here is an example of an LCL with 2 classes:
X -> Matrix of independent variables (e.g., attributes' levels)
Y -> Column vector of observed choices (0/1)
N -> Column vector of respondents ID (e.g., 1 1 1 1 2 2 2 2 3 3 3 3 ...)
G -> Column vector of observations ID (e.g., 1 1 2 2 3 3 4 4 5 5 6 6 ...)
In this code, the model specification is quite simple:
- Only 2 latent classes.
- Same set of predictors for the 2 classes (it is possible to add constraints).
- Constant only for class membership (it is possible to add covariates such as age, gender, etc.).
loglik.LCL = function(beta, X, Y, N, G){
  K = ncol(X)
  ### Class 1: MNL choice probabilities, multiplied over each respondent's choices
  num1 = exp(as.matrix(X) %*% beta[1:K])
  den1 = tapply(num1, G, sum)
  prb1 = num1[Y==1] / den1
  sprb1 = tapply(prb1, N[Y==1], prod)
  ### Class 2 (parentheses matter here: ":" binds tighter than "*" in R)
  num2 = exp(as.matrix(X) %*% beta[(K+1):(2*K)])
  den2 = tapply(num2, G, sum)
  prb2 = num2[Y==1] / den2
  sprb2 = tapply(prb2, N[Y==1], prod)
  ### Membership (logit with class 1 as the reference)
  cla1 = exp(0)
  cla2 = exp(beta[2*K + 1])
  CLA = cla1 + cla2
  ### Negative log-likelihood (to be minimised, e.g. with optim)
  llik = -sum(log(cla1/CLA * sprb1 + cla2/CLA * sprb2))
  return(llik)}
Remark: it is possible to write a more efficient version of this code if you have a complete (balanced) dataset, by replacing tapply() with matrix operations (reshape, colSums, etc.).
You can compare your results with the "lclogit" Stata command.
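A minimal end-to-end sketch on simulated data (all names and numbers here are illustrative, not from the original answer); the likelihood is restated compactly so the snippet is self-contained:

```r
# Two-class latent class logit: negative log-likelihood
loglik.LCL <- function(beta, X, Y, N, G) {
  K <- ncol(X)
  # Class-specific MNL probabilities, multiplied over each respondent's choices
  num1  <- exp(as.matrix(X) %*% beta[1:K])
  den1  <- tapply(num1, G, sum)
  sprb1 <- tapply(num1[Y == 1] / den1, N[Y == 1], prod)
  num2  <- exp(as.matrix(X) %*% beta[(K + 1):(2 * K)])
  den2  <- tapply(num2, G, sum)
  sprb2 <- tapply(num2[Y == 1] / den2, N[Y == 1], prod)
  # Class shares via a logit with class 1 as reference
  cla2 <- exp(beta[2 * K + 1]); CLA <- 1 + cla2
  -sum(log(1 / CLA * sprb1 + cla2 / CLA * sprb2))
}

# Simulate two-alternative choices from two latent classes
set.seed(42)
R <- 100; S <- 6                      # respondents, choice sets per respondent
N <- rep(1:R, each = 2 * S)           # respondent id per row
G <- rep(1:(R * S), each = 2)         # choice-set id per row
X <- cbind(x1 = rnorm(2 * S * R), x2 = rnorm(2 * S * R))
b <- list(c(1, -1), c(-1, 1))         # class-specific preferences
cls <- rep(sample(1:2, R, replace = TRUE), each = 2 * S)
u <- ifelse(cls == 1, X %*% b[[1]], X %*% b[[2]])
Y <- unlist(tapply(u, G, function(v) rmultinom(1, 1, exp(v) / sum(exp(v)))))

# Maximise the likelihood (asymmetric start breaks the class symmetry)
fit <- optim(c(0.5, -0.5, -0.5, 0.5, 0), loglik.LCL,
             X = X, Y = Y, N = N, G = G, method = "BFGS")
round(fit$par, 2)  # class-1 betas, class-2 betas, membership constant
```

Note that, because of label switching, the two classes may come out in either order.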
Best Answer
I can only comment on #1 for now. You are talking about fitting a multiple-group latent class model (the link goes to a UCLA website with a worked example in MPlus). This is a bit like differential item functioning in item response theory. In the latent class case, you would fit a multiple-group model, then use Wald tests for the parameter estimates. You can obviously do this in MPlus, and I typed up the Stata syntax here.
Unfortunately, it appears that poLCA can't fit a multiple-group LCA. You can obviously fit the models separately, but I don't see a way to test whether the parameters differ. I am not that conversant with the capabilities of other R packages, so I can't advise.

On #2, as I understand the article, you're talking about a method to estimate the relationship between latent classes and a distal (unrelated) outcome. We recently had a discussion about that, and there is likely a simpler method than what Muthén and Asparouhov proposed above, although I believe they critique the paper I was referencing in the link.
I've read the Muthén/Asparouhov article you linked, and right now it is not making sense to me (not because I think they're wrong, but because I can't understand it). I may alter this answer if I reread it and finally get it.