Solved – LCA not returning the same results with the same data

Tags: latent-class, r

I'm pretty new to Latent class analysis, and I obviously still have a lot to learn.

I have a data set with 10,000 observations and 20 variables. I'm trying to plot the probabilities of each of my identified classes in the latent class analysis I did using the poLCA package in R (http://cran.r-project.org/web/packages/poLCA/poLCA.pdf).

What I'm trying to do is something like Figure 1 in Quek et al. (2013) (http://www.researchgate.net/publication/258441959_Concurrent_and_simultaneous_polydrug_use_latent_class_analysis_of_an_Australian_nationally_representative_sample_of_young_adults).

The problem is that every time I run the poLCA() function, my plot changes, and so do the probabilities of each class; of course, this is because the results of poLCA() changed.

Is it normal that doing the same thing two or more times with the same data gives different results? I'm feeling very uncomfortable here, so if someone could explain to me what's happening, I could maybe live with myself again 🙂

Best Answer

Yes: it is perfectly normal. The algorithm used to find the maximum-likelihood (ML) estimate of an LCA model can converge to a local maximum of the likelihood, depending on the starting values. (This is a quite general problem in statistics.) Unless you fix the starting values, poLCA automatically generates them at random each time you run it. The usual way to avoid this problem and find the global maximum-likelihood solution is to try several sets of starting values, that is, to repeat the function call several times and keep the solution with the highest likelihood. In poLCA this is done simply via the "nrep" option.
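As a sketch, here is how you might combine a fixed seed (for reproducibility) with multiple restarts via `nrep`. This uses poLCA's bundled `carcinoma` data for illustration; for your own data you would adapt the formula to your 20 variables, and the values of `nclass` and `nrep` here are arbitrary choices, not recommendations:

```r
library(poLCA)
data(carcinoma)                       # example data shipped with poLCA

f <- cbind(A, B, C, D, E, F, G) ~ 1   # all indicators, no covariates
set.seed(1234)                        # makes the random starting values reproducible

lc <- poLCA(f, carcinoma, nclass = 3,
            nrep = 50,                # 50 sets of random starting values;
                                      # poLCA keeps the run with the highest log-likelihood
            maxiter = 3000,
            verbose = FALSE)

lc$llik                               # log-likelihood of the best run
```

With `set.seed()` alone you get the same (possibly locally optimal) solution each time; with a reasonably large `nrep` you are much more likely to have actually found the global maximum.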

Note that, in general, the more variables and the more modalities (categories) per variable in an LCA model, the more repetitions are needed to be reasonably sure of the result.

Besides, I am assuming you do not have identifiability issues, that is, that the number of independent parameters of your model is lower than the number of distinct configurations of the variables. If different estimates of the model are associated with the same likelihood value, the model you specified is not identifiable.
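This parameter count can be checked by hand. A minimal sketch, assuming (hypothetically) 20 binary indicators and 4 classes; the numbers are placeholders for whatever your model actually uses:

```r
# Rough identifiability check: a necessary (not sufficient) condition is that
# the number of free parameters does not exceed the number of independent
# response-pattern probabilities.
r <- rep(2, 20)                      # categories per variable (assumed: 20 binary items)
K <- 4                               # number of latent classes (assumed)

npar  <- (K - 1) + K * sum(r - 1)    # mixing proportions + item-response probabilities
cells <- prod(r) - 1                 # distinct response patterns, minus one constraint

npar                                 # 83 free parameters here
npar <= cells                        # should be TRUE for this condition to hold
```

poLCA also reports the residual degrees of freedom (`resid.df`) in its output, which reflects the same comparison for the model you fit.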