I am analysing data on symptoms, signs, and autopsy findings (a set of binary Y/N variables), viral serology, for several viruses, and a few other covariates (Age, gender, site) in pigs. After some work, a latent class analysis in poLCA produces a clinically sensible grouping of symptoms, in this case into three classes. This answers our first substantive question – do the symptoms group, and if so, how. The poLCA output is a set of probabilities of being in one of the 3 classes, given any particular value of one of the symptom variables. So far, so good.
The next step is to add covariates to this model.
The relevant poLCA output looks like this:
Estimated class population shares
0.4291 0.283 0.2879
Predicted class memberships (by modal posterior prob.)
0.4316 0.2895 0.2789
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Fit for 3 latent classes:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2 / 1
Coefficient Std. error t value Pr(>|t|)
(Intercept) -0.03442 0.25292 -0.136 0.892
VirusCount -0.78575 0.38272 -2.053 0.046
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
3 / 1
Coefficient Std. error t value Pr(>|t|)
(Intercept) -1.88277 0.50915 -3.698 0.001
VirusCount 1.92351 0.55724 3.452 0.001
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
number of observations: 190
number of estimated parameters: 22
residual degrees of freedom: 41
maximum log-likelihood: -572.9144
AIC(3): 1189.829
BIC(3): 1261.263
X^2(3): 71.56843 (Chi-square goodness of fit)
There is no intrinsic ordering in the three classes. I understand that 2/1 and 3/1 are logits of the relative odds of being in Class 2, or Class 3, compared with Class 1, for pigs with detectable virus (code 1 for detected, 0 for not detected). They are exactly, multinomial regression coefficients. What I am struggling with is how to calculate and report the substantively interpretable parameter estimates.
This paper reports relative risk ratios for covariates in a multinomial regression analysis done (I think) after doing LCA. I'm using poLCA because it permits simultaneous estimation of the LCA model, and the covariates. However the presentation in terms of the estimated risk of being in one class, or the other, compared with the reference class, makes sense to us.
My understanding, which may be wrong, is this – the parameters reported as the PCV2 coefficients are the log-odds of being in that class (class 2 or class 3) as opposed to class 1, given that the pig has detectable virus (VirusCount = 1
).
Hence the relevant Odds ratios (which is what my collaborators will understand) is $\exp(-0.78575)$ or about 0.455 and $\exp(1.92351)$ or about $6.8$. I don't understand the intercept terms, and any explanation of these would be welcomed.
There is a fairly long calculation given as an example in the poLCA manual, showing how to calculate the probability of being in the respective classes, which I don't really follow.
pidmat <- cbind(1,c(1:7))
exb <- exp(pidmat %*% nes.party$coeff)
matplot(c(1:7), (cbind(1,exb)/(1+rowSums(exb))),
main="Party ID as a predictor of candidate affinity class",
xlab="Party ID: strong Democratic (1) to strong Republican (7)",
ylab="Probability of latent class membership",
ylim=c(0,1), type="l", lwd=3,col=1)
text(5.9,0.35,"Other")
text(5.4, 0.7, "Bush affinity")
text(1.8, 0.6, "Gore affinity")
All advice, and clarifications, would be gratefully received.
Best Answer
I believe your interpretation of the coefficients as log odds is correct (and if you exponentiate them, you get relative risk ratios). The intercepts presented in your output are the multinomial intercepts.
If you don't know what those intercepts are, they estimate the fraction of people in each class. Backing up, for a 3-class latent class model, i.e. no regression component, the multinomial part of the model looks like this:
$$P(C = 1) = \frac{e^{\gamma_1}}{e^{\gamma_1} + e^{\gamma_2} + e^{\gamma_3}}$$ $$= \frac{1}{1 + e^{\gamma_2} + e^{\gamma_3}}$$ You have to set one class's intercept, i.e. the base class, to 1. More generally,
$$P(C = k) = \frac{exp(\gamma_k)}{\sum_{j=1}^K exp(\gamma_j)}$$ (where i indexes respondents)
Had you fit a latent class regression, it would look something like:
$$P(c_i = k|x_i) = \frac{exp(\gamma_{0k} + \gamma_{1k}x_i)}{\sum_{j=1}^K exp(\gamma_{0j} + \gamma_{1j}x_i)}$$
(in your example, x represents having a detectable viral load; in the poLCA example, it represents the strength of party identification)
So, the coefficients on in your output are the values of $\gamma_{1k}$ for each latent class (k = 2 and 3). The intercepts you referred to are just the multinomial intercepts, i.e. the $\gamma_{0k}$. Had you omitted viral load as a covariate, your model would have calculated those intercepts anyway, it's just that I believe
poLCA
doesn't display them. If you'd fit the model in Stata (my usual software), the header table would have included the multinomial intercepts, as I described on Statalist.Edit: Expanding my answer to include predicted probabilities
This second answer centers more around R syntax. The code block that @astaines quoted stems from an example in the
poLCA
manual, which I'm expanding here:This block fits a latent class regression to a dataset from the 2000 US Presidential election, using the strength of party identification (1-7 discrete variable, here treated as quasi-continuous) as the predictor. require(poLCA) data(election) f.party <- cbind(MORALG,CARESG,KNOWG,LEADG,DISHONG,INTELG, MORALB,CARESB,KNOWB,LEADB,DISHONB,INTELB)~PARTY nes.party <- poLCA(f.party,election,nclass=3,verbose=F)
I'm not an R expert, but I believe this is what the code is trying to do. If this were Stata, the
margins
command makes the sausage for you. In R, I suppose you have to make the sausage yourself.