Solved – Interpretation of regression coefficients in latent class regression (using poLCA in R)

interpretationlatent-classmultinomial-distributionrregression coefficients

I am analysing data on symptoms, signs, and autopsy findings (a set of binary Y/N variables), viral serology, for several viruses, and a few other covariates (Age, gender, site) in pigs. After some work, a latent class analysis in poLCA produces a clinically sensible grouping of symptoms, in this case into three classes. This answers our first substantive question – do the symptoms group, and if so, how. The poLCA output is a set of probabilities of being in one of the 3 classes, given any particular value of one of the symptom variables. So far, so good.

The next step is to add covariates to this model.

The relevant poLCA output looks like this:

Estimated class population shares 
 0.4291 0.283 0.2879  
Predicted class memberships (by modal posterior prob.) 
 0.4316 0.2895 0.2789 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Fit for 3 latent classes: 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2 / 1 
        Coefficient  Std. error  t value  Pr(>|t|)
(Intercept)    -0.03442     0.25292   -0.136     0.892
VirusCount     -0.78575     0.38272   -2.053     0.046
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
3 / 1 
        Coefficient  Std. error  t value  Pr(>|t|)
(Intercept)    -1.88277     0.50915   -3.698     0.001
VirusCount      1.92351     0.55724    3.452     0.001
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
number of observations: 190 
number of estimated parameters: 22 
residual degrees of freedom: 41 
maximum log-likelihood: -572.9144 
AIC(3): 1189.829
BIC(3): 1261.263
X^2(3): 71.56843 (Chi-square goodness of fit) 

There is no intrinsic ordering in the three classes. I understand that 2/1 and 3/1 are logits of the relative odds of being in Class 2, or Class 3, compared with Class 1, for pigs with detectable virus (code 1 for detected, 0 for not detected). They are exactly, multinomial regression coefficients. What I am struggling with is how to calculate and report the substantively interpretable parameter estimates.

This paper reports relative risk ratios for covariates in a multinomial regression analysis done (I think) after doing LCA. I'm using poLCA because it permits simultaneous estimation of the LCA model, and the covariates. However the presentation in terms of the estimated risk of being in one class, or the other, compared with the reference class, makes sense to us.

My understanding, which may be wrong, is this – the parameters reported as the PCV2 coefficients are the log-odds of being in that class (class 2 or class 3) as opposed to class 1, given that the pig has detectable virus (VirusCount = 1).

Hence the relevant Odds ratios (which is what my collaborators will understand) is $\exp(-0.78575)$ or about 0.455 and $\exp(1.92351)$ or about $6.8$. I don't understand the intercept terms, and any explanation of these would be welcomed.

There is a fairly long calculation given as an example in the poLCA manual, showing how to calculate the probability of being in the respective classes, which I don't really follow.

pidmat <- cbind(1,c(1:7))
exb    <- exp(pidmat %*% nes.party$coeff)
matplot(c(1:7), (cbind(1,exb)/(1+rowSums(exb))),
  main="Party ID as a predictor of candidate affinity class",
  xlab="Party ID: strong Democratic (1) to strong Republican (7)",
  ylab="Probability of latent class membership",
  ylim=c(0,1), type="l", lwd=3,col=1)
text(5.9,0.35,"Other")
text(5.4, 0.7, "Bush affinity")
text(1.8, 0.6, "Gore affinity")

All advice, and clarifications, would be gratefully received.

Best Answer

My understanding, which may be wrong, is this - the parameters reported as the PCV2 coefficients are the log-odds of being in that class (class 2 or class 3) as opposed to class 1, given that the pig has detectable virus (VirusCount = 1).

Hence the relevant Odds ratios (which is what my collaborators will understand) is 𝑒π‘₯𝑝(βˆ’0.78575) or about 0.455 and 𝑒π‘₯𝑝(1.92351)

or about 6.8. I don't understand the intercept terms, and any explanation of these would be welcomed.

I believe your interpretation of the coefficients as log odds is correct (and if you exponentiate them, you get relative risk ratios). The intercepts presented in your output are the multinomial intercepts.

If you don't know what those intercepts are, they estimate the fraction of people in each class. Backing up, for a 3-class latent class model, i.e. no regression component, the multinomial part of the model looks like this:

$$P(C = 1) = \frac{e^{\gamma_1}}{e^{\gamma_1} + e^{\gamma_2} + e^{\gamma_3}}$$ $$= \frac{1}{1 + e^{\gamma_2} + e^{\gamma_3}}$$ You have to set one class's intercept, i.e. the base class, to 1. More generally,

$$P(C = k) = \frac{exp(\gamma_k)}{\sum_{j=1}^K exp(\gamma_j)}$$ (where i indexes respondents)

Had you fit a latent class regression, it would look something like:

$$P(c_i = k|x_i) = \frac{exp(\gamma_{0k} + \gamma_{1k}x_i)}{\sum_{j=1}^K exp(\gamma_{0j} + \gamma_{1j}x_i)}$$

(in your example, x represents having a detectable viral load; in the poLCA example, it represents the strength of party identification)

So, the coefficients on in your output are the values of $\gamma_{1k}$ for each latent class (k = 2 and 3). The intercepts you referred to are just the multinomial intercepts, i.e. the $\gamma_{0k}$. Had you omitted viral load as a covariate, your model would have calculated those intercepts anyway, it's just that I believe poLCA doesn't display them. If you'd fit the model in Stata (my usual software), the header table would have included the multinomial intercepts, as I described on Statalist.


Edit: Expanding my answer to include predicted probabilities

This second answer centers more around R syntax. The code block that @astaines quoted stems from an example in the poLCA manual, which I'm expanding here:

This block fits a latent class regression to a dataset from the 2000 US Presidential election, using the strength of party identification (1-7 discrete variable, here treated as quasi-continuous) as the predictor. require(poLCA) data(election) f.party <- cbind(MORALG,CARESG,KNOWG,LEADG,DISHONG,INTELG, MORALB,CARESB,KNOWB,LEADB,DISHONB,INTELB)~PARTY nes.party <- poLCA(f.party,election,nclass=3,verbose=F)

#next line displays model coefficients
nes.party$coeff

#create a matrix with (what I believe is) the linear predictors in the LCR equation, 
#for party ID = (1, 2, 3 ... 7)
pidmat <- cbind(1,c(1:7))
exb <- exp(pidmat %*% nes.party$coeff)

#Next code should be the predicted probability, P(C = c|party ID)
a <- (cbind(1,exb)/(1+rowSums(exb)))

matplot(c(1:7),a),
        main="Party ID as a predictor of candidate affinity class",
        xlab="Party ID: strong Democratic (1) to strong Republican (7)",
        ylab="Probability of latent class membership",
        ylim=c(0,1),type="l",lwd=3,col=1)
text(5.9,0.35,"Other")
text(5.4,0.7,"Bush affinity")
text(1.8,0.6,"Gore affinity")

I'm not an R expert, but I believe this is what the code is trying to do. If this were Stata, the margins command makes the sausage for you. In R, I suppose you have to make the sausage yourself.

Related Question