Solved – Interpretation of regression coefficients in latent class regression (using poLCA in R)

interpretationlatent-classmultinomial-distributionrregression coefficients

I am analysing data on symptoms, signs, and autopsy findings (a set of binary Y/N variables), viral serology, for several viruses, and a few other covariates (Age, gender, site) in pigs. After some work, a latent class analysis in poLCA produces a clinically sensible grouping of symptoms, in this case into three classes. This answers our first substantive question – do the symptoms group, and if so, how. The poLCA output is a set of probabilities of being in one of the 3 classes, given any particular value of one of the symptom variables. So far, so good.

The next step is to add covariates to this model.

The relevant poLCA output looks like this:

Estimated class population shares 
 0.4291 0.283 0.2879  
Predicted class memberships (by modal posterior prob.) 
 0.4316 0.2895 0.2789 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Fit for 3 latent classes: 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2 / 1 
        Coefficient  Std. error  t value  Pr(>|t|)
(Intercept)    -0.03442     0.25292   -0.136     0.892
VirusCount     -0.78575     0.38272   -2.053     0.046
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
3 / 1 
        Coefficient  Std. error  t value  Pr(>|t|)
(Intercept)    -1.88277     0.50915   -3.698     0.001
VirusCount      1.92351     0.55724    3.452     0.001
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
number of observations: 190 
number of estimated parameters: 22 
residual degrees of freedom: 41 
maximum log-likelihood: -572.9144 
AIC(3): 1189.829
BIC(3): 1261.263
X^2(3): 71.56843 (Chi-square goodness of fit)

There is no intrinsic ordering in the three classes. I understand that 2/1 and 3/1 are logits of the relative odds of being in Class 2, or Class 3, compared with Class 1, for pigs with detectable virus (code 1 for detected, 0 for not detected). They are exactly, multinomial regression coefficients. What I am struggling with is how to calculate and report the substantively interpretable parameter estimates.

This paper reports relative risk ratios for covariates in a multinomial regression analysis done (I think) after doing LCA. I'm using poLCA because it permits simultaneous estimation of the LCA model, and the covariates. However the presentation in terms of the estimated risk of being in one class, or the other, compared with the reference class, makes sense to us.

My understanding, which may be wrong, is this – the parameters reported as the PCV2 coefficients are the log-odds of being in that class (class 2 or class 3) as opposed to class 1, given that the pig has detectable virus (VirusCount = 1).

Hence the relevant Odds ratios (which is what my collaborators will understand) is $\exp(-0.78575)$ or about 0.455 and $\exp(1.92351)$ or about $6.8$. I don't understand the intercept terms, and any explanation of these would be welcomed.

There is a fairly long calculation given as an example in the poLCA manual, showing how to calculate the probability of being in the respective classes, which I don't really follow.

pidmat <- cbind(1,c(1:7))
exb    <- exp(pidmat %*% nes.party$coeff)
matplot(c(1:7), (cbind(1,exb)/(1+rowSums(exb))),
  main="Party ID as a predictor of candidate affinity class",
  xlab="Party ID: strong Democratic (1) to strong Republican (7)",
  ylab="Probability of latent class membership",
  ylim=c(0,1), type="l", lwd=3,col=1)
text(5.9,0.35,"Other")
text(5.4, 0.7, "Bush affinity")
text(1.8, 0.6, "Gore affinity")

All advice, and clarifications, would be gratefully received.

Best Answer

My understanding, which may be wrong, is this - the parameters reported as the PCV2 coefficients are the log-odds of being in that class (class 2 or class 3) as opposed to class 1, given that the pig has detectable virus (VirusCount = 1).

Hence the relevant Odds ratios (which is what my collaborators will understand) is 𝑒𝑥𝑝(−0.78575) or about 0.455 and 𝑒𝑥𝑝(1.92351)

or about 6.8. I don't understand the intercept terms, and any explanation of these would be welcomed.

I believe your interpretation of the coefficients as log odds is correct (and if you exponentiate them, you get relative risk ratios). The intercepts presented in your output are the multinomial intercepts.

If you don't know what those intercepts are, they estimate the fraction of people in each class. Backing up, for a 3-class latent class model, i.e. no regression component, the multinomial part of the model looks like this:

$$P(C = 1) = \frac{e^{\gamma_1}}{e^{\gamma_1} + e^{\gamma_2} + e^{\gamma_3}}$$ $$= \frac{1}{1 + e^{\gamma_2} + e^{\gamma_3}}$$ You have to set one class's intercept, i.e. the base class, to 1. More generally,

$$P(C = k) = \frac{exp(\gamma_k)}{\sum_{j=1}^K exp(\gamma_j)}$$ (where i indexes respondents)

Had you fit a latent class regression, it would look something like:

$$P(c_i = k|x_i) = \frac{exp(\gamma_{0k} + \gamma_{1k}x_i)}{\sum_{j=1}^K exp(\gamma_{0j} + \gamma_{1j}x_i)}$$

(in your example, x represents having a detectable viral load; in the poLCA example, it represents the strength of party identification)

So, the coefficients on in your output are the values of $\gamma_{1k}$ for each latent class (k = 2 and 3). The intercepts you referred to are just the multinomial intercepts, i.e. the $\gamma_{0k}$. Had you omitted viral load as a covariate, your model would have calculated those intercepts anyway, it's just that I believe poLCA doesn't display them. If you'd fit the model in Stata (my usual software), the header table would have included the multinomial intercepts, as I described on Statalist.

Edit: Expanding my answer to include predicted probabilities

This second answer centers more around R syntax. The code block that @astaines quoted stems from an example in the poLCA manual, which I'm expanding here:

This block fits a latent class regression to a dataset from the 2000 US Presidential election, using the strength of party identification (1-7 discrete variable, here treated as quasi-continuous) as the predictor. require(poLCA) data(election) f.party <- cbind(MORALG,CARESG,KNOWG,LEADG,DISHONG,INTELG, MORALB,CARESB,KNOWB,LEADB,DISHONB,INTELB)~PARTY nes.party <- poLCA(f.party,election,nclass=3,verbose=F)

#next line displays model coefficients
nes.party$coeff

#create a matrix with (what I believe is) the linear predictors in the LCR equation, 
#for party ID = (1, 2, 3 ... 7)
pidmat <- cbind(1,c(1:7))
exb <- exp(pidmat %*% nes.party$coeff)

#Next code should be the predicted probability, P(C = c|party ID)
a <- (cbind(1,exb)/(1+rowSums(exb)))

matplot(c(1:7),a),
        main="Party ID as a predictor of candidate affinity class",
        xlab="Party ID: strong Democratic (1) to strong Republican (7)",
        ylab="Probability of latent class membership",
        ylim=c(0,1),type="l",lwd=3,col=1)
text(5.9,0.35,"Other")
text(5.4,0.7,"Bush affinity")
text(1.8,0.6,"Gore affinity")

I'm not an R expert, but I believe this is what the code is trying to do. If this were Stata, the margins command makes the sausage for you. In R, I suppose you have to make the sausage yourself.

Related Solutions

Solved – Interpretation of Coefficients in linear regression using ‘fitlm’

This could happen if there is correlation among the X variables. For instance, if X4 and X6 are highly correlated, then perhaps the effect of X4 on the response after adjusting for X6 is positive, even if the correlation between X4 and Y is negative.

I believe you can use something like

plotAdded(mdl,'x4')

to look at the relationship between X4 and Y after adjusting for the other predictors.

Solved – Latent class analysis order classes

What you're describing isn't a "problem" per se. Since the three latent classes are unordered, their labeling is completely arbitrary. In any particular run of the poLCA function, it's normal for the labels on the latent classes (1, 2, or 3) to switch around, as they simply depend on what random starting points the poLCA function's estimation algorithm happened to select. As long as each fit achieves the same maximum log-likelihood, the fitted models are all the same, regardless of how the class labels turn out.

For models with covariates, the label-switching will result in different coefficients on the predictor variables (although the class-conditional response probabilities will be the same). This is because the latent class that's used as the baseline for calculating the covariate effects changes. Again, though, the fit of the model is mathematically and substantively the same.

For more information on ordering latent classes, see section 5.6 of the poLCA user’s manual at http://dlinzer.github.io/poLCA/. This section also describes the use of the poLCA.reorder() function for manually reordering the latent classes in your model. If you are primarily interested in each observation's posterior class membership probabilities, see section 5.3 of the user's manual and the posterior element of the estimated poLCA model object.

Best Answer

Related Solutions

Solved – Interpretation of Coefficients in linear regression using ‘fitlm’

Solved – Latent class analysis order classes

Related Question