Model an interaction between a categorical IV and a continuous moderator (created through a CFA) in an SEM model using the lavaan package in R

categorical data, confirmatory-factor, interaction, lavaan, structural-equation-modeling

How can one go about modelling the interaction between a categorical independent variable and a continuous moderator (created through a CFA) in an SEM model using the lavaan package in R?

In particular, in my real dataset I essentially want to re-create a two-way ANOVA in SEM, and I also want to test whether a moderating variable interacts with each factor variable.

Example data and problem:

### load packages: ###
library(dplyr)
library(lavaan)
library(car)
library(psychTools)

### Create some data: ###

# Dependent variable: Taken example data from psychTools
DV1 <- bfi$A1 # item 1 
DV2 <- bfi$A2 # item 2
DV3 <- bfi$A3 # item 3

# Moderating variable: Taken example data from psychTools
MOD1 <- bfi$C1 # item 1
MOD2 <- bfi$C2 # item 2
MOD3 <- bfi$C3 # item 3

# Create example factor variables
x1 <- c("A","B")
x2 <- c("C","D")
set.seed(1)
FAC1 <- as.factor(sample(x1, 200, replace = TRUE)) # Factor 1, with two levels "A" and "B"
FAC2 <- as.factor(sample(x2, 200, replace = TRUE)) # Factor 2, with two levels "C" and "D"
FAC12 <- interaction(FAC1,FAC2) # Factor 12, interaction of FAC1 and FAC2, with four levels "A.C" "B.C" "A.D" "B.D"

# Combine to data frame (the 200 sampled factor values are recycled to match the 2800 rows of bfi)
StudyData <- data.frame(DV1,DV2,DV3, 
                        MOD1,MOD2,MOD3,
                        FAC1, FAC2, FAC12)

### Make all categorical variables numeric for use in SEM (orthogonal contrast coded as in ANOVA): ###

StudyData$FAC1 <- recode(StudyData$FAC1, "c('A')='-1';
                                          c('B')='1'")
StudyData$FAC1 <- as.numeric(levels(StudyData$FAC1))[StudyData$FAC1]

StudyData$FAC2 <- recode(StudyData$FAC2, "c('C')='-1';
                                          c('D')='1'")
StudyData$FAC2 <- as.numeric(levels(StudyData$FAC2))[StudyData$FAC2]

StudyData$FAC12 <- recode(StudyData$FAC12, "c('A.D','B.C')='-1';
                                          c('A.C','B.D')='1'")
StudyData$FAC12 <- as.numeric(levels(StudyData$FAC12))[StudyData$FAC12]

### SEM Model One: ###
Model.one <- '
# cfa
DV =~ DV1 + DV2 + DV3
MOD =~ MOD1 + MOD2 + MOD3
# regressions
DV ~ FAC1 + FAC2 + FAC12
'

Modelone <- sem(Model.one, StudyData, estimator="MLM", effect.coding=TRUE, meanstructure=TRUE) 
summary(Modelone)
fitMeasures(Modelone, c("chisq","cfi","rmsea","srmr","nfi","gfi"))

### SEM Model Two: ###
Model.two <- '
# cfa
DV =~ DV1 + DV2 + DV3
MOD =~ MOD1 + MOD2 + MOD3
# regressions
DV ~ FAC1 + FAC1:MOD + FAC2 + FAC12
'

Modeltwo <- sem(Model.two, StudyData, estimator="MLM", effect.coding=TRUE, meanstructure=TRUE) 
summary(Modeltwo)
fitMeasures(Modeltwo, c("chisq","cfi","rmsea","srmr","nfi","gfi"))

### EDIT ###

### SEM Model Three: ###
Model.three <- '
# cfa
DV =~ DV1 + DV2 + DV3
MOD =~ MOD1 + MOD2 + MOD3
# regressions
DV ~ FAC2 + MOD
'

Modelthree <- sem(Model.three, StudyData, estimator="MLM", effect.coding=TRUE, meanstructure=TRUE, group="FAC1") 
summary(Modelthree)
fitMeasures(Modelthree, c("chisq","cfi","rmsea","srmr","nfi","gfi"))

Model one runs fine. I can run my "ANOVA" in the SEM environment.

However, when I run Model two, which includes an interaction term between FAC1 and MOD (the latent variable defined via CFA in the SEM model), I receive the following warning:

"lavaan WARNING:
The variance-covariance matrix of the estimated parameters (vcov)
does not appear to be positive definite! The smallest eigenvalue
(= -3.458498e-20) is smaller than zero. This may be a symptom that
the model is not identified."

Questions:

  1. Is it not possible to create a factor:continuous interaction in lavaan in this manner?
  2. Are there any workarounds, and how would I implement them? (For example, extract the values calculated during the CFA for MOD, calculate the FAC1:MOD interaction outside of the SEM, and then re-use that variable in the path analysis (regressions) part of the SEM.)
  3. Can Mplus do this without the need for workarounds?

Best Answer

  1. No, the : operator only works on observed variables. It triggers lavaan to actually calculate the product term and include it in the covariance matrix and mean vector to which the model is fitted. That cannot be what happens with a latent variable, which is not part of the observed variables' summary statistics.

  2. Moderation is symmetric, so you could use a multigroup model, with the categorical IV as the grouping variable. Differences in the DV~MOD simple effects across groups would be moderation by FAC1. Differences in the DV~1 intercepts across groups would be the simple effect of FAC1, which could be probed by centering MOD's mean at different values. Or you might be able to use the emmeans utility in the semTools package to probe the interaction; see the ?lavaan2emmeans help-page examples. I suggest another possibility below.

    • Note that using this approach would make the FAC12 variable redundant, because FAC2's intercept and effects are likewise moderated by FAC1 by virtue of parameters differing across groups.

  3. Yes, Mplus can simply use LMS estimation, but that is fraught with some restrictive assumptions. My student's PhD research (also here) has revealed that the product-indicator approach is less restrictive, and that can be implemented in lavaan (see her tutorial about using this method for invariance testing, available for download from my faculty page).
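
The multigroup idea from point 2 could be sketched roughly as follows. This is only an illustration under stated assumptions, not a definitive implementation: the object names `Model.mg` and `Modelmg`, the labels `b1` and `b2`, and the defined parameter `slope_diff` are made up for this example, and constraining the loadings to equality across groups is my assumption so that the two DV~MOD slopes are on a comparable scale. Because FAC1 and FAC2 are random here, no real moderation is expected.

```r
library(lavaan)
library(psychTools)  # provides the bfi example data used in the question

# Rebuild data similar to the question's; FAC1 stays categorical because it
# is used as the grouping variable rather than as a predictor.
set.seed(1)
n <- nrow(bfi)
StudyData <- data.frame(
  DV1 = bfi$A1, DV2 = bfi$A2, DV3 = bfi$A3,
  MOD1 = bfi$C1, MOD2 = bfi$C2, MOD3 = bfi$C3,
  FAC1 = sample(c("A", "B"), n, replace = TRUE),  # grouping variable
  FAC2 = sample(c(-1, 1), n, replace = TRUE)      # contrast-coded predictor
)

Model.mg <- '
# cfa
DV  =~ DV1 + DV2 + DV3
MOD =~ MOD1 + MOD2 + MOD3
# regressions: b1 and b2 label the DV ~ MOD slope in group 1 and group 2
DV ~ c(b1, b2) * MOD + FAC2
# moderation by FAC1 = difference between the two simple slopes
slope_diff := b2 - b1
'

# Assumption: equal loadings across groups (hypothetical choice for this sketch)
Modelmg <- sem(Model.mg, data = StudyData, group = "FAC1",
               group.equal = "loadings")

# Groups are numbered in the order lavaan encounters them in the data
lavInspect(Modelmg, "group.label")

# The row with op ":=" holds the estimated slope difference and its z test
pe <- parameterEstimates(Modelmg)
pe[pe$op == ":=", ]
```

A `slope_diff` estimate that differs reliably from zero would indicate that FAC1 moderates the DV~MOD effect. Check the group order printed by `lavInspect()` before interpreting the sign of the difference.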
