Solved – Paired samples t-test using a structural equation modeling approach

Tags: mplus, structural-equation-modeling, t-test

Is it possible to perform a paired samples t-test in a structural equation modeling (SEM) program? I am confused, as I have found on the web that you can run a Wald test, but I am not sure if it is the same as a t-test.

Best Answer

Many statistical tests can be thought of as structural equation models, and one of those is the paired samples t-test.

As you say, one advantage of the SEM approach is that you can use full information maximum likelihood (FIML) estimation, which is asymptotically equivalent to multiple imputation but can be easier to apply.

You estimate a parameter which represents the difference between the means, and test it against zero. There are three ways to do this:

  1. Use the standard error and a t-test.
  2. Use the likelihood ratio (chi-square) difference test.
  3. Use a Wald test.
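Before the SEM versions, it may help to recall that a paired samples t-test is exactly a one-sample t-test on the within-pair differences. A quick base-R check, using made-up numbers purely for illustration:

```r
# Made-up paired scores, purely for illustration
y1 <- c(4.1, 5.0, 3.8, 6.2, 5.5, 4.9)
y2 <- c(4.6, 5.3, 4.1, 6.0, 6.1, 5.4)

paired.t <- t.test(y2, y1, paired = TRUE)  # paired samples t-test
diff.t   <- t.test(y2 - y1)                # one-sample t-test on the differences

# Identical t statistic, df, and p-value
c(paired = unname(paired.t$statistic), one.sample = unname(diff.t$statistic))
```

This equivalence is what the SEM versions below exploit: the "difference between the means" is just the mean of the differences.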

Here's a demonstration of each of these, using lavaan.

# First, generate some data
library(MASS)
library(lavaan)

d <- as.data.frame(mvrnorm(n=200, mu=c(0, 0.1), Sigma=matrix(c(1, 0.8, 0.8, 1), 2), empirical=TRUE))
names(d) <- c("y1", "y2")

Here's approach 1. Fit a model where the variables are correlated, and the means are estimated and named. Then define the difference between the means as a new parameter. You get an estimate, a standard error, a z-value and a p-value.

model.1 <- "y1 ~~ y2
        y1 ~~ y1
        y2 ~~ y2
        y1 ~ m1 * 1
        y2 ~ m2 * 1
        diff := m2 - m1
"
fit.1 <- lavaan(model.1, data=d)
summary(fit.1)

Here's the bit of the output we want:

                   Estimate  Std.err  Z-value  P(>|z|)
Defined parameters:
    diff              0.100    0.045    2.242    0.025

In Mplus, you'd write (note - untested code, written from memory):

Model: 
y1 with y2;
[y1] (m1);
[y2] (m2);
model constraint: 
new (diff);
diff = m2 - m1;

The second approach is the likelihood ratio (chi-square) difference test. This is straightforward, because our current model has zero df, so the chi-square of the constrained model is itself the difference between the two models. The chi-square test then gives you the p-value.
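In lavaan, the constrained model can be fitted by giving both intercepts the same label, which imposes the equality. A sketch, repeating the data generation from above so it stands alone (because empirical=TRUE makes the sample moments exact, the fit statistics are reproducible without a seed):

```r
library(MASS)
library(lavaan)

# Same data generation as above; empirical=TRUE fixes the sample moments
d <- as.data.frame(mvrnorm(n=200, mu=c(0, 0.1),
                           Sigma=matrix(c(1, 0.8, 0.8, 1), 2),
                           empirical=TRUE))
names(d) <- c("y1", "y2")

# Unconstrained model (as in model.1, without the defined parameter)
model.1 <- "y1 ~~ y2
        y1 ~~ y1
        y2 ~~ y2
        y1 ~ m1 * 1
        y2 ~ m2 * 1
"
fit.1 <- lavaan(model.1, data=d)

# Constrained model: both intercepts share the label m, forcing equality
model.2 <- "y1 ~~ y2
        y1 ~~ y1
        y2 ~~ y2
        y1 ~ m * 1
        y2 ~ m * 1
"
fit.2 <- lavaan(model.2, data=d)

# fit.1 has zero df, so fit.2's chi-square is the difference test;
# anova() reports the same likelihood ratio test explicitly
anova(fit.1, fit.2)
```

The chi-square of fit.2 should match the "Minimum Function Test Statistic" shown below.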

Estimator                                         ML
Minimum Function Test Statistic                4.963
Degrees of freedom                                 1
P-value (Chi-square)                           0.026

Note that this is not exactly the same p-value, because the chi-square test assumes a large sample size. But it's very close (and will stay close down to around n = 30). This approach is less useful, because you don't get things like a standard error of the difference. However, if you want a multivariate test p-value, this approach works.

In Mplus, you'd write:

model constraint: 
m1 = m2;

The third approach is the Wald test. This is the least useful. It assumes that all other parameters are unchanged by a restriction in the model - it's kind of the opposite of a modification index (or Lagrange multiplier) in that way: it's an estimate of what the chi-square difference test would be. In lavaan, we use the lavTestWald() function.

lavTestWald(fit.1, "m1 == m2")
$stat
[1] 5.025126

$df
[1] 1

$p.value
[1] 0.02498211

In Mplus, you'd replace the model constraint section with a model test section:

model test: 
m1 = m2;

Notice that the Wald test chi-square is a touch higher, and hence the p-value a touch lower, than in the likelihood ratio test (just as in logistic regression). The advantage of the Wald test is that you don't have to re-estimate the model, so it's faster - but that's rarely an issue these days.

OK, that's cool, but we wanted to use FIML. First, let's introduce some missing data to get some bias. If a person has a score on y1 which is greater than 1, they have a 33% chance of being missing on y2; if they have a score on y2 which is less than -1, they have a 33% chance of being missing on y1.

set.seed(12345)
d$y1a <- ifelse(d$y2 < -1 & runif(nrow(d)) > 0.66, NA, d$y1)
d$y2a <- ifelse(d$y1 > 1 & runif(nrow(d)) > 0.66, NA, d$y2)

A t-test estimates the difference at 0.14 - approximately a 40% overestimate.
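For reference, here's how that complete-case t-test can be run directly. A self-contained sketch, repeating the data generation and missingness steps from above (the set.seed() before mvrnorm() is an addition for reproducibility - the original drew d without a seed, so the exact estimate will vary from the 0.14 quoted):

```r
library(MASS)

set.seed(54321)  # added for reproducibility; not in the original
d <- as.data.frame(mvrnorm(n=200, mu=c(0, 0.1),
                           Sigma=matrix(c(1, 0.8, 0.8, 1), 2),
                           empirical=TRUE))
names(d) <- c("y1", "y2")

# Introduce missingness as above
set.seed(12345)
d$y1a <- ifelse(d$y2 < -1 & runif(nrow(d)) > 0.66, NA, d$y1)
d$y2a <- ifelse(d$y1 > 1 & runif(nrow(d)) > 0.66, NA, d$y2)

# t.test() with paired=TRUE keeps only complete pairs,
# so this is the (biased) complete-case estimate
t.test(d$y2a, d$y1a, paired = TRUE)
```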

When I run lavaan using model.1a - which is model.1 with y1a and y2a substituted for y1 and y2 - and FIML estimation:

model.1a <- gsub("y2", "y2a", gsub("y1", "y1a", model.1))
fit.1a <- lavaan(model.1a, data=d, missing="ML")
summary(fit.1a)

                   Estimate  Std.err  Z-value  P(>|z|)
Defined parameters:
    diff              0.108    0.046    2.343    0.019

The overestimate is only 8%.

It's also possible to fit this as a multilevel model, which gives a full information estimate:

library(lme4)
d.long <- data.frame(y = c(d$y1a, d$y2a))
d.long$id <- rep(1:200, 2)
d.long$x <- c(rep(0, 200), rep(1, 200))
summary(lmer(y ~ x + (1|id), data=d.long))


Fixed effects:
        Estimate Std. Error t value
(Intercept) -0.01206    0.07174  -0.168
x            0.10695    0.04589   2.330