Solved – Bootstrapping in SEM when the original sample size is small

bootstrap, structural-equation-modeling

I'm running an SEM in which I have several strongly positively skewed endogenous variables. Unfortunately, even when I log-transform these variables they are still quite non-normal. Kline (2011, p. 64) writes that "Some distributions can be so severely non-normal that basically no transformation will work", so I gave up trying to transform them further.

Kline (2011) provides another option on pp. 177-178:

[An] option for analyzing continuous but severely non-normal
endogenous variables is to use a normal theory method (i.e., ML
estimation) but with nonparametric bootstrapping, which assumes only
that the population and sample distributions have the same shape. In a
bootstrap approach, parameters, standard errors, and model test
statistics are estimated with empirical sampling distributions from
large numbers of generated samples. Results of a computer simulation
study by Nevitt and Hancock (2001) indicate that bootstrap estimates
for a measurement model were generally less biased compared with those
from standard ML estimation under conditions of non-normality and for
sample sizes of N ≥ 200. For N = 100, however, bootstrapped estimates
had relatively large standard errors, and many generated samples were
unusable due to problems such as nonpositive definite covariance
matrices. These problems are consistent with the caution by Yung and
Bentler (1996) that a small sample size will not typically render
accurate bootstrapped results.
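To make the procedure Kline describes concrete, here is a minimal sketch of a nonparametric bootstrap in Python. The data and the statistic (the mean of a skewed variable) are hypothetical stand-ins; in a real SEM application the statistic would be a model parameter refit on each resample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: n = 100 draws of a positively skewed variable
x = rng.exponential(scale=2.0, size=100)

def bootstrap_se(data, stat, n_boot=2000, rng=rng):
    """Nonparametric bootstrap: resample cases with replacement,
    recompute the statistic on each resample, and take the standard
    deviation of the replicates as the empirical standard error."""
    n = len(data)
    reps = np.empty(n_boot)
    for b in range(n_boot):
        resample = data[rng.integers(0, n, size=n)]
        reps[b] = stat(resample)
    return reps.std(ddof=1)

se = bootstrap_se(x, np.mean)
print(f"bootstrap SE of the mean: {se:.3f}")
```

Note that the empirical sampling distribution is built entirely from the observed cases, which is why the method assumes only that the sample distribution resembles the population distribution.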

In this answer @michael-chernick writes that

The theory of the bootstrap involves showing consistency of the
estimate. So it can be shown in theory that it works for large
samples. But it can also work in small samples. I have seen it work
for classification error rate estimation particularly well in small
sample sizes such as 20 for bivariate data.

Some questions:
1. Those two quotes seem at face value to be in conflict with each other, but I note that @michael-chernick was answering a question that did not involve SEM. Does SEM require a larger original sample size for successful bootstrapping? If so, why?
2. If Kline is right that bootstrapping will work poorly with a low N, what should I do if I have a low N? Say, for the sake of argument, that I have a sample size of 100 and no way to collect more data. Should I go ahead with bootstrapping, and if so, should I bootstrap using the transformed variables (remembering that the transformation didn't do a great job of resolving the non-normality) or the original variables?

Kline, R. B. (2011). Principles and practice of structural equation modeling. Guilford Press.

Yung, Y. F., & Bentler, P. M. (1996). Bootstrapping techniques in analysis of mean and covariance structures. Advanced structural equation modeling: Issues and techniques, 195-226.

Best Answer

Yes, SEM requires a larger sample size. The reason is that SEM is doing two things: first it estimates the model itself, and then it estimates the standard errors of that model.

There are two problems. The first is that you will have trouble estimating the model(s).

If you have problems with your standard errors (because, say, of non-normality), then bootstrapping might help you. But if you try to run an SEM model with a small sample size, you'll find that you don't get a sensible model to interpret: the model will frequently fail to converge, or will converge with out-of-bounds estimates (variances < 0; correlations > 1, perhaps MUCH greater than one - one sometimes sees correlations in the three-digit range).

So when you try to bootstrap a model with a small sample size you might find that 25% of the bootstrap samples are clearly wacky and should be discarded. And some proportion of the rest are also wacky, but you don't have a good way to decide which ones. If you did, you could go ahead and use the standard errors.
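One concrete screening step is to check each bootstrap resample's covariance matrix for positive definiteness before passing it to the SEM fit, since nonpositive definite matrices are one of the problems Nevitt and Hancock reported. Below is a hedged sketch with made-up multivariate data; the Cholesky test is a standard numerical check, but which replicates count as "wacky" in a real analysis also depends on convergence and admissibility of the fitted solution, which this sketch does not model.

```python
import numpy as np

rng = np.random.default_rng(1)

def is_positive_definite(m):
    """The Cholesky factorization succeeds only for (numerically)
    positive definite matrices, so use it as a quick screen."""
    try:
        np.linalg.cholesky(m)
        return True
    except np.linalg.LinAlgError:
        return False

# Hypothetical small-N multivariate data: n = 20 cases, 5 variables
data = rng.normal(size=(20, 5))

n = len(data)
n_boot = 500
usable = 0
for _ in range(n_boot):
    resample = data[rng.integers(0, n, size=n)]
    cov = np.cov(resample, rowvar=False)
    if is_positive_definite(cov):
        usable += 1  # only these replicates would be passed to the SEM fit

print(f"usable bootstrap samples: {usable}/{n_boot}")
```

With very small N the discard rate climbs quickly (duplicated cases can make the resampled covariance matrix rank deficient), which is exactly the situation where the surviving replicates are least trustworthy.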

The second problem is that ML tends to be biased in small samples.