Solved – Complications of having a very small sample in a structural equation model

bootstrap · modeling · sample-size · structural-equation-modeling

I am running a structural equation model (SEM) in Amos 18. I was aiming for 100 participants for my "experiment" (the term used loosely), a number I had already been told was probably not enough to conduct successful SEM. I've been told repeatedly that SEM (along with EFA and CFA) is a "large sample" statistical procedure. Long story short, I didn't make it to 100 participants (what a surprise!), and have only 42 after excluding two problematic data points. Out of interest, I tried the model anyway, and to my surprise it seemed to fit very well: CFI > .95, RMSEA < .09, SRMR < .08.

The model is not simple; in fact, I would say it is relatively complex. I have two latent variables, one with two observed indicators and the other with five. I also have four additional observed variables in the model. There are numerous relationships between the variables, both direct and indirect; for example, some variables are endogenous to four others.
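To give a concrete sense of the shape (not the content) of the model, here is a sketch in lavaan-style syntax via the Python package semopy. I actually run everything in Amos, and every variable name below is a placeholder I made up for illustration:

```python
import semopy

# Hypothetical shape only: LV1/LV2 are the two latents, ind1..ind7 their
# indicators, obs1..obs4 the four additional observed variables. None of
# these names are my actual measures.
desc = """
LV1 =~ ind1 + ind2
LV2 =~ ind3 + ind4 + ind5 + ind6 + ind7
obs3 ~ LV1 + obs1
obs4 ~ LV1 + LV2 + obs2 + obs3
"""

model = semopy.Model(desc)
# res = model.fit(df)               # df: pandas DataFrame, one column per variable
# print(semopy.calc_stats(model))   # CFI, RMSEA and other fit indices
```

Here obs4 is endogenous to four other variables, which is the kind of structure I meant above.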

I am somewhat new to SEM; however, two people I know who are quite familiar with it tell me that as long as the fit indices are good, the effects are interpretable (provided they are significant) and there is nothing seriously "wrong" with the model. I know some fit indices are biased for or against small samples in terms of suggesting good fit, but the three I mentioned above seem fine, and I believe they are not similarly biased. To test for indirect effects I am using bootstrapping (2000 samples or so) with 90% bias-corrected confidence intervals (Monte Carlo). An additional note: I am running three different SEMs for three different conditions.
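For anyone unfamiliar with the mechanics, here is roughly what that bias-corrected bootstrap is doing, sketched in Python on a toy X -> M -> Y mediation rather than my full model (Amos handles all of this internally; this is purely illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def indirect(x, m, y):
    """a*b indirect effect from two OLS fits: M ~ X and Y ~ X + M."""
    a = np.polyfit(x, m, 1)[0]                   # slope of M on X
    X = np.column_stack([np.ones_like(x), x, m])
    b = np.linalg.lstsq(X, y, rcond=None)[0][2]  # slope of Y on M, controlling X
    return a * b

def bc_ci(x, m, y, n_boot=2000, conf=0.90):
    """Bias-corrected (BC) bootstrap confidence interval for a*b."""
    theta_hat = indirect(x, m, y)
    n = len(x)
    boots = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)              # resample cases with replacement
        boots[i] = indirect(x[idx], m[idx], y[idx])
    # bias-correction constant: how far the bootstrap distribution sits
    # relative to the original estimate
    z0 = stats.norm.ppf(np.mean(boots < theta_hat))
    z = stats.norm.ppf([(1 - conf) / 2, (1 + conf) / 2])
    lo, hi = stats.norm.cdf(2 * z0 + z)          # adjusted percentiles
    return theta_hat, np.quantile(boots, [lo, hi])

# usage with fake data:
# x = rng.normal(size=42); m = 0.5*x + rng.normal(size=42); y = 0.4*m + rng.normal(size=42)
# est, (lo, hi) = bc_ci(x, m, y)
```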

I have two questions that I would like you to consider; please reply if you have something to contribute:

  1. Are there any significant weaknesses in my model that are not revealed by the fit indices? The small sample will be highlighted as a weakness of the study, but I am left wondering whether there is some huge statistical problem that I am completely oblivious to. I plan on recruiting another 10-20 participants in the future, but this will still leave me with a relatively small sample for such analyses.

  2. Are there any problems with my use of bootstrapping given my small sample, or the context in which I am using it?

I hope these questions are not too "basic" for this forum. I have read a number of chapters on SEM and related matters, but I find opinions in this area are very dispersed!

Cheers

Best Answer

One point: there is no such thing as a "basic question". You only know what you know, not what you don't know; asking a question is often the only way to find out.

Whenever you see small samples, you find out who really has "faith" in their models and who doesn't. I say this because small samples are usually where models have the biggest impact.

Being a keen (psycho?) modeller myself, I say go for it! You seem to be adopting a cautious approach, and you have acknowledged the potential for bias due to the small sample. One thing to keep in mind when fitting models to small datasets is that you have 12 variables. Now ask yourself: how well could any model with 12 variables be determined by 42 observations? If you had 42 variables, then any model could fit those 42 observations perfectly (loosely speaking), so your case is not too far from being too flexible. What happens when a model is too flexible? It tends to fit the noise - that is, relationships driven by things other than the ones you hypothesize.
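If that point feels abstract, here is a toy demonstration you can run: regress pure noise on an increasing number of pure-noise predictors with n = 42 and watch R² climb as the predictor count approaches the sample size. Everything below is simulated random numbers, nothing to do with your data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 42
y = rng.normal(size=n)                # outcome: pure noise

for p in (4, 12, 24, 41):
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1 - resid @ resid / (y @ y - n * y.mean() ** 2)
    print(f"p = {p:2d} noise predictors: R^2 = {r2:.2f}")

# with p = 41 predictors (plus intercept) the fit is essentially perfect,
# even though every "relationship" here is noise by construction
```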

You also have the opportunity to put your ego where your model is, by predicting from your model what those future 10-20 participants' data will look like. I wonder how your critics will react to a so-called "dodgy" model that gives the right predictions. Note that the "I told you so" cuts the other way if your model doesn't predict the new data well.
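Mechanically, that check is simple: freeze the estimates from your current 42 cases, then score the new participants when they arrive, without refitting, and compare against a trivial baseline. A hedged sketch, with placeholder names and a plain regression standing in for the SEM:

```python
import numpy as np

def frozen_predictions(X_old, y_old, X_new):
    """Fit on the old data only; predict the new cases without refitting."""
    X = np.column_stack([np.ones(len(X_old)), X_old])
    beta, *_ = np.linalg.lstsq(X, y_old, rcond=None)
    return np.column_stack([np.ones(len(X_new)), X_new]) @ beta

# later, once y_new has been observed:
# pred = frozen_predictions(X_old, y_old, X_new)
# model_mse    = np.mean((y_new - pred) ** 2)
# baseline_mse = np.mean((y_new - y_old.mean()) ** 2)  # just predict the old mean
# a model_mse clearly below baseline_mse is the "right predictions" outcome
```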

Another way to assure yourself that your results are reliable is to try to break them. Keeping your original data intact, create a new data set and see what you have to do to it in order to make your SEM results look ridiculous. Then look at what you had to do and ask: is this a reasonable scenario? Does my "ridiculous" data resemble a genuine possibility? If you have to push your data into ridiculous territory to produce ridiculous results, that provides some assurance (heuristic, not formal) that your method is sound.
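A sketch of what that could look like in practice: perturb the data with increasing amounts of noise, refit each time, and record how much distortion it takes before the key estimate flips sign. A trivial stand-in (a correlation) sits where you would refit the actual SEM and extract your effect of interest:

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_key_estimate(data):
    # stand-in for refitting the SEM: the correlation between the first
    # two columns; swap in the real model refit and key path estimate
    return np.corrcoef(data[:, 0], data[:, 1])[0, 1]

def breaking_point(data, scales=(0.1, 0.25, 0.5, 1.0, 2.0)):
    baseline = fit_key_estimate(data)
    for s in scales:
        noisy = data + rng.normal(scale=s * data.std(axis=0), size=data.shape)
        est = fit_key_estimate(noisy)
        print(f"noise scale {s:4.2f} x SD: estimate = {est:+.3f}")
        if np.sign(est) != np.sign(baseline):
            return s   # the distortion needed to flip the conclusion
    return None        # the conclusion survived every perturbation tried
```

If `breaking_point` only triggers at implausibly large distortions, that is the heuristic reassurance described above.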