In general, multiple group CFA is a good tool for comparing the equivalence of a measurement model across two groups.
However, you first need to show that the measurement model makes sense in at least one group.
First, analyse the entire sample
There are many different ways of tackling factor exploration. Here are some thoughts:
- Consider analysing the factor structure of the entire sample of 180.
- If the CFA is giving poor fit, you might want to run an exploratory factor analysis to see what is going on in the data. Are items loading on factors other than those theorised? Does the number of factors theorised seem reasonable relative to the factor structures you get with a few more or a few fewer factors?
- Also, it is not uncommon for item-level CFAs with large numbers of items per factor to yield relatively poor fit. This can be explained both in terms of how we think about fit statistics in such contexts, and in terms of the simplicity of many measurement models. For example, it is quite common for there to be a number of systematic deviations from the idealised structure (e.g., items with common words correlating more highly; items close together correlating more highly; items that are both negatively worded correlating more highly; etc.). Without incorporating these, dare I say, nuisance characteristics, fit will often be poor even when the theorised structure is a reasonable approximation.
- Some people adopt an item parcelling approach to CFA that often smooths out some of these low-level item characteristic effects. Note that there is some controversy over the appropriateness of this approach. Item parcelling is often appealing to researchers, perhaps wrongly, because it can improve fit statistics enough to satisfy conventional criteria and therefore increase the chances of getting published. A more reasonable justification for item parcelling is a situation where you are not interested in the item level and only care about more general characteristics of the scale. Anyway, in your case, you might want to examine item parcelling, as it also reduces the number of parameters that need to be estimated.
- With regards to sample size, I agree that more would be better, but my sense is that you'll still be able to get interesting results with 180. You can use fit statistics with confidence intervals to quantify associated uncertainty due to the smaller sample size.
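As a minimal sketch of the mechanics of parcelling (not an endorsement of it), here is what averaging nine hypothetical items into three parcels might look like. The item data are simulated, and the sequential-triplet assignment is a deliberate simplification; real parcelling schemes, such as item-to-construct balancing, choose the assignment more carefully:

```python
import numpy as np

rng = np.random.default_rng(0)
n_people, n_items = 180, 9  # e.g. nine items assumed to load on one factor

# Simulated item responses: a common latent factor plus item-level noise
items = rng.normal(size=(n_people, 1)) + rng.normal(scale=0.7, size=(n_people, n_items))

# Average sequential triplets of items into three parcels
parcels = items.reshape(n_people, 3, 3).mean(axis=2)
print(parcels.shape)  # (180, 3)
```

The CFA would then be fit to the three parcels rather than the nine items, cutting the number of loadings, intercepts, and residual variances to estimate.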
Multiple group CFA
So, if, and only if, you can get a good model at the overall level would I proceed to multiple group CFA.
The intercept or mean of a latent variable is arbitrary, like the variance, and is usually fixed to zero if you have a single group model (or a single time point model). The intercept of the measured variable is the expected value when the predictor (the latent variable) is equal to zero.
You anchor the mean of the latent variable to the intercepts of the measured variables, and that means you can compare latent means across groups (or over time). But if the intercepts of the measured variables drift apart, you can't anchor the means to them any more, because you don't know where they are anchored.
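As a numerical illustration of that definition, here is a one-variable simulation sketch (the intercept `tau` and loading `lam` are made-up values): when the latent mean is fixed at zero, the expected value of the measured variable is just its intercept.

```python
import numpy as np

rng = np.random.default_rng(1)
tau, lam = 2.0, 0.8  # hypothetical item intercept and loading
eta = rng.normal(0.0, 1.0, size=100_000)  # latent variable, mean fixed at 0
x = tau + lam * eta + rng.normal(0.0, 0.5, size=eta.size)  # measured variable

# E[x] = tau + lam * E[eta] = tau when the latent mean is 0
print(round(x.mean(), 2))  # ≈ 2.0, i.e. tau
```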
Enough abstraction, let's have a concrete example.
Let's say you want to compare depression symptoms in men and women.
So you ask three questions:
How many days in the past week have you:
- Felt lonely
- Felt sad
- Cried
Suppose you create a latent variable based on this, and the errors and loadings look good. Now you want to compare the means of the latent variables, so you fix the male latent mean to zero and constrain the intercepts of the three measured variables to be equal across groups.
Women and men do not differ on how much they have felt lonely or how much they have felt sad, but then we find that women say they have cried more than men.
Does that mean that the women have 'more' depression than the men? If we anchor to crying - yes. If we anchor to the other two variables - no. We don't have intercept invariance, and because of that, we can't compare the means of the latent variables.
Another (only slightly different) way to think about it: the intercept of the measured variable is the expected value of the variable when the factor is equal to zero. The predicted values of the measured variables should be the same for men and women when the values of the factor are equal (in particular, when the factor is zero). But here they are not: some are equal (in our example, items 1 and 2, lonely and sad), and one is not (item 3, cried).
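The crying example can be simulated directly: equal latent means in both groups, identical intercepts for "lonely" and "sad", but a higher "cried" intercept for women. The observed means then diverge only on the third item, even though latent depression does not differ between groups (all numbers here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20_000  # people per group

def simulate(cried_intercept):
    """Item = intercept + loading * latent + noise; latent mean is 0 in both groups."""
    latent = rng.normal(0.0, 1.0, size=n)          # equal latent depression
    noise = rng.normal(0.0, 0.5, size=(3, n))
    lonely = 2.0 + 1.0 * latent + noise[0]
    sad    = 2.0 + 1.0 * latent + noise[1]
    cried  = cried_intercept + 1.0 * latent + noise[2]
    return lonely.mean(), sad.mean(), cried.mean()

men   = simulate(cried_intercept=1.0)
women = simulate(cried_intercept=2.0)  # women report crying more at the same latent level

# Group differences in observed item means: only "cried" differs
print([round(w - m, 2) for w, m in zip(women, men)])  # ≈ [0.0, 0.0, 1.0]
```

A naive comparison anchored to "cried" would wrongly conclude that the women's latent mean is higher; anchored to the other two items, the groups look identical. That ambiguity is exactly why non-invariant intercepts block the comparison.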
There are two questions here: (1) do you need to fit/examine a measurement model before fitting/examining a structural model? and (2) if a given measurement model exhibits poor fit, is it still worth fitting/examining a structural model?
References
Little, T. D. (2013). Longitudinal structural equation modeling. New York, NY: Guilford Press.
Little, T. D., Cunningham, W. A., Shahar, G., & Widaman, K. F. (2002). To parcel or not to parcel: Exploring the question, weighing the merits. Structural Equation Modeling, 9, 151-173.