Solved – lavaan WLSMV estimator: are results reliable when number of observations is too small to compute Gamma

confirmatory-factor, lavaan, structural-equation-modeling

I run a CFA (confirmatory factor analysis) with the WLSMV estimator (since my data are ordinal) in lavaan, and I get the following warning message:

number of observations (190) too small to compute Gamma

Is this a problem with Gamma only, with everything else computed correctly? Can I proceed with results obtained under this warning, e.g. interpret estimates, p-values, and fit indices in the usual way?

Or does this somehow (and if so, how?) undermine the credibility of the whole CFA?
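For reference, a minimal self-contained sketch of this kind of setup (item names and data are simulated placeholders; note that a four-item toy model at N = 190 will not itself trigger the warning, which is reported when N is small relative to the number of sample statistics, i.e. thresholds plus polychoric correlations, whose asymptotic covariance matrix Gamma must be estimated):

```r
library(lavaan)

set.seed(1)
n <- 190
f <- rnorm(n)  # a single latent factor score

# Cut a latent score plus noise into 5 ordered categories (placeholder items)
make_item <- function(loading)
  cut(loading * f + rnorm(n),
      breaks = c(-Inf, -1, 0, 1, 2, Inf), labels = FALSE)

mydata <- data.frame(item1 = make_item(.7), item2 = make_item(.6),
                     item3 = make_item(.8), item4 = make_item(.5))

model <- 'f1 =~ item1 + item2 + item3 + item4'

fit <- cfa(model, data = mydata,
           ordered   = names(mydata),  # declare items ordinal -> polychorics
           estimator = "WLSMV")

summary(fit, fit.measures = TRUE, standardized = TRUE)
```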

Best Answer

You will want to ensure an adequate sample size when the variability of a variable is unequal across the range of values of a second variable that predicts it. If a regression model predicts consistently across all values, a smaller sample size may suffice; where predictions are poor at one end or the other (as can happen with ordinal data), a larger sample size is necessary.

A scatterplot of such variables will often show a cone-like shape, as the scatter (or variability) of the dependent variable (DV) widens or narrows as the value of the independent variable (IV) increases. The opposite of heteroscedasticity is homoscedasticity, which indicates that a DV's variability is equal across values of an IV.

Violations of measurement invariance may preclude meaningful interpretation of measurement data. One method of gauging the influence of sample size and model misfit is the expected parameter change (EPC) statistic. The EPC was developed by Saris, Satorra, and Sörbom (1987) as a means of gauging the size a fixed parameter would take if it were freed. More recent work on sample sizes is Kaplan and George (1995).

Scedasticity and Fit

Sources:

  • "Confirmatory factor analysis with ordinal data: Comparing robust maximum likelihood and diagonally weighted least squares" (July 15 2015), by Cheng-Hsien Li

    "... results showed that WLSMV was less biased and more accurate than MLR in estimating the factor loadings across nearly every condition. However, WLSMV yielded moderate overestimation of the interfactor correlations when the sample size was small and/or when the latent distributions were moderately nonnormal. With respect to standard error estimates of the factor loadings and the interfactor correlations, MLR outperformed WLSMV when the latent distributions were nonnormal with a small sample size of N = 200. Finally, the proposed model tended to be over-rejected by chi-square test statistics under both MLR and WLSMV in the condition of small sample size N = 200.

    ...

    When the weight matrix W is replaced with the identity matrix I, WLS reduces to unweighted least squares (ULS). In order to address heteroscedastic disturbances in CFA models, a full weight matrix W = Ṽ (i.e., the estimated asymptotic covariance matrix of the polychoric correlation and threshold estimates) is implemented in the WLS fit function above to account for distributional variability in and interrelationships among the observed variables (Kaplan, 2009). However, as the number of observed variables and response categories increases, the weight matrix grows rapidly in size.

    ...

    However, WLSMV, for instance, also has its own weaknesses of interfactor correlations and standard errors in estimation when the sample size is small and/or when a latent distribution is moderately nonnormal. Likewise, MLR has its unique strengths—for instance, generally less biased standard error estimates and good recovery of the population interfactor correlations.".

  • "The consequences of ignoring measurement invariance for path coefficients in structural equation models" (Sept 17 2014), by Nigel Guenole and Anna Brown

    "In the CFA approach, which is the focus of the present article, a series of competing models is fitted to response data, where the group membership acts as a potential categorical moderator (e.g., French and Finch, 2011). Equivalent measurement model parameters across groups are required for comparable measurement, a consideration identical to the use of equal measurement scales (say, degrees centigrade) when comparing temperatures in two different regions.

    ...

    The challenge faced by the researcher who allows partial invariance is how much non-invariance can be tolerated whilst still claiming that the same construct is measured across groups or between current and past research. The challenge faced by the researcher ignoring the non-invariance is whether the results of the misspecified model can be trusted. In practice, applied researchers should make a decision based on the expected threats to the validity of their conclusions under each course of action.".



A common rule of thumb says that more than 200 observations are necessary for CFA.
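Rules of thumb can also be framed per free parameter rather than as an absolute N, via the N:q ratio discussed in the Kyriazos excerpt below. A small arithmetic sketch (the model dimensions are hypothetical, and whether thresholds are counted toward q varies across authors; they are included here):

```r
# N:q rule of thumb: recommended N = ratio * number of free parameters q.
# Hypothetical model: 2 correlated factors, 12 ordinal items total, 4 response
# categories per item, factor variances fixed to 1 (std.lv = TRUE).
items       <- 12
categories  <- 4
loadings    <- items                    # one free loading per item
thresholds  <- items * (categories - 1) # 3 free thresholds per item
factor_cors <- 1                        # correlation between the two factors

q <- loadings + thresholds + factor_cors  # q = 49 for this model

for (ratio in c(10, 20))
  cat(sprintf("q = %d, %d:1 rule -> N >= %d\n", q, ratio, ratio * q))
# -> N >= 490 (10:1) or N >= 980 (20:1), so N = 190 is well below both
```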

Statistical power can be estimated in order to determine a better minimum sample size than a rule of thumb gives. The robust categorical least squares (cat-LS) methodology for CFA may be preferable to robust normal theory maximum likelihood (ML), lavaan's default for continuous data, when the sample size is small (depending upon other parameters).
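Such a power estimate can be sketched as a small Monte Carlo study: simulate ordinal data from an assumed population model at candidate sample sizes, refit with WLSMV, and track convergence and significance rates. All population values below are illustrative assumptions, not recommendations:

```r
library(lavaan)

pop_loadings <- c(.7, .7, .6, .6, .5, .5)  # assumed population loadings
cuts         <- c(-1, 0, 1)                # thresholds -> 4 ordered categories

# One replication: simulate ordinal items, fit 1-factor CFA with WLSMV,
# return the share of loadings significant at alpha = .05 (NA if no fit).
simulate_once <- function(n) {
  f <- rnorm(n)
  items <- sapply(pop_loadings, function(l)
    findInterval(l * f + sqrt(1 - l^2) * rnorm(n), cuts) + 1L)
  d <- as.data.frame(items)
  names(d) <- paste0("y", seq_along(pop_loadings))
  fit <- try(cfa("f =~ y1 + y2 + y3 + y4 + y5 + y6",
                 data = d, ordered = names(d), estimator = "WLSMV"),
             silent = TRUE)
  if (inherits(fit, "try-error") || !lavInspect(fit, "converged")) return(NA)
  pe <- parameterEstimates(fit)
  mean(pe$pvalue[pe$op == "=~"] < .05, na.rm = TRUE)
}

set.seed(123)
for (n in c(150, 200, 300)) {
  res <- replicate(20, simulate_once(n))  # use several hundred reps in practice
  cat(sprintf("N = %3d: convergence %.2f, mean power %.2f\n",
              n, mean(!is.na(res)), mean(res, na.rm = TRUE)))
}
```

The same scheme extends to any quantity of interest (e.g., bias in interfactor correlations, or rejection rates of the scaled chi-square) by extracting it from each fitted model instead of the loading p-values.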

  • "Applied Psychometrics: Sample Size and Sample Power Considerations in Factor Analysis (EFA, CFA) and SEM in General" (PSYCH, Vol. 9, No. 8, August 2018), by Theodoros A. Kyriazos:

    Abstract:
    "... this paper reviews the issue of what sample size and sample power the researcher should have in the EFA, CFA, and SEM study. Statistical power is the estimation of the sample size that is appropriate for an analysis. In any study, four parameters related to power analysis are Alpha, Beta, statistical power and Effect size. They are prerequisites for a priori sample size determination. Scale development in general and Factor Analysis (EFA, CFA) and SEM are large sample size methods because sample affects precision and replicability of the results. However, the existing literature provides limited and sometimes conflicting guidance on this issue. Generally, for EFA the stronger the data, the smaller the sample can be for an accurate analysis. In CFA and SEM parameter estimates, chi-square tests and goodness of fit indices are equally sensitive to sample size. So the statistical power and precision of CFA/SEM parameter estimates are also influenced by sample size.".

    Conclusion:
    In CFA and SEM, sample size depends on a number of features, like study design (e.g., cross-sectional vs. longitudinal), the number of relationships among indicators, indicator reliability, the data scaling (e.g., categorical versus continuous), the estimator type (e.g., ML, robust ML, etc.), the missing-data level and pattern, and model complexity (Brown, 2015). Thus, determining sample size is approximated by power analysis (Brown, 2015; Kline, 2016; Byrne, 2012; Wang & Wang, 2012). Also, minimum sample sizes are recommended to limit the probability of non-convergence and to obtain unbiased estimates and standard errors, based on Monte Carlo simulation studies. Generally, CFA/SEM is a large-sample technique (Kline, 2016), but as a rule, models having robust parameter estimates and variables with high reliability may require smaller samples (Tabachnick & Fidell, 2013). Additionally, whether the sample size is adequate for achieving the desired power for significance tests, overall model fit, and likelihood ratio tests under specific model/research circumstances is a different aspect considered during power analysis (Hancock & French, 2013; Lee, Cai, & MacCallum, 2012). How the chi-square statistic, RMSEA, and other fit indices perform at different sample-size levels is another parameter to consider (Hu & Bentler, 1999). Sufficient power is also crucial for individual parameter tests like factor loadings (Newsom, 2018). A CFA/SEM rule of thumb, the ratio of cases to free parameters, or N:q, is commonly used for minimum recommendations, with 10:1 to 20:1 commonly suggested ratios (Schumacker & Lomax, 2015; Kline, 2016; Jackson, 2003). Even so, suggestions based on simulation studies are only rough approximations, not equally applicable to all SEM studies. Simulation studies can cover only a fraction of SEM research conditions at a time, so they are not easily generalized (Brown, 2015; Newsom, 2018).

  • "When Can Categorical Variables Be Treated as Continuous? A Comparison of Robust Continuous and Categorical SEM Estimation Methods Under Suboptimal Conditions" (search for non-paywall .PDFs), (Psychological Methods 2012, Vol. 17, No. 3, 354–373), by Mijke Rhemtulla, Patricia É. Brosseau-Liard and Victoria Savalei:

    "Goals of the Present Study
    Researchers often use continuous methods such as normal theory ML in spite of the variables’ categorical nature. While it is theoretically incorrect to do this, researchers usually work under the assumption that, given a sufficiently large number of categories, categorical variables are sufficiently similar to continuous variables to produce good results. While several studies have explored the question of how many categories are enough to treat categorical variables as continuous, the advent of robust corrections for both continuous and categorical estimation warrants a reassessment of this issue. No study has yet compared the performance of continuous and categorical estimation methods with their respective robust corrections, and a thorough investigation of this question will allow researchers to decide which of the most current methods is best for their data.

    The main goal of the present study is to provide this much needed comparison. We compare robust ML, a continuous methodology with corrections for nonnormality that is widely used and performs well under a variety of circumstances, to robust cat-LS, one of the best currently available categorical methodologies (Forero et al., 2009; Yang-Wallentin et al., 2010) that provides correct standard errors and test statistics. A secondary aim of our investigation is to evaluate the relative performance of the two methods in conditions that generally pose difficulties for estimation or violate the underlying assumptions of both methods. To this end, we included a range of conditions including different sample sizes, model sizes, and varying levels of category threshold asymmetry. Additionally, categorical variables were generated by categorizing underlying normal as well as nonnormal distributions. In the condition where the underlying continuous variables are non-normal, cat-LS should also result in biased parameter estimates. The comparison between cat-LS and ML is particularly interesting in this case, as both methods are wrong but one may do better than the other. We compare the relative performance of cat-LS and normal theory ML parameter estimates, the quality of robust standard errors, and the rejection rates of the adjusted test statistics. The results of this investigation will provide an answer to the question of how many categories are enough to treat data as continuous that is sensitive to the characteristics of a particular data set."