Input data assumptions of linear FA (I'm not speaking here about internal assumptions/properties of the FA model or about checking the fitting quality of results).
- Scale (interval or ratio) input variables. That means the items are either continuous measures or are conceptualized as continuous while measured on a discrete quantitative scale. No ordinal data in linear FA (read). Binary data should also be avoided (see this, this). Linear FA assumes that the latent common and unique factors are continuous. Therefore the observed variables which they load should be continuous too.
- Correlations are linear. Linear FA may be performed based on any SSCP-type association matrix: Pearson correlation, covariance, cosine, etc. (though some methods/implementations may be restricted to Pearson correlations only). Note that these are all linear-algebra products. Although the magnitude of a covariance coefficient reflects more than just linearity of the relation, the modeling in linear FA is linear in nature even when covariances are used: variables are linear combinations of factors, and thus linearity is implied in the resulting associations. If you see or suspect that nonlinear associations prevail, don't do linear FA, or try to linearize them first by some transformation of the data. And don't base linear FA on Spearman or Kendall correlations (Pt. 4 there).
- No outliers - as with any nonrobust method. Pearson correlation and similar SSCP-type associations are sensitive to outliers, so watch out.
- Reasonably high correlations are present. FA is the analysis of correlatedness - what's its use when all or almost all correlations are weak? No use. However, what counts as a "reasonably high correlation" depends on the field of study. There is also the interesting and varied question of whether very high correlations should be accepted (their effect on PCA, for example, is discussed here). Bartlett's test of sphericity can be used to test statistically the null hypothesis that the variables are uncorrelated.
- Partial correlations are weak, and factors are well defined. FA assumes that factors are more general than just loading pairs of correlated items. In fact, there is even advice not to extract factors loading decently fewer than 3 items in exploratory FA; and in confirmatory FA only a structure with 3+ items per factor is guaranteed to be identified. A technical problem of extraction called the Heywood case has, as one of the reasons behind it, the too-few-items-per-factor situation. The Kaiser-Meyer-Olkin (KMO) "sampling adequacy measure" estimates for you how weak the partial correlations in the data are relative to the full correlations; it can be computed for every item and for the whole correlation matrix. The common factor analysis model assumes that pairwise partial correlations are small enough not to be bothered about and modelled, and that they all fall into that population noise for individual correlation coefficients which we don't regard any differently from the sample noise for them (see). And read also.
- No multicollinearity. The FA model assumes that each item possesses its own unique factor and that those factors are orthogonal. Therefore 2 items must define a plane, 3 items a 3d space, etc.: $p$ correlated vectors must span a $p$-dimensional space to accommodate their $p$ mutually perpendicular unique components. So, no singularity, for theoretical reasons$^1$ (and hence, automatically, $n$ observations $> p$ variables, it goes without saying; and better $n \gg p$). Incomplete multicollinearity is allowed, though; yet it may cause computational problems in most FA algorithms (see also).
- Distribution. In general, linear FA does not require normality of the input data. Moderately skewed distributions are acceptable, and bimodality is not a contraindication. Normality is indeed assumed for the unique factors in the model (they serve as regression errors) - but not for the common factors or the input data (see also). Still, multivariate normality of the data can be required as an additional assumption by some methods of extraction (namely, maximum likelihood) and for performing some asymptotic tests.
$^1$ ULS/minres methods of FA can work with a singular and even non-p.s.d. correlation matrix, but strictly theoretically such an analysis is dubious, to my mind.
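The data checks mentioned in the list above - Bartlett's test of sphericity, the overall KMO measure, and a quick singularity check via the smallest eigenvalue - can all be computed directly from a correlation matrix. A minimal sketch in Python; the matrix and sample size are made up, and the formulas are the standard textbook ones:

```python
import numpy as np
from scipy import stats

def bartlett_sphericity(R, n):
    """Bartlett's test of sphericity: H0 is that the population
    correlation matrix is the identity (variables uncorrelated)."""
    p = R.shape[0]
    chi2 = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return chi2, stats.chi2.sf(chi2, df)

def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure: compares squared full
    correlations to squared partial (anti-image) correlations."""
    S = np.linalg.inv(R)
    # partial correlation of items i, j given all others: -S_ij / sqrt(S_ii * S_jj)
    P = -S / np.sqrt(np.outer(np.diag(S), np.diag(S)))
    off = ~np.eye(R.shape[0], dtype=bool)
    r2, p2 = (R[off] ** 2).sum(), (P[off] ** 2).sum()
    return r2 / (r2 + p2)

# made-up correlation matrix of 3 items, assumed n = 100 observations
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])

chi2, pval = bartlett_sphericity(R, n=100)
kmo_val = kmo(R)
print(chi2, pval)                   # H0 of no correlation clearly rejected here
print(kmo_val)
print(np.linalg.eigvalsh(R).min())  # > 0: no complete multicollinearity
```

Commonly cited rules of thumb read KMO values above about 0.8 as good sampling adequacy and values near 0.5 as unacceptable, though, as with correlation size itself, conventions vary by field.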
Two or three items per factor is a question of identification of your CFA (confirmatory FA) model.
Let us for simplicity assume that the model is identified by setting the variance of each factor to 1. Assume also that there are no correlated measurement errors.
A single factor model with two items has two loadings and two error variances to be estimated = 4 parameters, but there are only 3 non-trivial entries in the variance-covariance matrix (two variances and one covariance), so you don't have enough information to estimate the four parameters that you need.
A single factor model with three items has three loadings and three error variances. The variance-covariance matrix has six entries, and careful analytic examination shows that the model is exactly identified, and you can algebraically express the parameter estimates as functions of the variance-covariance matrix entries. With more items per single factor, you have an overidentified model (more non-redundant variance-covariance entries than parameters, i.e., positive degrees of freedom), which usually means you are good to go.
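The counting argument above can be made concrete. A sketch of the degrees-of-freedom arithmetic, plus the algebraic solution for the just-identified three-item case, where for standardized items the model implies $\lambda_i \lambda_j = r_{ij}$ (the correlations below are made up, chosen to give round loadings):

```python
import numpy as np

def single_factor_df(p):
    """Degrees of freedom of a one-factor model with p items
    (factor variance fixed to 1, no correlated errors)."""
    moments = p * (p + 1) // 2  # non-trivial (co)variance entries
    params = 2 * p              # p loadings + p error variances
    return moments - params

# p = 2 -> -1 (under-identified), p = 3 -> 0 (just identified), p = 4 -> 2
print([single_factor_df(p) for p in (2, 3, 4)])

# with 3 standardized items, lambda_i * lambda_j = r_ij can be
# solved directly for the loadings:
r12, r13, r23 = 0.48, 0.42, 0.56
l1 = np.sqrt(r12 * r13 / r23)
l2 = np.sqrt(r12 * r23 / r13)
l3 = np.sqrt(r13 * r23 / r12)
print(l1, l2, l3)  # approx. 0.6, 0.8, 0.7 for these numbers
```

The closed-form loadings are exactly the "algebraic expression of the parameter estimates" that just-identification promises; with four or more items no such exact solution exists in general, and the model becomes testable.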
With more than one factor, the CFA model is always identified with 3+ items per each factor (because a simple measurement model is identified for each factor, so roughly speaking you can get predictions for each factor and estimate their covariances based on that). However, a CFA with two items per factor is identified provided that each factor has a non-zero covariance with at least one other factor in the population. (Otherwise, the factor in question falls out of the system, and a two-item single factor model is not identified.) The proof of identification is rather technical, and requires a good understanding of matrix algebra.
Bollen (1989) fully and thoroughly discusses the issues of identification of CFA models in chapter 7. See p. 244 specifically regarding three- and two-indicator rules.
Using factor analysis for scale construction is a bit of an art. It is common to drop items that load to a substantial degree on more than one factor after factor rotation.
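As a small illustration of that practice, one might flag items whose rotated loadings exceed some cutoff on more than one factor; the 0.3 cutoff and the loading matrix below are hypothetical, not a universal rule:

```python
import numpy as np

def cross_loading_items(L, threshold=0.3):
    """Indices of items whose absolute rotated loading meets or
    exceeds `threshold` on more than one factor (drop candidates)."""
    return [i for i, row in enumerate(np.abs(np.asarray(L)))
            if (row >= threshold).sum() > 1]

# hypothetical rotated loading matrix: 4 items x 2 factors
L = [[0.70, 0.10],
     [0.65, 0.05],
     [0.45, 0.50],   # loads substantially on both factors -> flagged
     [0.08, 0.72]]
print(cross_loading_items(L))  # [2]
```

The cutoff, like most scale-construction decisions, is a judgment call; this only automates the bookkeeping, not the art.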