Firstly, principal components analysis (PCA) and factor analysis are quite different methods. PCA is normally used as a data-reduction technique, while factor analysis is more concerned with uncovering a latent structure.
On the cross-loadings: an oblique rotation allows the factors to be correlated, but typically one would not want items to load on multiple factors. In this case, I would examine the factor loadings under other oblique rotations, such as oblimin, to see whether the cross-loadings still appear (a sketch follows below).
Cross-loadings below .3 are often ignored, but if the same cross-loadings show up in multiple samples, that may indicate the item really is associated with more than one factor. Typically such items are discarded, and I would probably do so unless you have a strong theoretical or practical rationale for retaining them.
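Here is a minimal sketch of that check in Python, assuming the `factor_analyzer` package and a DataFrame `X` of item responses with 3 retained factors (the package, the names, and the factor count are my assumptions, not from your question):

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer

# X: DataFrame of item responses (hypothetical); 3 factors as an example
for rotation in ("promax", "oblimin"):
    fa = FactorAnalyzer(n_factors=3, rotation=rotation)
    fa.fit(X)
    loadings = pd.DataFrame(fa.loadings_, index=X.columns)
    # Flag items loading above .3 on more than one factor
    cross = loadings[(loadings.abs() > 0.3).sum(axis=1) > 1]
    print(rotation, "cross-loading items:", list(cross.index))
```

If an item cross-loads under both rotations, the pattern is less likely to be an artifact of the rotation criterion.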
Finally, it sounds like you have two samples. In this case, I would perform EFA on your first sample, and then use the second sample to validate your model. This will raise the probability that you are modelling something real, rather than noise.
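One way to sketch that split-sample check (again assuming `factor_analyzer`, with `X1` and `X2` as hypothetical item matrices for the two samples) is to fit the EFA separately on each sample and compare the loading patterns with Tucker's congruence coefficient; values near 1 suggest the structure replicates. A full CFA on the second sample would be the stricter test.

```python
import numpy as np
from factor_analyzer import FactorAnalyzer

def congruence(a, b):
    # Tucker's congruence coefficient, factor by factor.
    # Assumes the factors are already in matching order and orientation;
    # in practice you may need to reorder or reflect columns first.
    a, b = np.asarray(a), np.asarray(b)
    return np.array([
        (a[:, j] @ b[:, j]) / np.sqrt((a[:, j] @ a[:, j]) * (b[:, j] @ b[:, j]))
        for j in range(a.shape[1])
    ])

fa1 = FactorAnalyzer(n_factors=3, rotation="oblimin").fit(X1)
fa2 = FactorAnalyzer(n_factors=3, rotation="oblimin").fit(X2)
print(congruence(fa1.loadings_, fa2.loadings_))  # ~.95+ is commonly read as replication
```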
Of course, it's "possible" to do what you're asking. The question is whether that is the best way to deal with the issue. You have left out a number of important considerations: first, did you rotate a PCA to create a CFA with 3 factors? That you've noted "cfa" as a keyword suggests rotation. To me, this means "common factor analysis." Is that correct?
One thing that often gets ignored about unrotated PCA is that it yields a mathematically unique solution, in which the first component has been called a "junk" factor by some academics insofar as everything loads on it. Rotation sacrifices that uniqueness by redistributing the loadings across the retained factors toward something called "simple structure." The goal of simple structure is that each variable loads on a single factor only and is zero (or close to it) on the others. Given that, have you examined the first, unrotated PCA component for its value with respect to your objective?
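Here's a small simulation to make that concrete (numpy plus `factor_analyzer`'s `Rotator`; the simulated data are mine, purely for illustration). With positively correlated variables, everything loads on the unrotated first component, the "junk" general factor; varimax then spreads the loadings out toward simple structure:

```python
import numpy as np
from factor_analyzer.rotator import Rotator

rng = np.random.default_rng(0)
n = 500
# Two correlated latent factors, each driving three observed variables
f1 = rng.standard_normal(n)
f2 = 0.6 * f1 + 0.8 * rng.standard_normal(n)
F = np.column_stack([f1, f1, f1, f2, f2, f2])
X = 0.8 * F + 0.6 * rng.standard_normal((n, 6))

# Unrotated PCA loadings: eigenvectors scaled by sqrt(eigenvalues)
R = np.corrcoef(X, rowvar=False)
vals, vecs = np.linalg.eigh(R)
idx = np.argsort(vals)[::-1][:2]
loadings = vecs[:, idx] * np.sqrt(vals[idx])

print(np.round(loadings, 2))  # column 1: every variable loads heavily
print(np.round(Rotator(method="varimax").fit_transform(loadings), 2))
```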
Next, factor analysis produces a set of linear combinations that recover only a fraction of the total variance. A second, higher-order factor analysis of the first-order factors would reduce the recovered variance even further.
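To see how much variance a given number of components actually recovers, a quick check with scikit-learn (my choice of tool; `X` is the hypothetical data matrix from above):

```python
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

pca = PCA().fit(StandardScaler().fit_transform(X))
print(pca.explained_variance_ratio_.cumsum())  # cumulative variance recovered by k components
```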
Finally, if you want to get really geeky, check out the literature on additive and ultrametric trees for a good discussion of second-order factor analysis. This is not an area that has seen much recent research that I'm aware of, but there is a Sage book on the topic by James Corter that dates back 25 years or so.
In my opinion, leveraging the first PC would be a safe, easy solution.
Best Answer
One solution to your first question is to use cross-validation: compute classification accuracy for models with different numbers of components, then pick the one with the highest cross-validated accuracy (a sketch follows the references). You can check the references below:
PLS Dimension Reduction for Classification with Microarray Data
Rasch-based high-dimensionality data reduction and class prediction with applications to microarray gene expression data
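Here is a minimal sketch of that selection procedure, using scikit-learn's PLSRegression with a binary outcome thresholded at 0.5; the variable names, the fold count, and the 1-10 component search range are my assumptions:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold

# X: (n_samples, n_features) array; y: binary 0/1 labels (hypothetical data)
def cv_accuracy(X, y, n_comp, n_splits=5):
    accs = []
    for train, test in KFold(n_splits, shuffle=True, random_state=0).split(X):
        pls = PLSRegression(n_components=n_comp).fit(X[train], y[train])
        pred = (pls.predict(X[test]).ravel() > 0.5).astype(int)
        accs.append(np.mean(pred == y[test]))
    return np.mean(accs)

scores = {k: cv_accuracy(X, y, k) for k in range(1, 11)}
best = max(scores, key=scores.get)  # number of components with highest CV accuracy
print(scores, best)
```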
In my experience, factor rotation does not improve classification accuracy. Please report your results.