That's pretty normal.
CFA is a much more stringent criterion than EFA. EFA attempts to describe your data, but CFA tests if the model is correct.
One reason for non-convergence is low average correlations (but then I'd expect RMSEA to be better). The chi-square test is essentially a test that your residuals are equal to zero, and RMSEA, TLI and CFI are all transformations of that chi-square statistic.
The chi-square is always going to be at least as good in a two factor solution as in a one factor solution, because the models are nested (the one factor model is the two factor model with the factor correlation fixed to 1).
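To make the "transformations of the test" point concrete, here is a minimal sketch of how the three indices can be recomputed from the chi-square values. It uses the commonly cited formulas; software packages differ in small details (for example, dividing by N rather than N - 1 in RMSEA), and the function name and the input numbers are made up for illustration.

```python
import math

def fit_indices(chi2, df, n, chi2_null, df_null):
    """Approximate RMSEA, CFI and TLI from model and null-model chi-square."""
    # Chi-square in excess of its expected value (df) under a correct model.
    excess = max(chi2 - df, 0.0)
    excess_null = max(chi2_null - df_null, 0.0)

    # RMSEA: excess misfit per degree of freedom, scaled by sample size.
    # Assumption: N - 1 in the denominator; some programs use N.
    rmsea = math.sqrt(excess / (df * (n - 1)))

    # CFI: proportional reduction in excess misfit relative to the null model.
    denom = max(excess, excess_null)
    cfi = 1.0 if denom == 0 else 1.0 - excess / denom

    # TLI: compares chi-square/df ratios of the null and the fitted model.
    ratio_null = chi2_null / df_null
    tli = (ratio_null - chi2 / df) / (ratio_null - 1.0)

    return {"rmsea": rmsea, "cfi": cfi, "tli": tli}

# Hypothetical inputs, not taken from the question.
indices = fit_indices(chi2=100.0, df=50, n=201, chi2_null=1000.0, df_null=66)
print(indices)
```

Because all three indices depend only on the chi-square values, the degrees of freedom and the sample size, those are the numbers needed to diagnose why they disagree.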
Some more questions: What was your sample size? What's the average correlation? What's chi-square and df, what's the chi-square of the null model?
Should you add correlated errors? Perhaps, but when you do that you are introducing additional factors. With a fit like this you might need to add a lot, and then you end up with a mess - it's best if they are justified in some way. For example, your second and third items are about intrusive thoughts - that could be a justification.
I actually do not think you have conducted CFAs, as you think you have, for your second and third models. Instead, for a couple of reasons, it reads as though you have just conducted three separate EFAs. For one, you mention the term "orthogonal"--a type of rotation--and factor scores, but rotation and factor scores are only features of EFA, not CFA. And in CFA, if you fit a model specifying three uncorrelated factors, the correlations between those factors would be fixed at zero, and if the factors were in fact correlated, this specification would worsen the fit of your model.
With that, there is still the question of why your estimated factor correlations and factor score correlations are changing from model to model. You actually have identified the likely cause of these discrepancies yourself:
In particular, in model 2, the 3-factor orthogonal CFA assumes no correlation between factors, yet the factor scores are correlated.
Orthogonal rotation methods assume factors are uncorrelated; they do not make factors uncorrelated (Fabrigar & Wegener, 2011). Thus, when using this rotation method, you could still end up with factors that are correlated when you estimate their correlations by some other means (e.g., as you did using factor scores). But if factors are truly correlated, and you assume no correlation, the true shared variance between factors needs to go somewhere, so it ends up being absorbed into the factor loadings (Osborne, 2015). Lay the factor matrix of your orthogonal solution next to the pattern matrix of your oblique solution (i.e., with correlations estimated); I'm willing to bet you will see higher "cross-loadings" with the former than with the latter. Put another way, your orthogonal solution will exhibit worse "simple structure" (Fabrigar & Wegener, 2011).
The end result is that the factor scores from your orthogonal and oblique models are computed using fairly different factor loading estimates, and the orthogonal solution suppresses the correlations between factors. So you shouldn't be surprised that the oblique rotation factor scores show stronger correlations.
The reason your factor score correlations from your oblique solution differ from the estimated factor correlations from the same solution is a bit complicated, but ttnphns's comment above is a good summary--the factor scores are only approximations, and therefore their correlations are only approximations, whereas the estimated correlations are based on the unobserved error-free latent variables from the EFA (see DiStefano, Zhu, & Mîndrilă, 2009; Grice, 2001 for more details on the nature of factor scores).
References
DiStefano, C., Zhu, M., & Mîndrilă, D. (2009). Understanding and using factor scores: Considerations for the applied researcher. Practical Assessment, Research & Evaluation, 14, 1-11.
Fabrigar, L. R., & Wegener, D. T. (2011). Exploratory factor analysis. New York, NY: Oxford University Press.
Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6, 430-450.
Osborne, J. W. (2015). What is rotating in exploratory factor analysis? Practical Assessment, Research & Evaluation, 20, 1-7.
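A small numerical sketch of that last point, assuming regression-method (Thurstone) factor scores and a made-up population model with two factors, three items each, loadings of 0.7 and a latent correlation of 0.3. None of these values come from the question; they only illustrate that the correlation between estimated scores need not equal the latent correlation.

```python
import numpy as np

# Hypothetical population model: two factors, three items per factor.
loadings = np.array([
    [0.7, 0.0], [0.7, 0.0], [0.7, 0.0],
    [0.0, 0.7], [0.0, 0.7], [0.0, 0.7],
])
phi = np.array([[1.0, 0.3],
                [0.3, 1.0]])  # latent factor correlation matrix

# Model-implied item correlation matrix: common part plus uniquenesses.
common = loadings @ phi @ loadings.T
uniquenesses = np.diag(1.0 - np.diag(common))
sigma = common + uniquenesses

# Regression-method score weights W = Sigma^{-1} Lambda Phi,
# so that scores = X @ W for standardized item data X.
weights = np.linalg.solve(sigma, loadings @ phi)

# Covariance of the estimated scores is W' Sigma W; rescale to a correlation.
score_cov = weights.T @ sigma @ weights
sd = np.sqrt(np.diag(score_cov))
score_corr = score_cov / np.outer(sd, sd)

latent_r = phi[0, 1]
score_r = score_corr[0, 1]
print(f"latent correlation: {latent_r:.3f}")
print(f"correlation between regression factor scores: {score_r:.3f}")
```

Even with population values and no sampling error, the two correlations differ, which is why a gap between score correlations and estimated factor correlations is expected rather than a sign of a mistake. Other scoring methods (e.g., Bartlett scores) would give a different gap.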
Best Answer
The need for a correlated residual means that these two items are more closely related than they should be, according to the model. It's also called a "bloated specific" or a "local dependency".
For example, if you had a scale that had questions:
We would expect a residual correlation on the first two items - they are essentially the same item, asked twice. In this case, I'd drop one.
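As a rough illustration of what "more closely related than they should be, according to the model" means, the sketch below compares a model-implied correlation matrix with an observed one. It assumes a hypothetical one-factor model with four items loading 0.7, and an invented observed correlation of 0.80 between the first two items.

```python
import numpy as np

# Hypothetical standardized loadings for a one-factor model.
loadings = np.array([0.7, 0.7, 0.7, 0.7])

# Correlations the model implies: product of the two items' loadings.
implied = np.outer(loadings, loadings)
np.fill_diagonal(implied, 1.0)

# Invented observed correlations: items 1 and 2 are near-duplicates,
# so they correlate 0.80 instead of the implied 0.49.
observed = implied.copy()
observed[0, 1] = observed[1, 0] = 0.80

# Residuals: what the factor alone fails to reproduce.
residuals = observed - implied
print(np.round(residuals, 2))
```

The only non-zero residual is the one between the first two items; freeing a correlated error for that pair (or dropping one of the items) is what removes it.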