There are other considerations for your Likert scales besides the number of categories.
Do you offer a neutral category? Compare: "Strongly disagree - Disagree - Agree - Strongly agree" vs. "Strongly disagree - Disagree - No opinion - Agree - Strongly agree". The first scale forces a response as a by-product, which may or may not be appropriate.
Do you label them with numbers? If you do, is the neutral category a zero? Compare: "1 Don't like it at all / 2 / 3 / 4 / 5 Like it a lot" vs. "-2 Don't like it at all / -1 / 0 / 1 / 2 Like it a lot". The latter makes the swing from a negative to a positive attitude explicit, while the former does not.
If you provide text with the categories, are they equidistant? The Netflix 5-star scale sucks, in my opinion: it has only 1 star for the "Don't like" side, and everything from 3 to 5 stars for various degrees of "Like". Our department's teaching evaluations were like that, too: 1 for "Poor", 2 for "Adequate", 3 for "Good", 4 for "Excellent", 5 for "Outstanding", and you basically had to score 4 or above. That's where most of the inconsistencies in validation will likely come from, as the distance between 4 and 5 is nowhere near the distance between 1 and 2.
Update: As far as reliability and validity are concerned, I am not sure what the standard practices are regarding Likert scales. You can probably present the analysis of both the moment covariance and polychoric correlation matrices, to demonstrate the factor structure, the reliability of individual items, and the composite reliability of the factor. The moment-based analysis will understate the reliability of the underlying continuous scales, as much work has shown; the polychoric correlations will get these continuous scales right, but that's not what you are measuring. So the true reliability of your measurement process is somewhere in between. You can also demonstrate discriminant validity with these internal measurements (different factors correspond to different concepts, and hence their correlation is less than 1). To demonstrate external validity in its strongest form, you would need some additional behavioral variables coming from a substantive model. E.g., if a certain physical activity is "difficult" to do in old age, you would expect it to be done "rarely".
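The understatement by the moment-based analysis is easy to see in a small simulation (a Python sketch; the setup and names are mine, not from any particular package): coarsening two correlated normal variables into 5-point items shrinks their Pearson correlation below the latent one.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
true_r = 0.6

# Two correlated continuous latent variables
x, y = rng.multivariate_normal([0, 0], [[1, true_r], [true_r, 1]], size=n).T

def likert5(v):
    """Coarsen a continuous variable into a 5-point item via equal-probability cut points."""
    return np.digitize(v, np.quantile(v, [.2, .4, .6, .8])) + 1

lx, ly = likert5(x), likert5(y)

r_latent = np.corrcoef(x, y)[0, 1]   # close to true_r
r_likert = np.corrcoef(lx, ly)[0, 1] # attenuated by the coarsening

print(f"latent r = {r_latent:.2f}, Likert r = {r_likert:.2f}")
```

The Pearson correlation of the discretized items comes out noticeably below the latent correlation, which is exactly the attenuation the polychoric approach corrects for.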
Is it a problem from a methodological point of view? Sort of: strictly speaking, SEM assumes that the observed variables are normally distributed, which, a fortiori, your Likert items are not.
So what to do? You could hold your nose and pretend that everything is normal, trusting in the Central Limit Theorem. I would probably do that, at least as a preliminary, to see if there's anything going on.
A cleaner solution is to use a SEM method adjusted for Likert items. Instead of working with the Pearson correlation matrix, these methods treat the Likert responses as arising from cut points on an underlying continuous variable, whose correlations one then seeks to estimate. Any time I've done this, all variables have had the same number of Likert responses, so I don't know if there's an off-the-shelf package for estimating these correlations with discordant Likert items. However, it should be possible in principle, and it has probably been done in practice somewhere, by someone. If you are using R, you could check out the user group for the package lavaan.
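To show it is possible in principle even with discordant items, here is a rough maximum-likelihood sketch in Python (not an off-the-shelf package; all names are mine). Thresholds come from the marginal proportions of each item, and the latent (polychoric) correlation is then a one-parameter likelihood problem, even when the two items have different numbers of categories:

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(2)
n = 5_000
true_r = 0.5

# Latent bivariate normal; items observed with DIFFERENT category counts
x, y = rng.multivariate_normal([0, 0], [[1, true_r], [true_r, 1]], size=n).T
item1 = np.digitize(x, np.quantile(x, [.25, .5, .75])) + 1    # 4 categories
item2 = np.digitize(y, np.quantile(y, [.2, .4, .6, .8])) + 1  # 5 categories

def thresholds(item):
    """Normal-quantile cut points implied by the item's marginal proportions."""
    p = np.cumsum(np.bincount(item)[1:]) / len(item)
    return np.concatenate([[-10.0], stats.norm.ppf(p[:-1]), [10.0]])  # +/-10 ~ +/-inf

t1, t2 = thresholds(item1), thresholds(item2)

# Contingency table of the two items
counts = np.zeros((len(t1) - 1, len(t2) - 1))
for a, b in zip(item1, item2):
    counts[a - 1, b - 1] += 1

def negloglik(rho):
    """Multinomial log-likelihood of the table under latent correlation rho."""
    mvn = stats.multivariate_normal([0, 0], [[1, rho], [rho, 1]])
    ll = 0.0
    for i in range(counts.shape[0]):
        for j in range(counts.shape[1]):
            # Probability of the bivariate-normal rectangle for cell (i, j)
            p = (mvn.cdf([t1[i + 1], t2[j + 1]]) - mvn.cdf([t1[i], t2[j + 1]])
                 - mvn.cdf([t1[i + 1], t2[j]]) + mvn.cdf([t1[i], t2[j]]))
            ll += counts[i, j] * np.log(max(p, 1e-12))
    return -ll

rho_hat = optimize.minimize_scalar(negloglik, bounds=(-0.95, 0.95),
                                   method="bounded").x
print(f"Pearson on items: {np.corrcoef(item1, item2)[0, 1]:.2f}, "
      f"polychoric: {rho_hat:.2f} (latent r = {true_r})")
```

The polychoric estimate lands near the latent correlation while the raw Pearson correlation of the items is attenuated; in practice you would of course reach for dedicated software rather than roll your own.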
In answer to your final question: of course you report all this. In the Methods section of your paper, you will have described the data you are using, including its Likertship and other issues. You can then explain how you addressed the difficulty.
EDIT. I did some googling and came up with this. There is software that does polychoric correlations with mixed levels. That's what I would advise. Be aware, however, that you need more subjects for polychoric correlations than you would if you could observe the continuous latent variables directly.
Yes, it is perfectly valid to compute a Pearson correlation between variables measured on different scales. The correlation coefficient is a standardized measure, so it is not influenced by the scale of the variables.
Here is a small simulation I did. First, I generate data on 7, 9, and 5 point scales. Then, I calculate correlation coefficients between each pair of variables. Then I standardize the variables (so they are all on the same scale), and do the same correlation matrix. As you can see, the correlation matrices are the same.
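The simulation described above can be sketched as follows (a Python reconstruction of the procedure; the seed, sample size, and latent-correlation setup are mine):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500

# Correlated continuous traits, so the correlation matrix is non-trivial
latent = rng.multivariate_normal(
    mean=[0, 0, 0],
    cov=[[1.0, 0.5, 0.3],
         [0.5, 1.0, 0.4],
         [0.3, 0.4, 1.0]],
    size=n,
)

def to_likert(v, k):
    """Map a continuous variable onto k ordered categories (1..k)."""
    cuts = np.quantile(v, np.linspace(0, 1, k + 1)[1:-1])
    return np.digitize(v, cuts) + 1

# Data on 7-, 9-, and 5-point scales
data = np.column_stack([to_likert(latent[:, j], k)
                        for j, k in enumerate([7, 9, 5])])

# Correlation matrix of the raw (differently scaled) variables
r_raw = np.corrcoef(data, rowvar=False)

# Standardize each variable (z-scores), then correlate again
z = (data - data.mean(axis=0)) / data.std(axis=0)
r_std = np.corrcoef(z, rowvar=False)

print(np.allclose(r_raw, r_std))  # True: Pearson r is scale-invariant
```

Standardization is an affine transformation of each variable, and Pearson's r is invariant under such transformations, so the two matrices match regardless of how many points each scale has.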