Did you read the original Olsson (1979) paper? I believe it still provides the best description of what polychoric correlations are (although I have to admit I've probably skimmed only 10% of the existing literature; at some point it just gets too repetitive of the limited number of ideas). Polychoric correlations are ML estimates of the correlations of the underlying normal distribution, so you interpret them just as you would Pearson product-moment correlations with continuous data. Given the ML origins of polychoric correlations, I never understood the advice to use ADF or other least squares methods with them to obtain model parameter estimates, although I do understand that, say, diagonally weighted least squares (I don't know whether John Fox implemented it in `sem`), while less asymptotically efficient, needs less auxiliary information for estimation purposes.
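To make the "ML estimate of the underlying correlation" point concrete, here is a toy sketch of the idea for the simplest (tetrachoric) case — not `polycor`'s actual implementation; the two-step threshold estimation and all variable names are my own illustrative choices:

```python
import numpy as np
from scipy.stats import norm, multivariate_normal
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
n, rho_true = 5000, 0.5

# Latent bivariate normal data, observed only as dichotomies
z = rng.multivariate_normal([0, 0], [[1, rho_true], [rho_true, 1]], size=n)
x, y = (z[:, 0] > 0).astype(int), (z[:, 1] > 0.5).astype(int)

# Step 1: thresholds from the marginal proportions
a = norm.ppf(1 - x.mean())
b = norm.ppf(1 - y.mean())

def cell_probs(rho):
    # Cell probabilities of the 2x2 table under a bivariate normal with correlation rho
    mvn = multivariate_normal([0, 0], [[1, rho], [rho, 1]])
    p11 = 1 - norm.cdf(a) - norm.cdf(b) + mvn.cdf([a, b])  # P(Z1 > a, Z2 > b)
    p10 = (1 - norm.cdf(a)) - p11
    p01 = (1 - norm.cdf(b)) - p11
    p00 = 1 - p11 - p10 - p01
    return np.array([p00, p01, p10, p11])

# Observed cell counts, in the same (0,0), (0,1), (1,0), (1,1) order
counts = np.array([np.sum((x == i) & (y == j)) for i in (0, 1) for j in (0, 1)])

# Step 2: maximize the multinomial likelihood over rho
def negloglik(rho):
    return -np.sum(counts * np.log(cell_probs(rho)))

res = minimize_scalar(negloglik, bounds=(-0.99, 0.99), method="bounded")
print(res.x)  # close to the true latent correlation of 0.5
```

The estimate recovers the latent correlation even though only the dichotomized data are seen, which is exactly why the result is interpreted on the same scale as an ordinary correlation.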
There is no magic sample size number, like, you hit 2000 and -- BOOM! -- everything starts working. In my simulations (and I've burned a few petaflops this way and that for my papers), I've seen both cases: asymptotic results working perfectly fine with $N=200$ and failing with $N=5000$. In the most peculiar cases, for the same method and the same distribution of the underlying data, some asymptotic aspects, such as, say, confidence interval coverage, would be OK for $N=300$, while others, like the $\chi^2$ distribution of a test statistic, would not work until you have $N=1000$. So I am highly skeptical of any sample size advice, and would rather recommend running a simulation addressing your particular sample size, model complexity, and magnitude of the errors. The first paper to bash ADF (Hu, Bentler and Kano (1992)) used an insane degree of overidentification, something like 30 variables in the model, which translates to 400 degrees of freedom, and a sample size of 50. ADF wouldn't even begin to work in these circumstances, as it won't be able to invert the matrix of the fourth moments, which will be rank-deficient. And to get 400 degrees of freedom for the test statistic with a sample size below 1000 is a high expectation, too.
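As a hedged illustration of what such a simulation looks like — using the much simpler case of a Fisher-z confidence interval for a Pearson correlation as a stand-in, not polychorics, with all settings being my own toy choices — a check of coverage at your own $N$ might be:

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps, rho = 200, 2000, 0.5  # plug in your own sample size here
cov = [[1.0, rho], [rho, 1.0]]

# Simulate reps samples of size n from the bivariate normal
x = rng.multivariate_normal([0.0, 0.0], cov, size=(reps, n))

# Sample correlation, one per replication
xc = x - x.mean(axis=1, keepdims=True)
r = (xc[:, :, 0] * xc[:, :, 1]).sum(axis=1) / np.sqrt(
    (xc[:, :, 0] ** 2).sum(axis=1) * (xc[:, :, 1] ** 2).sum(axis=1))

# Fisher-z 95% confidence intervals and their empirical coverage
z, se = np.arctanh(r), 1.0 / np.sqrt(n - 3)
lo, hi = np.tanh(z - 1.96 * se), np.tanh(z + 1.96 * se)
coverage = np.mean((lo < rho) & (rho < hi))
print(coverage)  # should be near the nominal 0.95 at this N
```

The same template — simulate at your $N$, apply your estimator, check coverage or the distribution of the test statistic — carries over to the polychoric setting, just with more expensive machinery in the middle.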
So I understand the healthy skepticism you are demonstrating, but there is simply nothing you can do about it in your situation. Just run `polycor` to get the correlation estimates, feed them to `sem`, and that would be it -- there is little you can do to produce a much better analysis.
If you were a Stata user, I would immediately recommend the `gllamm` package, but I am not sure whether a direct analogue of it exists in R.
Regarding your first question, part 1:
Linear regression is "just-identified" in SEM; such a model is also called "fully saturated."
A simpler example, with 2 IVs and 1 DV, gives:
3 variances and 3 covariances in the covariance matrix, i.e., 6 known pieces of information for the SEM.
Your regression includes 2 regression beta coefficients, 2 IV variances, 1 covariance between the IVs (you may not realize this is in the model, but it is), and 1 error variance = 6 parameters.
6 knowns = 6 parameters, so the model df = 0.
Unless constraints are imposed, regression models in SEM are always fully saturated, and no assessment of model fit is possible.
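The counting above can be written out as simple arithmetic (variable names are mine):

```python
# Unique elements of the covariance matrix of p observed variables: p(p+1)/2
p = 3  # 2 IVs + 1 DV
knowns = p * (p + 1) // 2  # 3 variances + 3 covariances = 6

# Free parameters of the regression model
params = 2 + 2 + 1 + 1  # 2 betas, 2 IV variances, 1 IV covariance, 1 error variance

df = knowns - params
print(knowns, params, df)  # 6 6 0 -> just-identified, no fit test possible
```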
Regarding Part 2:
I agree with Patrick that these are nested models and you can "test" the constraints with a $\chi^2$ test.
I can kill two birds with one stone by explaining how you can get OLS software to fit ridge regression for a fixed ridging parameter. You simply augment your data set with one pseudo-observation for each beta you intend to ridge: set its covariates to all zeros, except for the variable whose coefficient is being ridged, which gets the square root of the ridging parameter.
This includes the intercept, so you need to define your own explicit intercept column, since it must be zero in the augmented rows (rather than the 1s an implicit intercept would add).
You also set the augmented "response" values to whatever you want the betas to be shrunk towards -- typically all zeros.
If you do the maths on the augmented data set, you'll see that its OLS criterion is $\|y - X\beta\|^2 + \lambda\|\beta - \beta_0\|^2$, so you get exactly the ridged beta estimate.
You will also find that the augmented $X$ matrix is now of full column rank. I can add the maths if you want, but I don't think you need it.
Note that this only helps with fitting for a fixed value of the ridging parameter; choosing a good value is another issue.
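A minimal numpy sketch of the recipe (the simulated data and all variable names are my own; all coefficients, intercept included, are shrunk towards zero as described):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # explicit intercept column
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(size=n)

lam = 4.0
k = X.shape[1]

# Augment: one pseudo-observation per coefficient, covariates sqrt(lam) on that
# coefficient's column and zero elsewhere; pseudo-responses are 0 (shrink towards 0)
X_aug = np.vstack([X, np.sqrt(lam) * np.eye(k)])
y_aug = np.concatenate([y, np.zeros(k)])

# Plain OLS on the augmented data...
beta_ols_aug = np.linalg.lstsq(X_aug, y_aug, rcond=None)[0]

# ...matches the closed-form ridge estimate (X'X + lam*I)^{-1} X'y
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)
print(np.allclose(beta_ols_aug, beta_ridge))  # True
```

Note also that the `sqrt(lam) * np.eye(k)` block alone already has rank $k$, which is why the augmented design is full column rank even when $X$ itself is not.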