One person assigns a value to each of four objects, and the values are ordered: once three objects have been assigned values, the value of the fourth is predetermined.
This means that each observation corresponds to a permutation of {1, 2, 3, 4}.
There are 4! = 24 possible permutations of this set, and each permutation can be given an ID. This single ID column then represents all four readings for an observation, so it can replace the four columns of the dependent variable, and we can regress it using, say, a multinomial logistic model. The number of classes is at most 24; in practice it depends on which permutations actually occur in your data and on the number of observations, so you can assign IDs accordingly. Once the model predicts an ID, we immediately know the permutation: e.g., if ID 12 denotes {2, 1, 4, 3} and the predicted class is 12, we recover the column of four readings at once.
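A minimal sketch of this encoding in R (the ratings data frame, its columns r1-r4, and the predictors x1, x2 are hypothetical placeholders; multinom from the nnet package handles the multi-class regression):

# Enumerate all 24 permutations of 1:4 (keep rows where all four values differ)
perms <- expand.grid(a = 1:4, b = 1:4, c = 1:4, d = 1:4)
perms <- perms[apply(perms, 1, function(x) length(unique(x)) == 4), ]
perms$id <- seq_len(nrow(perms))  # one id per permutation

# Map each observation's four readings (columns r1..r4) to its permutation id
ratings$id <- match(
  do.call(paste, ratings[, c("r1", "r2", "r3", "r4")]),
  do.call(paste, perms[, c("a", "b", "c", "d")])
)

# Regress the single id column with a multinomial logistic model
library(nnet)
fit <- multinom(factor(id) ~ x1 + x2, data = ratings)

# A predicted id maps straight back to the four readings, e.g. for id 12:
perms[perms$id == 12, c("a", "b", "c", "d")]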
Remember that the difference between significant and non-significant is not (always) statistically significant.
Now, more to the point of your question, model 1 is called pooled regression, and model 2 unpooled regression. As you noted, in pooled regression, you assume that the groups aren't relevant, which means that the variance between groups is set to zero.
In the unpooled regression, with a separate intercept per group, you effectively set the between-group variance to infinity.
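For concreteness, using the leg, head, and site variables from your data, the two extremes would look something like this in R (a sketch):

pooled <- lm(leg ~ head)                   # model 1: groups ignored entirely
unpooled <- lm(leg ~ head + factor(site))  # model 2: a separate intercept per site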
In general, I'd favor an intermediate solution, which is a hierarchical model or partially pooled regression (also called a shrinkage estimator). You can fit this model in R with the lme4 package.
Finally, take a look at this paper by Gelman, in which he argues that hierarchical models help with the multiple-comparisons problem (in your case: are the coefficients different across groups, and how do we correct a p-value for multiple comparisons?).
For instance, in your case,
library(lme4)
summary(lmer(leg ~ head + (1 | site)))  # varying-intercept model
If you want to fit a varying-intercept, varying-slope model (the third model), just run
summary(lmer(leg ~ head + (1 | site) + (0 + head | site)))  # varying-intercept, varying-slope model
Then you can look at the group variance and check that it is different from zero (so the pooled regression isn't the better model) and far from infinity (ruling out the unpooled regression).
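For example, to inspect the estimated group-level variance (reusing the varying-intercept fit from above):

fit <- lmer(leg ~ head + (1 | site))
VarCorr(fit)                 # estimated variance/sd of the site intercepts
as.data.frame(VarCorr(fit))  # the same estimates as a data frame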
Update:
After the comments (see below), I decided to expand my answer.
The purpose of a hierarchical model, especially in cases like this, is to model the variation across groups (in this case, Sites). So, instead of running an ANOVA to test whether one model differs from another, I'd look at the predictions of my model and check whether the by-group predictions of the hierarchical model are better than those of the pooled regression (classical regression).
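As a rough sketch of that comparison (in-sample here; out-of-sample or cross-validated predictions would be a fairer test):

pooled  <- lm(leg ~ head)
partial <- lmer(leg ~ head + (1 | site))
# root-mean-squared prediction error by site, for each model
rmse_by_site <- function(pred) tapply((leg - pred)^2, site, function(e) sqrt(mean(e)))
cbind(pooled = rmse_by_site(fitted(pooled)), partial = rmse_by_site(fitted(partial)))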
Now, I ran my suggestions above and found that
ranef(lmer(leg ~ head + (1 | site) + (0 + head | site)))
would return zero estimates for the varying slope (the varying effect of head by site).
Then I ran
ranef(lmer(leg ~ head + (head | site)))
and I got non-zero estimates for the varying effect of head. I don't know yet why this happened, since it's the first time I've come across this. I'm really sorry for the confusion, but, in my defense, I just followed the specification outlined in the help page of the lmer function (see the example with the sleepstudy data). I'll try to understand what's happening and report back here when (if) I do.
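One known difference between the two specifications (I'm not claiming it fully explains the result above): (1 | site) + (0 + head | site) constrains the random intercept and random slope to be uncorrelated, whereas (head | site) also estimates their correlation. The variance components show the difference:

VarCorr(lmer(leg ~ head + (1 | site) + (0 + head | site)))  # no intercept-slope correlation
VarCorr(lmer(leg ~ head + (head | site)))                   # includes a Corr term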
Best Answer
While this does depend on the theoretical nature of the model, you might want to try using an F-test, where the F statistic is

F = (variation between the sample means) / (variation within the samples).

This test is used to compare models in order to determine which one best explains the variation in the dependent variable. You might consider incorporating this test into a one-way ANOVA: Understanding Analysis of Variance (ANOVA) and the F-test
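In R, for instance, a one-way ANOVA F-test would look like this (score and questionnaire are hypothetical names for your outcome and grouping variable):

fit <- aov(score ~ questionnaire)
summary(fit)  # prints the F statistic and its p-value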
That being said, you mention that you have used three different questionnaires. Be cautious if the numbers of observations for the three regression models are not equal, since the F-test could then be unreliable: e.g., a model with 100 observations could show a lower "fit" than one with 200, yet if the number of observations for the first model were increased, it could in fact turn out to have the best fit.
You could also compute the power of a test for your three samples, i.e., identify the minimum number of observations needed for your results to be reliable. If the samples for the three models are shown to be large enough, then tests such as the F-test will be more reliable.
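Base R's power.anova.test can do this kind of calculation; for example, to find the per-group sample size needed for 80% power (the variance values below are placeholders to be replaced with estimates from your data):

power.anova.test(groups = 3, between.var = 1, within.var = 3,
                 sig.level = 0.05, power = 0.80)  # solves for n per group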