It has the same meaning as any other confidence interval: under the assumption that the model is correct, if the experiment and procedure are repeated over and over, 95% of the time the true value of the quantity of interest will lie within the interval. In this case, the quantity of interest is the expected value of the response variable.
It is probably easiest to explain this in the context of a linear model (mixed models are just an extension of this, so the same ideas apply).
The usual assumption is that:
$y_i = X_{i1} \beta_1 + X_{i2} \beta_2 + \ldots + X_{ip} \beta_p + \epsilon_i$
where $y_i$ is the response, the $X_{ij}$ are the covariates, the $\beta_j$ are the parameters, and $\epsilon_i$ is the error term, which has mean zero. The quantity of interest is then:
$E[y_i] = X_{i1} \beta_1 + X_{i2} \beta_2 + \ldots + X_{ip} \beta_p$
which is a linear function of the (unknown) parameters, since the covariates are known (and fixed). Since we know the sampling distribution of the estimated parameter vector, we can easily calculate the sampling distribution (and hence the confidence interval) of the estimate of this quantity.
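To make the "repeated over and over" interpretation concrete, here is a minimal simulation sketch in plain Python. It assumes the simplest case, one covariate plus an intercept, where the estimated mean response at $x_0$ is $\hat\beta_0 + \hat\beta_1 x_0$ with standard error $\hat\sigma \sqrt{1/n + (x_0 - \bar{x})^2 / S_{xx}}$. The function names, the true parameter values, and the sample size are all made up for illustration, and it uses a normal quantile instead of the exact $t$ quantile, which is only reasonable because $n$ is large here.

```python
import math
import random
from statistics import NormalDist


def mean_response_ci(x, y, x0, level=0.95):
    """Approximate CI for E[y] at x0 in the model y = b0 + b1*x + error."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # least squares slope
    b0 = ybar - b1 * xbar               # least squares intercept
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)              # residual variance estimate
    fit = b0 + b1 * x0                  # estimated mean response at x0
    se = math.sqrt(sigma2 * (1 / n + (x0 - xbar) ** 2 / sxx))
    # Assumption: normal quantile as a large-sample stand-in for the t quantile.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return fit - z * se, fit + z * se


def coverage(n_rep=2000, n=200, x0=1.5, seed=1):
    """Fraction of repeated experiments whose CI contains the true E[y] at x0."""
    rng = random.Random(seed)
    x = [3 * i / (n - 1) for i in range(n)]   # covariates are fixed across repeats
    beta0, beta1, sigma = 1.0, 2.0, 1.0       # hypothetical true values
    truth = beta0 + beta1 * x0
    hits = 0
    for _ in range(n_rep):
        y = [beta0 + beta1 * xi + rng.gauss(0, sigma) for xi in x]
        lo, hi = mean_response_ci(x, y, x0)
        hits += lo <= truth <= hi
    return hits / n_rep


if __name__ == "__main__":
    print(f"empirical coverage: {coverage():.3f}")
```

The covariates stay the same in every repetition and only the errors are redrawn, which matches the "known (and fixed)" assumption above. The printed fraction should land close to the nominal 95%.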
So why would you want to know it? I guess if you're doing out-of-sample prediction, it could tell you how good your forecast is expected to be (though you'd need to take into account model uncertainty).
For the first question, the default method in SAS to find the df is not very smart; it looks for terms in the random effect that syntactically include the fixed effect, and uses that. In this case, since `trt` is not found in `ind`, it's not doing the right thing. I've never tried `BETWITHIN` and don't know the details, but either the Satterthwaite option (`satterth`) or using `ind*trt` as the random effect gives correct results.
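
    /* Option 1: Satterthwaite approximation for the denominator df */
    PROC MIXED data=Data;
      CLASS ind fac trt;
      MODEL y = trt / s ddfm=satterth;
      RANDOM ind / s;
    run;

    /* Option 2: use ind*trt as the random effect */
    PROC MIXED data=Data;
      CLASS ind fac trt;
      MODEL y = trt / s;
      RANDOM ind*trt / s;
    run;
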
As for the second question, your SAS code doesn't quite match your R code; it only has a term for `fac*ind`, while the R code has a term for both `ind` and `fac*ind`. (Compare the Variance Components output to see this.) Adding the missing `ind` term gives the same SE for `trt` in all models in both Q1 and Q2 (0.1892).
As you note, this is an odd model to fit, as the `fac*ind` term has one observation for each level and so is equivalent to the error term. This is reflected in the SAS output, where the `fac*ind` term has zero variance. This is also what the error message from lme4 is telling you: you have most likely misspecified something, because the error term is included in the model in two different ways. Interestingly, there is one slight difference in the nlme model; it's somehow finding a variance term for the `fac*ind` term in addition to the error term, but you will notice that the sum of these two variances equals the error variance from both SAS and nlme without the `fac*ind` term. However, the SE for `trt` remains the same (0.1892), as `trt` is nested in `ind`, so these lower-level variance terms don't affect it.
Finally, a general note about degrees of freedom in these models: They are computed after the model is fit, and so differences in the degrees of freedom between different programs or options of a program do not necessarily mean that the model is being fit differently. For that, one must look at the estimates of the parameters, both fixed effect parameters and covariance parameters.
Also, using the t and F approximations with a given number of degrees of freedom is fairly controversial. Not only are there several ways to approximate the df, some believe the practice of doing so is not a good idea anyway. A couple words of advice:
1. If everything is balanced, compare the results with the traditional least squares method, as they should agree. If it's close to balanced, compute them yourself (assuming balance) so that you can make sure the ones you're using are in the right ballpark.
2. If you have a large sample size, the degrees of freedom don't matter very much as the distributions get close to a normal and chi-squared.
3. Check out Doug Bates's methods for inference. His older method is based on MCMC simulation; his newer method is based on profiling the likelihood.
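The large-sample point in the advice above can be checked numerically. The sketch below computes upper quantiles of the $t$ distribution for increasing degrees of freedom and compares them with the normal quantile. It is a plain-Python illustration that integrates the $t$ density with Simpson's rule and inverts it by bisection, not what SAS, nlme, or lme4 do internally; the function names are made up, and it assumes $p \ge 0.5$ and a quantile below 100.

```python
import math
from statistics import NormalDist


def t_pdf(t, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))


def t_cdf(x, df, steps=2000):
    """CDF for x >= 0: 0.5 plus Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Upper quantile by bisection; assumes 0.5 <= p and a quantile below 100."""
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    z = NormalDist().inv_cdf(0.975)
    for df in (5, 10, 30, 100, 1000):
        q = t_quantile(0.975, df)
        print(f"df={df:5d}  t quantile={q:.3f}  normal quantile={z:.3f}")
```

With only a handful of degrees of freedom the two critical values differ noticeably, so the choice of df approximation matters; by a few hundred degrees of freedom the gap is in the second decimal place or smaller.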
Try this; it's a standard way to do a split plot. The notation `/` means that method is nested in day.