Tricky problem! Is location fixed or random? Is position fixed or random? I assume that sample is random.
- Since treatment is assigned to location, location is the sampling unit. Basically, the comparison between treatments is done at that level. $n=8$.
- The measurement unit is the observation you take on your "samples" at a given time.
- Location is not nested in treatment. The treatment is applied to the location.
- Position is nested inside location.
- Sample is nested inside position.
- Time is nested inside Sample.
- Time is crossed with treatment.
You have 3 levels of nesting (time within sample, sample within position, position within location).
If location, position and sample are random, I think the R formula will look like this:
Y ~ Treatment * Time + (1|location/position/sample)
(Note the slashes: in lme4, A/B/C denotes nesting and expands to random intercepts for A, A:B, and A:B:C.)
You have 1 row in your data frame for each sample observation at each time - with appropriate codings for all of your design characteristics.
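Putting the two pieces together, a minimal sketch with lme4 might look like the following (the data frame and column names y, treatment, time, location, position, sample are placeholders, not from the question):

```r
library(lme4)

# Hypothetical data frame d: one row per sample per time point,
# with columns y, treatment, time, location, position, sample.
m <- lmer(y ~ treatment * time +
            (1 | location/position/sample),  # sample within position within location
          data = d)
summary(m)
```

The nesting term expands internally to (1|location) + (1|location:position) + (1|location:position:sample), so the position and sample codes do not need to be globally unique.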
Would it work to combine the repeated measures into a score such as their average or their difference? That could make the model easier to interpret.
I will quickly address the general use of aov. When using aov in R, Type I sums of squares are used. These are sequential, which means the order of the variables will affect the results if the design is unbalanced (see here: http://goanna.cs.rmit.edu.au/~fscholer/anova.php). Type III sums of squares are sometimes preferred when there is an interaction, and Type II when there is no significant interaction. These can be obtained with the function Anova (notice the capital A) in the car package. This may be why your anova results did not make sense.
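A minimal sketch of the car::Anova approach (model and variable names are placeholders):

```r
library(car)

# Type III tests are only meaningful with sum-to-zero contrasts:
options(contrasts = c("contr.sum", "contr.poly"))

m <- aov(y ~ treatment * time, data = d)  # placeholder model
Anova(m, type = 3)  # Type III sums of squares (with an interaction)
Anova(m, type = 2)  # Type II (if the interaction is not significant)
```

Note that without the contrasts setting, Type III results from the default treatment contrasts can be misleading.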
Now to address the question about mixed effect models. I would first recommend lme4, as I think the formula specification is easier to understand. For instance, the random effect would be + (1|animal/time/treatment). In regards to the degrees of freedom, it is not necessarily the case that your model is wrong. Douglas Bates, the author of lme4, has written extensively about the difficulties in calculating degrees of freedom in mixed models (https://stat.ethz.ch/pipermail/r-help/2006-May/094765.html). This has also been discussed on this site (getting degrees of freedom from lmer). Because of this, the lme4 package does not provide p-values, and extra steps, such as sampling from the posterior, are necessary to calculate one. I am not sure if nlme is still being maintained, but it wouldn't hurt to email the authors.
In the event that the model is right, the tricky part will be interpreting the estimates (Interpreting the regression output from a mixed model when interactions between categorical variables are included). The reference category (i.e., the intercept) is going to be the first level of each factor. From what you have provided, this would be the first time point (I assume time is categorical because random effects are always factors), treatment = CON, and genotype = M. A significant p-value for time, for instance, is a comparison against this reference category. The question is whether that is a meaningful comparison. Using a package for Bayesian multilevel models, for instance brms or rstanarm (http://www.r-bloggers.com/r-users-will-now-inevitably-become-bayesians/), you could add posterior estimates together and use simple subtraction to obtain contrasts at each level of the factors.
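A hedged sketch of that posterior-contrast idea with brms (factor levels T1/T2, CON/TRT and the animal identifier are assumed for illustration; the coefficient names in the draws depend on your actual factor levels):

```r
library(brms)

# Placeholder model; assumes factors time (T1, T2, ...) and treatment (CON, TRT)
fit <- brm(y ~ time * treatment + (1 | animal), data = d)

draws <- as_draws_df(fit)
# Effect of time T2 (vs T1) within treatment TRT: add the main-effect draws
# and the interaction draws, then summarise the resulting contrast.
contrast <- draws$b_timeT2 + draws$`b_timeT2:treatmentTRT`
quantile(contrast, c(0.025, 0.5, 0.975))
```

Because these are draws from the joint posterior, the quantiles give a credible interval for the contrast directly, with no extra standard-error algebra.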
This might not have been much help towards your initial question, but specification of random effects will generally change the estimates little unless there is great variation between levels of a random effect. Additionally, random effects are not always straightforward (Minimum number of levels for a random effects factor?) or easy to define (What is the difference between fixed effect, random effect and mixed effect models?). If you still cannot get an answer to your question about random effects, you can try a sensitivity analysis. For instance, animal ID should be included as a random effect, but the others are open to debate. You could check whether the estimates (e.g., coefficients and confidence intervals) change drastically when only some of the variables are nested. If they do not, this would provide confidence in your model, and you could mention the potential problem with the random effects in the discussion of your paper. For a more rigorous approach, you could use a likelihood ratio test comparing models that differ in their random effects (Likelihood ratio tests on linear mixed effect models). You can even use this test to determine whether time is significant, by comparing models that differ only in the inclusion of time.
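The likelihood-ratio comparisons could be sketched like this (model formulas and variable names are illustrative only):

```r
library(lme4)

# Comparing random-effects structures: keep the REML fits (refit = FALSE)
m_full    <- lmer(y ~ treatment * time + (1 | animal/time), data = d)
m_reduced <- lmer(y ~ treatment * time + (1 | animal),      data = d)
anova(m_full, m_reduced, refit = FALSE)

# Comparing fixed effects (e.g., is time needed?): fit with ML (REML = FALSE)
m_no_time <- lmer(y ~ treatment +        (1 | animal), data = d, REML = FALSE)
m_time    <- lmer(y ~ treatment + time + (1 | animal), data = d, REML = FALSE)
anova(m_no_time, m_time)
```

The refit = FALSE / REML = FALSE distinction matters: REML likelihoods are comparable only when the fixed effects are identical, so fixed-effect comparisons need ML fits.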
Another option would be to use a gee, generalized estimating equation (r packages: gee & geepack), which might be appropriate here because the correlations between outcomes do not need to be correctly specified. The method is robust to "unknown" correlations. This is also ideal when samples are small (see here: http://epm.sagepub.com/content/76/1/64.short; https://en.wikipedia.org/wiki/Generalized_estimating_equation).
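A minimal geepack sketch (outcome, cluster and data names are placeholders; the data must be sorted by the id variable):

```r
library(geepack)

fit <- geeglm(y ~ treatment * time,
              id = animal,              # clustering variable
              data = d,
              family = gaussian,
              corstr = "exchangeable")  # working correlation; estimates are
summary(fit)                            # robust to its misspecification
```

The summary reports robust (sandwich) standard errors, which is what makes the working-correlation choice relatively unimportant.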
In regards to using different distributions, you could try glmer in the lme4 package with a Poisson or negative binomial distribution. The assumptions of a Poisson distribution are often violated (the variance and mean must be close to equal). When there is overdispersion (the variance is larger than the mean), the negative binomial distribution is preferred. Since you have 20 potential yes/no's, you should include (the log of) the number of possible occasions as an offset, which models the counts as rates.
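As a sketch (count, n_possible and the other names are placeholders; here n_possible would be 20 for every row):

```r
library(lme4)

# Poisson with a log offset models the count as a rate per opportunity
m_pois <- glmer(count ~ treatment * time + (1 | animal) +
                  offset(log(n_possible)),
                family = poisson, data = d)

# If overdispersed, the negative binomial analogue:
m_nb <- glmer.nb(count ~ treatment * time + (1 | animal) +
                   offset(log(n_possible)),
                 data = d)
```

The offset enters on the log scale because these models use a log link, so the coefficients remain interpretable as log rate ratios.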
I hope this information can be of use for the manuscript!
Best Answer
Major update, based on your comment.
When using lme from nlme, the model would use rep, a factor assigning unique codes to each of your 10 independent study sites (i.e., 5 per treatment), season, a factor indicating measurement quarter, and treatment, the treatment factor. However, this will not give you a real ANOVA but a mixed model with one random effect (rep) and two fixed effects (season and treatment).
To fit a real ANOVA (namely one with one between- and one within-subjects factor, a so-called split-plot design) you could use the package afex. You could run this on each dv separately.
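The lme call could be sketched as follows (the dv and data-frame names are assumed, not from the question):

```r
library(nlme)

m <- lme(dv ~ treatment * season,
         random = ~ 1 | rep,  # rep: unique code for each of the 10 sites
         data = d)
summary(m)
```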
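And a hedged sketch of the afex split-plot call, with treatment between sites and season within sites (again, dv and d are placeholder names; the data need one row per rep and season):

```r
library(afex)

aov_ez(id = "rep", dv = "dv", data = d,
       between = "treatment", within = "season")
```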
To analyze all dvs together you would need some multivariate analysis of which I am no expert.
Response prior to your comment:
If treat is your unit of observation (of which having only two seems quite low), then the analogous call, with treat as the id variable, would be correct. However, as said, a repeated measures ANOVA with only two units of observation is pretty uncommon and seems a bad idea. If this is really your design (two treatments observed over several seasons), you are probably better off with other analyses, such as single-case analysis, e.g., here.