Solved – Specifying a model with nested subsamples within split-plot design

linear modelnested datasplit-plotsubsampling

I am trying to specify a model for split plot design that acknowledges nested sub-sampling. Split plot designs are a little bit tricky to analyze, and I am new to R, so I provide my dataset along with instructions at the bottom in case anyone can solve this challenge.

THE CONTEXT: In my MSc thesis, I am using a split plot design to test the effect of forest type ("Riparian" vs "sloped") and forest age (250 years/"old" vs 35 years/"young") on the biomass of salal, an understory plant of economic interest. The experimental design is similar to a two way factorial ANOVA. However, each old forest site is paired with a young forest site; before logging occurred they were within the same patch of forest or "historicstand". This pairing means that it is a split plot design (see figure). It took me some effort to learn how to analyze this split plot design, but now I understand it, and I can run this simple model fine in R base stats.

This model in R is:

aov(salalbio ~ foresttype + Error(historicstand/foresttype) + forestage+ forestage*foresttype, data = antfp)

To conceptualize the experimental design, this shows the split plot layout. However, my challenge is that at each X there are two measures that are dependent
THE CHALLENGE:The problem with the above model is that it does not recognize that within each site I have two measurements: At each site there are two sampling plots "sample_plots" where we measured salal biomass. These measurements are not replicates so I need to acknowledge that they are nested within site, or I risk a Type I error. How do I rewrite the model so that it acknowledges that sample plot is nested within Siteid? In base R[stats] would be easiest for me to comprehend, but any package would do.

fyi, I could just average the two measures by site and analyze it this way but my sample size is very small (n = 3 sites per each experimental level). When biomass is averaged I do not detect the effect. Analyzing the nested sub samples maybe, just maybe, will reveal an effect.

Here is the data:

saldata <- structure(list(Site.id = structure(c(1L, 1L, 4L, 4L, 2L, 2L, 
5L, 5L, 3L, 3L, 6L, 6L, 7L, 7L, 10L, 10L, 8L, 8L, 11L, 11L, 9L, 
9L, 12L, 12L), .Label = c("ABOG1", "ABOG3", "ABOG4", "ABSG1", 
"ABSG3", "ABSG4", "ROG1", "ROG2", "ROG3", "RSG1", "RSG2", "RSG3"
), class = "factor"), foresttype = structure(c(2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L), .Label = c("riparian", "sloped"), class = "factor"), 
 forestage = structure(c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 
1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L
 ), .Label = c("old", "young"), class = "factor"), historicstand = c(1L, 
 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 
5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L), sample_plot = structure(c(1L, 
  2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 
 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("a", "b"), class = "factor"), 
 salalbio = c(26.7, 177, 100, 210, 240, 296.7, 410, 71.7, 
 253.3, 256.7, 93.3, 233.3, 106.7, 36.7, 0, 0, 86.7, 0, 20, 
 0, 233.3, 206.7, 115, 0)), .Names = c("Site.id", "foresttype", 
 "forestage", "historicstand", "sample_plot", "salalbio"), class = "data.frame", row.names = c(NA, -24L))

The below model (for split plot) needs to be modified so that sample plot "a" and "b" are recognized as coming from the same Siteid:

M1 <- aov(salalbio ~ foresttype + Error(historicstand/foresttype) + forestage+ forestage*foresttype, data = saldata)

Best Answer

Ok, let's focus on some issues I can find here. As I already stated on the comments, the 'tracks' present in this arrangement characterize a split-block design with three replicates in each combination A x B. Pay attention to the term replicate, I will go back to it later. So, if you want to analyze your experiment in R, assuming you have a balanced data set (this is important too), the model should be:

anova(lm(var ~ Block + type + Block/type + age + Block/age + type:age, data=dt.sb))

Fine. Problem solved, let's discuss it, right? Wrong! This arrangement is valid if the whole blocks are repeated. So, you should have at least three sets of that combination type X age to use this design. If you consider each of the replicates (remember I said I would go back to it) within the combinations as a repetition, you might be falling into the pseudo-repetition problem, besides the fact it does not meet the requirements for the model described above.

Also be aware you have three residuals in this analyses, and by consequence three coefficients of variation. This experimental design is not very encouraged because of its lack of randomization and the pseudo-repetition problem is very likely to be pointed out as an important issue.