Mixed-Model – Correct Model Specification in lmer

lme4-nlme, mixed-model, r

I have scoured lots of help sites and am still confused about how to specify more complicated nested terms in a mixed model. I am also confused about the use of :, /, and | in specifying interactions and nesting with random factors using lmer() in the lme4 package in R.

For the purpose of this question, let's assume I have accurately portrayed my data with this standard statistical model:
$$
Y_{ijk} = \mu + \text{station}_i + \text{tow}_{j(i)} + \text{day}_k + (\text{station}\times \text{day})_{ik} + (\text{tow}\times\text{day})_{j(i)k}
$$
station is fixed, tow and day are random. Tow is (implicitly) nested within station.

In other words, I'm hoping that my model includes Station(i,fixed), Tow(j,random,implicitly nested within Station), Day(k,random), and interaction between Tow and Day, and the interaction between Day and Station. I have consulted with a statistician to create my model and at this time believe it to be representative of my data, but will also add in a description of my data for those who are interested at the bottom of my post so as not to clutter.

So far what I've been able to piece together is the following in lmer:

lmer(y ~ station + (1|station:tow) + (1|day) + (1|station:day) + (1|tow:day),
     data=my.data)

Does this accurately depict my statistical model? Any suggestions for how to improve my code if it does not read correctly?

I've bolded the specific terms I'm having difficulty specifying in my lmer formula

#1. tow nested within station when tow is random and station is fixed
I'm confused, however, about differentiating between nested and interaction terms that are random using : and /. In my example above, I have (1|station:tow), which I'm hoping reads as tow nested within station. I've read conflicting comments on various sites about whether I should be using : or / here within the random (1|...) format of lmer.

#2. The interaction between station and day when station is fixed and day is random
I then have (1|station:day) but this time I'm hoping it reads the interaction between station and day. It seems like I could use station*day to account for the individual effects of station and day as well as their interaction (rather than including each of the three terms separately as I do above), but I don't see how to specify this when one is fixed and the other is random. Would station*(1|day) do that?

#3. The interaction between tow and day (both random) when tow is nested in station (fixed)
Then lastly, I have (1|tow:day) which I'm hoping reads the interaction of tow and day, but I'm wondering if I need to specify again that tow is nested (implicitly) in station?

I am new to R, lmer, and statistical modeling, and would greatly appreciate thorough explanations in any responses to my questions if possible.

More details on my data: I am asking whether concentrations of plankton vary across a physical front in the nearshore ocean. I have three stations, inshore, within, and offshore of this front. Station is thus fixed. At each station, I take three replicate plankton tows (from which I sort, count, and get a concentration in terms of # of bugs per meter cubed of water). Tow is random: in three tows I hope to account for the general variability in plankton at that particular station. Tow is intrinsically nested in station as each tow does not have a unique ID (123,123,123 is the ID for tows at each station). I then did this on multiple, independent days with a new front that had formed. I think I can think of Day as a blocking factor? Day is random as repeating this on multiple independent front days is attempting to capture variability from day to day and be representative of all days where this front is present. I want to know about the interaction terms to see if Tows change in variability from day to day and if stations always yield similar data or does it depend on the day?

Again, thank you for your time and help, I appreciate it!

Best Answer

Tow nested within station when tow is random and station is fixed

station+(1|station:tow) is correct. As @John said in his answer, (1|station/tow) would expand to (1|station)+(1|station:tow) (main effect of station plus interaction between tow and station), which you don't want because you have already specified station as a fixed effect.
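The expansion of / can be checked with base R's terms(), which applies the same nesting rule to ordinary formulas. This is only a sketch of the analogy: it uses a plain one-sided formula rather than an lmer random-effects term, and the helper name expand_terms is made up for illustration.

```r
# Hypothetical helper: list the terms a formula expands to
expand_terms <- function(f) attr(terms(f), "term.labels")

# station/tow is shorthand for station + station:tow
nested <- expand_terms(~ station/tow)
print(nested)
# [1] "station"     "station:tow"
```

Inside (1|...), the same rule is what turns (1|station/tow) into (1|station) + (1|station:tow), which is why the bare station:tow form is the one to use when station is already a fixed effect.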

Interaction between station and day when station is fixed and day is random

The interaction between a fixed and a random effect is always random. Again as @John said, station*day expands to station+day+station:day, which you (again) don't want because you've already specified day in your model. I don't think there is a way to do what you want and collapse the crossed effects of day (random) and station (fixed), but, if you wanted, you could write station+(1|day/station), which, as described in the previous answer, would expand to station + (1|day) + (1|day:station).
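Both expansions mentioned here can be confirmed with base R's terms() on plain formulas. This is a sketch of the operator rules only; it does not fit anything, and the variable names simply mirror the ones in the question.

```r
# station*day: both main effects plus their interaction
crossed <- attr(terms(~ station * day), "term.labels")
print(crossed)
# [1] "station"     "day"         "station:day"

# day/station: day plus station-within-day, with no separate station term
nested_in_day <- attr(terms(~ day/station), "term.labels")
print(nested_in_day)
# [1] "day"         "day:station"
```

The second form has no stand-alone station term, so writing (1|day/station) does not add a random station effect on top of the fixed one.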

Interaction between tow and day when tow is nested in station

Because you do not have unique values of the tow variable (i.e. because, as you say, tows are specified as 1, 2, 3 at every station), you do need to specify the nesting, as (1|station:tow:day). If you did have the tows specified uniquely, you could use either (1|tow:day) or (1|station:tow:day) (they should give equivalent answers). If you do not specify the nesting in this case, lme4 will try to estimate a random effect that is shared by tow #1 at all stations ...
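To see why the reused tow labels matter, here is a small base R sketch that counts the distinct groups each term would define. The design is an assumption made for illustration: three stations and three tows as described in the question, and a made-up four days.

```r
# Hypothetical balanced design: 3 stations x 3 tows x 4 days (day count assumed)
d <- expand.grid(
  tow     = factor(1:3),                                # labels reused at every station
  station = factor(c("inshore", "within", "offshore")),
  day     = factor(1:4)
)

# Number of distinct groups defined by an interaction of factors
n_groups <- function(...) nlevels(interaction(..., drop = TRUE))

counts <- c(
  tow_day         = n_groups(d$tow, d$day),             # tow 1 pooled across stations
  station_tow_day = n_groups(d$station, d$tow, d$day),  # each physical tow on each day
  station_tow     = n_groups(d$station, d$tow)
)
print(counts)
#         tow_day station_tow_day     station_tow
#              12              36               9
```

With only 12 groups, tow:day treats tow 1 at the inshore station and tow 1 at the offshore station as the same unit on a given day, which is not what the sampling design describes.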

One way to diagnose whether you've specified the random effects correctly is to look at the number of groups (levels) reported for each grouping variable and see whether it agrees with what you expect. For example, the station:tow:day group should have a number of levels corresponding to the total number of station $\times$ tow $\times$ day combinations; if you forgot the nesting with station, you should see that you get fewer groups than you ought.
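A rough sketch of that check, run on simulated data, might look like the following. Several things here are assumptions rather than part of the question: the four days, the pure-noise response, and in particular the rep column giving two measurements per tow. The latter is included because lmer refuses to fit a model in which a grouping factor has as many levels as there are observations. The fit only runs if lme4 is installed.

```r
# Hypothetical data: 'rep' (2 measurements per tow) and 4 days are assumptions
d <- expand.grid(
  rep     = 1:2,
  tow     = factor(1:3),
  station = factor(c("inshore", "within", "offshore")),
  day     = factor(1:4)
)
set.seed(42)
d$y <- rnorm(nrow(d))  # pure noise, used only so that there is something to fit

# Group counts implied by the design
expected <- c(
  "station:tow:day" = 3 * 3 * 4,
  "station:day"     = 3 * 4,
  "station:tow"     = 3 * 3,
  "day"             = 4
)

if (requireNamespace("lme4", quietly = TRUE)) {
  # With noise-only data a singular-fit message may appear; that is harmless here
  fit <- lme4::lmer(
    y ~ station + (1 | station:tow) + (1 | day) +
      (1 | station:day) + (1 | station:tow:day),
    data = d
  )
  print(lme4::ngrps(fit))  # number of levels per grouping factor in the fit
  print(expected)          # compare against the design
}
```

The same group counts appear in the "Number of obs ... groups" line of summary(fit); a count smaller than the design implies is the sign that a nesting term was left out.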

Are http://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#model-specification and http://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#nested-or-crossed useful to you?