Solved – Interpretation of categorical variable level interaction in a linear model / ANOVA

anova, categorical data, interpretation, lm, statistical significance

This is similar to an earlier question of mine which was never answered, Interpretation of interactions between categorical data, which is much shorter, so no problem if you would like to read/answer that instead.

I am doing an experiment where I have varied one parameter (timestep [tt], a factor with 5 levels: hourly, 3hrly, 6hrly, 12hrly, daily) and added a couple of covariates (release location [location], release month [m]) – so far straightforward – and I have a response variable, which is where it starts getting complicated.

I am trying to see how timestep affects the track of a particle traced over time and, beyond that, to suggest the maximum timestep which can be used with minimal effect on particle fate. "Particle fate" is a complicated concept which varies in direction and distance after release, so I have created a response variable which is already a moving comparison, i.e. the distance from the daily-timestep position on each day (e.g. [day_3] below). This allows me to exclude tracking day from the test, which partly gets around the time-series/autocorrelation issue, but I realise I need to be careful to remember that I am looking at the same particles at different times and that patterns are likely to be lost over time due to the increasing influence of random factors.
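For illustration, this is roughly how the response is built (a minimal sketch only; the data frame name track, the column names, and the x/y coordinates are hypothetical, as my actual data handling is not shown here):

# One row per particle, per day, per timestep; x/y are positions in consistent units
daily <- subset(track, tt == "daily")
test  <- subset(track, tt != "daily")
merged <- merge(test, daily, by = c("particle", "day"),
                suffixes = c("", ".daily"))
# Radial distance of each track from the daily-timestep track on the same day
merged$dist_from_daily <- sqrt((merged$x - merged$x.daily)^2 +
                               (merged$y - merged$y.daily)^2)
# day_3 is then dist_from_daily restricted to the rows where day == 3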

I figure an ANOVA/ANCOVA derivative would be an appropriate test since I am looking for differences between sample groups, so I have run an lm in R in order to make use of its level-specific coefficient tests. Here is an example output:

> summary(lm(day_3 ~ tt + m * location))
Call:
lm(formula = day_3 ~ tt + m * location)
Residuals:
    Min      1Q  Median      3Q     Max 
-6.7523 -0.3392 -0.0836  0.2803  2.4548 
Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)      10.7602     0.7272  14.796  < 2e-16 ***  
tt3              -0.1765     0.4719  -0.374 0.710065    
tt6              -0.4822     0.4719  -1.022 0.312317    
tt12             -1.8280     0.4719  -3.874 0.000345 ***  
m4                0.3378     0.9438   0.358 0.722087    
m7               -9.1490     0.9438  -9.694 1.36e-12 ***   
m10              -5.8576     0.9438  -6.207 1.54e-07 ***  
location180      -5.2777     0.9438  -5.592 1.25e-06 ***  
location270      -8.6343     0.9438  -9.149 7.82e-12 ***  
location360      -7.3264     0.9438  -7.763 7.64e-10 ***  
m4:location180   -4.8520     1.3347  -3.635 0.000710 ***  
m7:location180    5.5768     1.3347   4.178 0.000133 ***  
m10:location180   4.1863     1.3347   3.137 0.003011 **   
m4:location270   -0.5704     1.3347  -0.427 0.671166    
m7:location270   10.3842     1.3347   7.780 7.21e-10 ***  
m10:location270   8.8500     1.3347   6.631 3.60e-08 ***  
m4:location360   -1.8426     1.3347  -1.381 0.174234    
m7:location360    6.5613     1.3347   4.916 1.22e-05 ***   
m10:location360   4.1764     1.3347   3.129 0.003073 **   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.335 on 45 degrees of freedom
Multiple R-squared:  0.8857,    Adjusted R-squared:   0.84 
F-statistic: 19.37 on 18 and 45 DF,  p-value: 1.724e-15

In this instance, of all the timesteps, you can see that the intercept (hourly) and tt12 (12hrly) are significant, which I would interpret as: hourly is different from daily (which would correspond to an intercept of zero), 3hrly and 6hrly are no different from hourly, and 12hrly is different from hourly. However, this does not tell me whether 12hrly is different from daily.

I ran the test again with the levels reordered so that 12hrly is the reference level.
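One way to do that releveling (sketched here for completeness, assuming the 12hrly level is coded "12"):

tt <- relevel(tt, ref = "12")   # make 12hrly the reference level before refitting

Refitting after the releveling gave the following output: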

> summary(lm(day_3 ~ tt + m * location))
Call:
lm(formula = day_3 ~ tt + m * location)
Residuals:
    Min      1Q  Median      3Q     Max 
-6.7523 -0.3392 -0.0836  0.2803  2.4548 
Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)       8.9322     0.7272  12.283 5.69e-16 ***
tt1               1.8280     0.4719   3.874 0.000345 ***
tt3               1.6515     0.4719   3.500 0.001062 ** 
tt6               1.3458     0.4719   2.852 0.006538 ** 
m4                0.3378     0.9438   0.358 0.722087    
m7               -9.1490     0.9438  -9.694 1.36e-12 ***
m10              -5.8576     0.9438  -6.207 1.54e-07 ***
location180      -5.2777     0.9438  -5.592 1.25e-06 ***
location270      -8.6343     0.9438  -9.149 7.82e-12 ***
location360      -7.3264     0.9438  -7.763 7.64e-10 ***
m4:location180   -4.8520     1.3347  -3.635 0.000710 ***
m7:location180    5.5768     1.3347   4.178 0.000133 ***
m10:location180   4.1863     1.3347   3.137 0.003011 ** 
m4:location270   -0.5704     1.3347  -0.427 0.671166    
m7:location270   10.3842     1.3347   7.780 7.21e-10 ***
m10:location270   8.8500     1.3347   6.631 3.60e-08 ***
m4:location360   -1.8426     1.3347  -1.381 0.174234    
m7:location360    6.5613     1.3347   4.916 1.22e-05 ***
m10:location360   4.1764     1.3347   3.129 0.003073 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.335 on 45 degrees of freedom
Multiple R-squared:  0.8857,    Adjusted R-squared:   0.84 
F-statistic: 19.37 on 18 and 45 DF,  p-value: 1.724e-15

I figure this tells me that 12hrly is also different from daily, but it then suggests that all the other timestep increments are different from 12hrly.

So my question is: is it appropriate to rehash these tests with each categorical level (timestep) as the reference level to see where they overlap and where they differ, or is there some "better"/easier way to cross-compare the timestep levels? I realise there are other clues in each output, and there may be a way to interpret the "estimate" (slope) / "t value" columns to see whether there is a significant difference between timestep values; if so, how do you set a threshold value? What would you recommend in my shoes?

N.B. As an aside, I realise I am using m * location as an interaction; this is intentional and was tested prior to running the lm. I also realise that day of tracking needs to be carefully considered, as the results across days are autocorrelated.
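In case it is useful, the per-day handling I have in mind is simply to fit the same model once per tracking day and compare (a rough sketch, assuming the per-day distances sit in columns day_1, day_2, ... of a data frame called dat, which is hypothetical here):

day_cols <- paste0("day_", 1:3)   # however many tracking days are of interest
fits <- lapply(day_cols, function(col) {
  lm(as.formula(paste(col, "~ tt + m * location")), data = dat)
})
names(fits) <- day_cols
lapply(fits, summary)   # inspect each day's coefficients separately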

P.S. Apologies if this is overspecific!

P.P.S THANKS FOR READING ALL THIS!

Best Answer

Just in case others find my question above and are wondering about the answer: I finally found it myself while browsing the answers to How to perform an ANCOVA in R and talking to colleagues.

What I was after was the post hoc Tukey HSD test.

I ran an lm above, but it might as well have been an ANOVA at this stage. The first step is to establish that you have a significant difference within your categorical variable [tt above], i.e. a difference in response between levels.

The question asks how the difference between levels can be pulled apart - how do I know which timesteps should be considered as having the same effect, and which are different from each other?

I solved this by following the code in the answer to How to perform an ANCOVA in R as written by Dan Butorovich:

> model.1 <- aov(day_1 ~ tt + m + location + m:location)
> library(car)   # Anova() with type="III" comes from the car package
> Anova(model.1, type="III")
Anova Table (Type III tests)

Response: day_1
             Sum Sq Df  F value    Pr(>F)    
(Intercept) 15.4893  1 281.6862 < 2.2e-16 ***
tt           1.0621  3   6.4386  0.001011 ** 
m            6.6569  3  40.3542 8.159e-13 ***
location     4.6295  3  28.0638 2.198e-10 ***
m:location  26.7061  9  53.9639 < 2.2e-16 ***
Residuals    2.4744 45                       
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

He also shows the additional option of summary.lm(model.1), which would give you the lm coefficient output as in the question above.

Then on to Tukey (which is in the package multcomp):

> library(multcomp)   # provides glht() and mcp()
> posth <- glht(model.1, linfct=mcp(tt="Tukey"))
> summary(posth)

     Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts


Fit: aov(formula = day_1 ~ tt + m + location + m:location)

Linear Hypotheses:
            Estimate Std. Error t value Pr(>|t|)   
6 - 12 == 0  0.23058    0.08291   2.781  0.03803 * 
3 - 12 == 0  0.29745    0.08291   3.588  0.00434 **
1 - 12 == 0  0.32954    0.08291   3.975  0.00141 **
3 - 6 == 0   0.06687    0.08291   0.807  0.85099   
1 - 6 == 0   0.09895    0.08291   1.194  0.63398   
1 - 3 == 0   0.03208    0.08291   0.387  0.98006   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Adjusted p values reported -- single-step method)

So now I know that, on day 1, varying the timestep affects the radial distance from the daily-timestep track: all timesteps are different from daily, but hourly, 3hrly, and 6hrly are effectively not different from each other. In the context of my particle-tracing study, lowering the timestep makes for a more accurate trajectory but is computationally intensive, so knowing that a 6hrly setting is effectively the same as anything down to hourly is a useful finding.
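For anyone wanting to go a step further, the same objects can feed a few related calls (a sketch only; confint() and cld() work on glht objects from multcomp, and TukeyHSD() is base R's own Tukey function for aov fits):

confint(posth)    # simultaneous confidence intervals for the Tukey contrasts
cld(posth)        # compact letter display: levels sharing a letter are not distinguishable
TukeyHSD(model.1, which = "tt")   # base-R alternative, restricted to the tt comparisons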

Hurrah! Now I just need to get into the habit of remembering these processes (e.g. the use of post hoc tests), which I do recall practising during undergrad stats lectures but have since completely forgotten.