Solved – Hypothesis testing with control and treatment group – which statistical analysis to use

anovahypothesis testingmixed modelrstatistical significance

I would like some advice on what statistical analysis to use to test my hypothesis. I am using R. My setup is as follows (in .csv format, first line is header):

id,time,group,outcome
1,t1,control,3.4
1,t2,control,3.2
2,t1,treatment,3.4
2,t2,treatment,4.2
3,t1,control,3.3
3,t2,control,3.1
4,t1,treatment,3.2
4,t2,treatment,4.5

The hypothesis I want to test is if a treatment (say the effectiveness of an exercise, E) has increased the 'outcome' variable significantly. Normally I would just use a paired t-test for the 'treatment' group to see if the difference (increase) is significant, since I took repeated measure from the same subject (i.e. pre and post test). However, I have found out that there exists a 'practice effect' in the way I measure the outcome, causing the 'outcome' variable to decrease between measurements, without any intervention of the exercise E. Therefore I would like to account for this practice effect by having a separate 'control' group. Basically, I want to find out if the 'outcome' variable has increased significantly between t1 and t2 for the 'treatment' group, taking into account the practice effect measured in the 'control' group. Conceptually (sorry – I may not be using the correct technical terms here), say the 'outcome' variable increased from M1=3.2 to M2=3.4 in the treatment group. And decreased from M1=3.2 to M2=3.1 in the control group. I want to account for this by adding this 0.1 decrease to the 0.2 increase of the treatment group, and test if it is significant.

I have been researching this myself and read somewhere that I could use the Factorial ANOVA, but not sure how do I find the p-value for what I want to test. I have also read that I can use the Linear Mixed Model (lme) with id as random effect, but am totally confused by this as well. I am a newbie in R and semi-newbie in statistics.

Any help or advice would be very much appreciated! Thank you very much.

Best Answer

I assume that t1 can be encoded as zero time, i.e., there is no treatment or training effect at this time point (i.e., these are "baseline measurements"). Thus:

DF$time1 <- as.numeric(DF$time == "t2")
#  id time     group outcome time1
#1  1   t1   control     3.4     0
#2  1   t2   control     3.2     1
#3  2   t1 treatment     3.4     0
#4  2   t2 treatment     4.2     1
#5  3   t1   control     3.3     0
#6  3   t2   control     3.1     1
#7  4   t1 treatment     3.2     0
#8  4   t2 treatment     4.5     1

library(ggplot2)
ggplot(DF, aes(x= time1, y = outcome, color = group, group = id)) +
  geom_line()

resulting plot

Now we fit a mixed effects model that consists of:

  • a fixed intercept,
  • a fixed slope (your training effect),
  • an interaction of that slope with the treatment,
  • a random intercept to account for repeated measures.

Judging from the plot above you should also include random slopes, but you don't have sufficient data for that. I hope in reality you have more data (many more individuals).

library(lmerTest)
mod <- lmer(outcome ~ time1 + time1:group + (1 | id), data = DF)
plot(mod) #not really enough data to judge normality or variance homogeneity, 
          #but no obvious violation
summary(mod)
#Linear mixed model fit by REML 
#t-tests use  Satterthwaite approximations to degrees of freedom ['lmerMod']
#Formula: outcome ~ time1 + time1:group + (1 | id)
#   Data: DF
#
#REML criterion at convergence: -3.9
#
#Scaled residuals: 
#    Min      1Q  Median      3Q     Max 
#-1.2048 -0.5522  0.1004  0.6024  1.2048 
#
#Random effects:
# Groups   Name        Variance  Std.Dev. 
# id       (Intercept) 1.822e-18 1.350e-09
# Residual             1.550e-02 1.245e-01
#Number of obs: 8, groups:  id, 4
#
#Fixed effects:
#                     Estimate Std. Error       df t value Pr(>|t|)    
#(Intercept)           3.32500    0.06225  5.00000  53.414 4.35e-08 ***
#time1                -0.17500    0.10782  5.00000  -1.623 0.165498    
#time1:grouptreatment  1.20000    0.12450  5.00000   9.639 0.000204 ***
#---
#Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
#Correlation of Fixed Effects:
#            (Intr) time1 
#time1       -0.577       
#tm1:grptrtm  0.000 -0.577

As you see, the model would tell you that your "practice effect" (slope time1) is not significant but the treatment has a highly significant effect on the slope. I believe this is what you want to know.

(The random intercept variance is very small because there is no random slope although one is needed. This is a problem. However, I'd be confident in the conclusion as it is already obvious when plotting the data.)

Related Question