Solved – Identifying differences among calibration curves: ANCOVA


I've got a large number of calibration curves (i.e. instrument response vs. analyte concentration), and I'd like to check whether they are different from each other.

Exactly one curve is associated with each unique combination of the levels of two factors (day and treatment). So, in R, the data look like this:

set.seed(0)
day      <- gl(2, 6, labels = c("day1", "day2"))   # 2 days, 6 observations each
treat    <- gl(2, 3, 12, labels = c("A", "B"))     # 2 treatments in blocks of 3
conc     <- rep(0:2, 4)                            # 3 concentration levels
response <- conc + rnorm(12) / 2                   # noisy linear response
df <- data.frame(day = day, treat = treat, conc = conc, response = response)

I'd like to know whether one or both of the factors influences the slope of the calibration curve (I don't care so much about the intercept). If they're different, I will use a different calibration curve for each corresponding dataset; if they're not different, I will average them to increase the precision of the measurements that are based on the calibration curves.

Is it correct to use ANCOVA to test for differences among the slopes, using the interaction of day and treat? If so, is the following the correct R code?

model <- lm(response ~ conc + day:treat)

EDIT: To be clear: along with each calibration curve, I have measured data, which I will calibrate using the calibration curve. I am trying to understand whether I should use a separate calibration curve for each timepoint & treatment (which would improve accuracy, if there is an effect of timepoint and/or treatment) or whether it is acceptable to average the calibration curves (which would improve precision). Therefore I'd like to know whether treatment, timepoint, or treatment and timepoint influence the slope of the calibration curve. Actual data as a csv file is here (the number of days will increase in the future).

EDIT2: Thanks to all involved in the conversation below. To clarify, I'm trying to understand what model will tell me whether there are significant differences in the slopes of the response-vs-conc curve, by either day OR treatment OR both. It might be instructive to look at my actual data, posted here, rather than the fake data I generated for simplicity.

require(ggplot2)
df <- read.csv("http://dl.dropbox.com/u/48901983/IntrxnDataSo.csv")
ggplot(df, aes(x=conc, y=response, colour=day)) + 
       geom_point() + geom_line() +
       facet_wrap(~treatment)

[Figure calib_by_treatment: response vs. conc, coloured by day, faceted by treatment]

So, it looks like there is an effect of day on the slope of the response-vs-conc plot.

ggplot(df, aes(x=conc, y=response, colour=treatment)) + 
       geom_point() + geom_line() +
       facet_wrap(~day)

[Figure calib_by_day: response vs. conc, coloured by treatment, faceted by day]

It is not as clear that there is an effect of treatment on the slope of the response-vs-conc plot.
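
To put numbers on both observations, the per-group fits can be extracted directly: the slope is the second coefficient of a straight-line fit for each treatment/day combination (a quick check mirroring the intercept code below; this assumes the plyr package is available):

library(plyr)
# slope of a separate straight-line fit for each treatment/day combination
slopes <- ddply(df, .(treatment, day),
                function(x) coef(lm(response ~ conc, data = x))[2])
names(slopes)[3] <- "Slope"
ggplot(slopes, aes(x = treatment, y = Slope, colour = day)) + geom_point()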

# intercept of a separate straight-line fit for each treatment/day combination
int <- ddply(df, .(treatment, day),
             function(x) coef(lm(response ~ conc, data = x))[1])
names(int)[3] <- "Intercept"
ggplot(int, aes(x = treatment, y = Intercept, colour = day)) + geom_point()

[Figure: per-group intercepts by treatment, coloured by day]

It also looks like there is an effect of treatment on the intercept of the response-vs-conc plot.

Best Answer

If you don't care about the intercept, I think you want this:

> model <- lm(response ~ -1 + conc + day:treat)
> anova(model)
Analysis of Variance Table

Response: response
          Df  Sum Sq Mean Sq F value    Pr(>F)    
conc       1 18.5545 18.5545 62.3458 9.909e-05 ***
day:treat  4  1.7860  0.4465  1.5003    0.2996    
Residuals  7  2.0832  0.2976                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
> summary(model)

Call:
lm(formula = response ~ -1 + conc + day:treat)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.5736 -0.4803  0.0222  0.3465  0.6013 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)   
conc             0.6859     0.1929   3.556  0.00927 **
dayday1:treatA   0.6919     0.3693   1.873  0.10316   
dayday2:treatA   0.1093     0.3693   0.296  0.77585   
dayday1:treatB   0.3387     0.3693   0.917  0.38965   
dayday2:treatB   0.7090     0.3693   1.920  0.09636 . 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.5455 on 7 degrees of freedom
Multiple R-squared: 0.9071, Adjusted R-squared: 0.8407 
F-statistic: 13.67 on 5 and 7 DF,  p-value: 0.001695 

The -1 in the lm removes the intercept.
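
If you want to see exactly what that does, inspecting the design matrix is a quick check (a small illustration using the toy data frame df from the question):

# no intercept column: one dummy column per day:treat combination plus conc
head(model.matrix(response ~ -1 + conc + day:treat, data = df))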

There aren't really separate "slopes" in your model: conc enters with a single common slope, and the day:treat term only shifts the level for each day/treatment combination. In the results above, you can see that the interaction is not significant.
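
If what you are really after is letting the slope of conc itself differ by day and treatment, one sketch (my suggestion, shown on the toy data) is to interact conc with the factors and compare nested fits:

# full model: conc slope allowed to vary with day, treat and their combination
slope_model  <- lm(response ~ conc * day * treat, data = df)
# reduced model: a single common conc slope, separate levels per group
common_slope <- lm(response ~ conc + day * treat, data = df)
anova(common_slope, slope_model)  # F-test for any difference in slopes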

Presumably this isn't your real data. In your real data, you might also want to look at something like this: model2 <- lm(response ~ -1 + conc + day + treat) and model3 <- lm(response ~ -1 + conc + day*treat)
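
For instance, fitting both and comparing them with anova() shows whether the day:treat interaction adds anything over the additive model (again a sketch on the toy data; your real data would use treatment instead of treat):

model2 <- lm(response ~ -1 + conc + day + treat, data = df)   # additive
model3 <- lm(response ~ -1 + conc + day * treat, data = df)   # with interaction
anova(model2, model3)  # does the day:treat term improve the fit?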

You can visualize this with the following code (pretty hacky):

pred <- fitted(model)  # assuming pred holds the model's fitted (predicted) values
plot(as.numeric(day), response, col = as.numeric(treat), xlim = c(1, 2.2))
points(as.numeric(day) + 0.1, pred, col = as.numeric(treat), pch = 2)
for (i in 1:6)  lines(c(1, 1.1), c(response[i], pred[i]), col = as.numeric(treat)[i])
for (i in 7:12) lines(c(2, 2.1), c(response[i], pred[i]), col = as.numeric(treat)[i])
legend("bottom", c("treatA", "treatB", "observed", "predicted"), col = c(1, 2, 1, 1),
       lty = c(1, 1, NA, NA), pch = c(NA, NA, 1, 2))

I'm still new to Stack Exchange, so if someone adds a comment with a link showing how to include this plot rather than just the code, I'd appreciate it.

EDIT

Based on this code, the following image was produced. You didn't give the pred object, so I'm assuming a few things here. Predicted values are drawn smaller and transparent.

# predicted responses from the model fitted above
pred <- data.frame(treat = df$treat, conc = df$conc, day = df$day,
                   response = predict(model, newdata = df[, 1:3]))

library(ggplot2)
ggplot(df, aes(y = response, x = treat, colour = as.factor(conc))) + 
   geom_jitter(position = position_jitter(width = 0.25), shape = 16, size = 3) +
   geom_point(data = pred, aes(shape = as.factor(conc)), alpha = 0.2) +
   facet_grid(~day) + 
   theme_bw()

[Figure: observed (jittered) and predicted (transparent) responses by treat, faceted by day]
