R – Assumptions for a Hormesis Model

assumptionsdose-responser

I conducted a toxicology study where I exposed fish to 6 concentrations of an effluent treatment for 10 days. At the end of the experiment, I measured 10 endpoints with the fish plasma to determine effects of treatment.

After collecting the data, I ran a Principal Component Analysis on all endpoints. PC1 here is a principal component that is heavily driven by 4 metabolic endpoints that impact growth, fat metabolism, and protein production. When compared to treatment, PC1 appears to show hormesis, a biphasic dose-response to an environmental agent or toxic substance characterized by a low dose stimulation or beneficial effect and a high dose inhibitory or toxic effect. All 4 of the metabolic endpoints I measured follow the same trend as PC1.

To evaluate hormesis, I am using a 5-parameter Brain-Cousens hormesis model* with the drc package in R. It fits the response $R$ as a function of concentration $x$ to the following nonlinear form:

$$R(x) = c + \frac{d-c+fx}{1+\exp[b(\log(x)-\log(e))]} .$$

Most of the parameters represent the concentration-response curve without hormesis: $c$ is the lower asymptote of response, $d$ is the upper asymptote, $b$ represents the steepness of the concentration-response curve, and $e$ is the concentration at the midpoint of the curve. $f$ is the size of the hormesis effect; if $f=0$ then there is no hormesis and there is a standard log-logistic concentration-response curve. Positive values of $f$ represent hormesis.

My model seems to be driven by one value in the highest treatment concentration** but I think the value is likely real, especially since this is likely below the EC50 (based on the way the model works). I hypothesize that the PC1 and 4 endpoint concentrations would continue to decrease (increase in the plot below) if a higher treatment concentration was applied.

The question: How should one handle outliers in this type of modified log-logistic model?

Analysis so far

The Brain-Cousens model in R would not work with my data unless I inverted the values to follow this shape. The "PC1" values are actually inverted here. In reality, PC1 and the 4 endpoints begin to decrease at the 20% treatment. So what I actually see in the raw data is improved (increased concentrations of) metabolic endpoints at the 0.4-5.3% treatments.

I will not be using this as a predictive model, just to describe how our exposure may be affecting the fish. I pasted a figure of the data from the model described. The points are the inverted PC1 values and the line is from the Brain-Cousens model. My code for the model with the below data is:

hm.m2 <- drm(PC1 ~ Treatment, data = data, fct = BC.5()) #BC.5 is Brain-Cousens

summary(hm.m2) #f parameter p-value is 0.0021

modelFit(hm.m2) #lack of fit is not significant

Without 20-3 data point, the $f$ parameter $p$-value in the summary(hm.m2) line is 0.77

Data:

ID#	Treatment	PC1
0.1-1	0.1	0.576
0.1-2	0.1	0.705
0.1-3	0.1	0.518
0.1-4	0.1	1.939
0.4-1	0.4	0.588
0.4-2	0.4	-0.926
0.4-3	0.4	-0.639
0.4-4	0.4	-3.321
0-1	0	-1.312
0-2	0	0.949
0-3	0	0.569
0-4	0	0.615
1.4-1	1.4	-1.679
1.4-2	1.4	-1.557
1.4-3	1.4	-1.273
1.4-4	1.4	-0.994
20-1	20	0.626
20-2	20	0.142
20-3	20	7.039
20-4	20	1.598
5.3-1	5.3	-2.498
5.3-2	5.3	0.704
5.3-3	5.3	-1.480
5.3-4	5.3	-0.889

*If you're interested in looking into the Brain Cousens and other Hormesis models, this website provides the very basic info about how to run the model in R.
There is not too much info about Brain Cousens models online but there is a good deal of info in the scientific literature.
This is the original article describing the model;
Ritz, et al. 2015 describe some dose response models in R; Knezevic, et al 2007. Weed Technology (21): 841 – 848 is another helpful resource for dose response models and is downloadable on google scholar. Lastly, this paper about hormetic responses seems to be freely downloadable

**To test for outliers, I used a rosnerTest() from the EnvStats package on the principal component values. The outlier is ID# 20-3 in the data. I tried to look at Cook's distance but it does not work with my hormesis model output.

Best Answer

The outlier problem is relatively easy to address. There is, however, an additional problem here that illustrates the risks of nonlinear curve fitting.

The drc package provides many useful tools for fitting concentration-response curves. Some tools are designed to handle the apparent outliers that frequently are seen in such data.

With your model for data that you display, you used the default least-squares fitting of mean values at each concentration. That necessarily puts a lot of weight on outliers. You can, however, choose from a variety of robust fitting methods that lessen the influence of apparent outliers on the model. For example, you could choose the robust = "tukey" setting instead to have a model that applies a Tukey biweight to the residuals. Then you would get:

library(drc)
hm.m2t <- drm(PC1 ~ Treatment, data = data, fct = BC.5(),robust="tukey")
summary(hm.m2t)

Model fitted: Brain-Cousens (hormesis) (5 parms)    
Robust estimation: Tukey's biweight     
Parameter estimates:

               Estimate Std. Error  t-value   p-value    
b:(Intercept) -5.762907   4.073403  -1.4148    0.1733    
c:(Intercept)  0.597149   0.106257   5.6199 2.030e-05 ***
d:(Intercept) -1.602817   0.129169 -12.4087 1.469e-10 ***
e:(Intercept)  0.336522   0.047060   7.1509 8.507e-07 ***
f:(Intercept)  0.118482   0.011867   9.9840 5.401e-09 ***

Most of the coefficient values aren't much different from those in the model based on least squares.

The additional problem

All is not well, however. The Brain-Cousens model for hormesis only makes physical sense if both b (related to curve steepness) and f (hormesis) are positive. In this model and in your model they have different signs. The inability to distinguish the value of b from 0 should also be troubling. If b = 0 in the Brain-Cousens model, then there is no log-logistic contribution to the concentration-response curve; the denominator of the second term of the equation has a value of 2 independent of concentration x.

This problem seems to come from the way that the drm() function initializes its search for the coefficient values. This page has examples of what can go wrong if poor starting choices are made in fitting a nonlinear model. You might have found a particular fit of your data to the formula, but the result doesn't make sense in the context of positive hormesis. That's why you thought that you had to "invert the values" to fit the model. You were looking for a positive value of f but didn't recognize that you only got that by trading off for a negative value of b. With nonlinear fits you need to take extra precautions to make sure that your model makes sense.

If you think that a 5-parameter Brain-Cousens model fits your data, then you should be able to add a constant to all of your response values and still get an appropriate fit. In terms of the model parameters, that should just shift the values of c (lower asymptote) and d (upper asymptote) in parallel. Let's do that while putting the PC1 values into their original direction so that this looks more like standard hormesis. The following makes all of the adjusted response values positive.

drData2 <- data
names(drData2)[[3]] <- "PC1adj"
drData2$PC1adj <- -drData2$PC1adj +8
hm.m3t <- drm(PC1adj ~ Treatment, data = drData2, fct = BC.5(),robust="tukey")
summary(hm.m3t)

Model fitted: Brain-Cousens (hormesis) (5 parms)
Robust estimation: Tukey's biweight 
Parameter estimates:

              Estimate Std. Error t-value   p-value    
b:(Intercept) 1.543048   0.240047  6.4281 3.662e-06 ***
c:(Intercept) 4.520486   1.827476  2.4736  0.022981 *  
d:(Intercept) 7.238244   0.082995 87.2130 < 2.2e-16 ***
e:(Intercept) 2.893007   0.350767  8.2477 1.062e-07 ***
f:(Intercept) 2.685588   0.730138  3.6782  0.001597 **

With this adjustment to the response values, both b and f are clearly positive and b is clearly distinguished from a value of 0. I suspect that the initialization with drm() for the nonlinear fitting is designed to work well with the non-negative response values that are typical in this type of work and just didn't perform well with your combination of positive and negative values.

Back to the outlier

Now try using the adjusted data to fit the model with least squares and plot the data and the predictions from both that model and the robust model:

hm.m3 <- drm(PC1adj ~ Treatment, data = drData2, fct = BC.5())
plot(hm.m3t,type="all",xlim=c(0,100),bty="n")
plot(hm.m3,type="none",xlim=c(0,100),col="red",add=TRUE)
legend("bottomleft",legend="black, Tukey\nred, least squares",bty="n")

The least-square and robust models don't disagree too badly at the lower concentrations, but they diverge dramatically at the concentration of 20 and at the higher concentrations where you expect that the trends would continue.

Other issues

Even though the robust fit seems to get around the apparent outlier and you have a "statistically significant" value of f, I'd be hesitant to claim that your model actually documents hormesis.

If you perform ANOVA on the response values at lower concentrations and then do a Dunnett test to examine differences from values at the baseline concentration of 0, you won't find any significant differences. Thus a claim of hormesis would depend very heavily on knowing that the Brain-Cousens model represented your data correctly.

Unfortunately, you don't have data at the higher concentrations that would be needed to document the reliability of that model near the lower asymptote c. I'd particularly worry about this here, as your responses, principal-component values, don't necessarily have the asymptotic association with concentration you would expect for responses like mortality that the Brain-Cousens model was designed to handle.

Furthermore, you barely have more concentrations (0 effluent, plus 5 concentrations) than you have parameters to fit (5). That can lead to overfitting and a model that doesn't describe anything beyond the data at hand, even though you have 4 replicates at each concentration. More data at higher concentrations would seem to be needed.

Related Solutions

Solved – Identifying differences among calibration curves: ANCOVA

If you don't care about the intercept, I think you want this:

> model <- lm(response ~ -1 + conc + day:treat)
> anova(model)
Analysis of Variance Table

Response: response
          Df  Sum Sq Mean Sq F value    Pr(>F)    
conc       1 18.5545 18.5545 62.3458 9.909e-05 ***
day:treat  4  1.7860  0.4465  1.5003    0.2996    
Residuals  7  2.0832  0.2976                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
> summary(model)

Call:
lm(formula = response ~ -1 + conc + day:treat)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.5736 -0.4803  0.0222  0.3465  0.6013 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)   
conc             0.6859     0.1929   3.556  0.00927 **
dayday1:treatA   0.6919     0.3693   1.873  0.10316   
dayday2:treatA   0.1093     0.3693   0.296  0.77585   
dayday1:treatB   0.3387     0.3693   0.917  0.38965   
dayday2:treatB   0.7090     0.3693   1.920  0.09636 . 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.5455 on 7 degrees of freedom
Multiple R-squared: 0.9071, Adjusted R-squared: 0.8407 
F-statistic: 13.67 on 5 and 7 DF,  p-value: 0.001695

The -1 in the lm removes the intercept.

There aren't really "slopes" in your model. You just have different days and different treatments. In the results above, you see that the interaction is not significant.

Presumably this isn't your real data. In your real data, you might also want to look at something like this: model2 <- lm(response ~ -1 + conc + day + treat) and model3 <- lm(response ~ -1 + conc + day*treat)

You can visualize this with the following code (pretty hacky)

plot(as.numeric(day),response,col=as.numeric(treat), xlim=c(1,2.2))
points(as.numeric(day)+0.1,pred,col=as.numeric(treat),pch=2)
for(i in 1:6){lines(c(1,1.1), c(response[i],pred[i]),col=as.numeric(treat)[i])}
for(i in 7:12){lines(c(2,2.1), c(response[i],pred[i]),col=as.numeric(treat)[i])}
legend("bottom", c("treatA","treatB","observed","predicted"), col=c(1,2,1,1),
    lty=c(1,1,NA,NA), pch=c(NA,NA,1,2))

I'm still new to stack exchange, so if someone add a comment with a link that shows how to include this plot rather than just the code I'd appreciate it.

EDIT

Based on this code, the following image was produces. You didn't give the pred object so I'm assuming a few things here. Predicted values are smaller and transparent.

pred <- data.frame(treat = df$treat, conc = df$conc, day = df$day, response = predict(mdl, newdata = df[, 1:3]))

library(ggplot2)
ggplot(df, aes(y = response, x = treat, colour = as.factor(conc))) + 
   geom_jitter(position = position_jitter(width = 0.25), shape = 16, size = 3) +
   geom_point(data = pred, aes(shape = as.factor(conc)), alpha = 0.2) +
   facet_grid(~day) + 
   theme_bw()

enter image description here

Best Answer

Related Solutions

Solved – Identifying differences among calibration curves: ANCOVA

Related Question