Solved – Model for exponential decay with lots of zeros

modelnonlinear regressionrtreatment-effect

I am trying to test for the effect of a treatment on a response variable. The response variable decays over time in what I believe is an exponential way. The measurement doesn't go below zero, so there are loads of zeros in the data.

The measurements looks like this

p <- ggplot(dat, aes(time, value, color = treatment)) + geom_point()

enter image description here

I've attempted fitting lm():

dat.lm <- lm(log(value + 1) ~ time + treatment, data = dat)

and the diagnostics are very ordinary, and predictions are poor.

newdat <- expand.grid(treatment = factor(1:2), time = 0:6)
newdat$pred <- predict(dat.lm, newdat)
p2 <- p + geom_line(data = newdat, aes(x = time, y = exp(pred) - 1))

enter image description here

I would like to be able to both test for the significance of the effect of the treatment, as well as estimate parameters for the decay function. I have also had a go with nls(), and a cubic model at least fits better at time == 0, but still not great, and doesn't really make sense.

dat.cubic <- predict(lm(value~time+I(time^2)+I(time^3) + treatment, data = dat), newdat)
newdat$cubic <- dat.cubic
p2 + geom_line(data = newdat, aes(x = time, y = cubic), linetype = "dashed")

enter image description here

To repeat, my main question is how to test the effect of treatment with this data, and secondly, what is the best way to fit the model?

My data:

structure(list(time = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 
3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 
4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 
6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 
6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 
6), value = c(382.49, 377.13, 422.72, 377.52, 410.48, 435.78, 
399.74, 540.5, 481.29, 455.53, 439.56, 368.63, 421.92, 490.87, 
384.1, 478.92, 327.94, 403.7, 410.99, 332.97, 396.78, 420.85, 
359.82, 474.43, 371.25, 130.92, 84.87, 199.36, 150.84, 112.1, 
111.78, 183.84, 144.01, 163.16, 92.86, 237.2, 172.71, 161.1, 
92.02, 204.17, 183.41, 140, 100.93, 62.52, 152.94, 116.63, 220.2, 
145.3, 168.9, 155.89, 68.3, 44.44, 57.63, 0, 69.72, 48.96, 64.12, 
44.45, 22.9, 40.06, 25.48, 12.94, 30.43, 53.14, 53.16, 31.64, 
18.39, 55.73, 35.15, 44.96, 55.55, 0, 1.36, 30.73, 55.1, 15.71, 
22.66, 0, 26.53, 8.44, 51.93, 0, 0, 0, 0, 10.42, 16.42, 0, 28.17, 
0, 13.21, 48.45, 10.64, 40.52, 10.34, 15.97, 0, 29.81, 21.88, 
36.6, 0, 0, 7.13, 2.84, 12.94, 6.09, 0, 0, 0.54, 11.6, 15.58, 
5.39, 0, 7.67, 0, 0, 5.3, 0, 19.28, 4.12, 0, 9.96, 0, 10.8, 3.48, 
1.98, 1.74, 2.49, 1.79, 0.43, 3.62, 2.37, 0.87, 2.02, 0.27, 1.63, 
1.25, 3.43, 2.73, 1.76, 2.22, 1.6, 1.27, 2.31, 2.3, 0.64, 1.8, 
2.17, 2.52, 0.5, 0.44, 1.02, 1.19, 0.93, 0, 1.48, 0, 1.31, 0, 
0.78, 0.13, 0.9, 0, 1.79, 0.8, 0.21, 0.74, 1.81, 0, 0.98, 0, 
1.59, 0.58, 0.89, 1.48, 379.16, 393.21, 412.4, 426.87, 375.65, 
438.64, 399.23, 454.42, 450.48, 473.15, 473.4, 396.43, 430.82, 
464.84, 370.04, 462.76, 422.85, 435.26, 428.96, 396.91, 460.68, 
485.32, 404.56, 464.46, 475.78, 198.16, 130.74, 38.98, 114.62, 
54.07, 89.64, 74.2, 60.19, 64.38, 70.59, 35.57, 17.4, 105.75, 
67.31, 102.33, 123.3, 131.07, 94.44, 70.1, 62.25, 122.39, 22.49, 
120.74, 63.28, 61.21, 0, 23.05, 32.91, 0, 49.65, 44.3, 3.58, 
20.8, 31.15, 0, 29.53, 36.56, 55.63, 0, 57.8, 4.9, 0, 28.29, 
17.23, 64.23, 4.94, 0, 31.43, 56.98, 6.46, 0, 0, 1.44, 0, 0, 
0.23, 0, 5.83, 0, 0, 7.02, 3.23, 3.52, 2.65, 11.88, 0, 2.63, 
0, 0, 0, 0, 11.29, 5.1, 4.66, 13.05, 3.18, 1.52, 0, 0, 5.07, 
2.15, 0, 2.7, 0, 6.75, 0, 0, 0, 0, 2.49, 0, 0, 0, 0, 0.02, 3.27, 
11.94, 3.89, 3.22, 0, 0.76, 0, 0.68, 1.05, 0, 1.04, 0, 0, 2.04, 
0.86, 0.03, 0, 0.56, 0, 0.03, 0.5, 0, 0, 0.95, 0, 0, 1.2, 0, 
0.23, 0, 0, 0, 0.65, 0.67, 0, 0.87, 0.95, 0, 0.89, 0.91, 0, 0, 
1.74, 1.09, 0, 1.04, 0.21, 0.36, 0, 0, 1.8, 0.04, 0, 0, 0.71), 
    treatment = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("1", 
    "2"), class = "factor")), .Names = c("time", "value", "treatment"
), row.names = c(NA, 350L), class = "data.frame")

Best Answer

Provided there is some justification for an exponential decay model, you could try the gnls function from package nlme. This allows you to compare treatments and model variance heterogeneity. Here is something to get you started:

library(ggplot2)
p <- ggplot(DF, aes(time, value, color = treatment)) + geom_point()

Get starting values by fitting separate nls models:

coef(nls(value ~ C * exp(-k*time), data = DF[DF$treatment == 1,], start = list(C=400, k=1)))
#         C          k 
#415.729905   1.080539 

coef(nls(value ~ C * exp(-k*time), data = DF[DF$treatment == 2,], start = list(C=400, k=1)))
#         C          k 
#430.787442   1.606167

Now use gnls:

library(nlme)
fit <- gnls(value ~ C * exp(-k*time), 
            data = DF, 
            params = list(C ~ treatment, k ~ treatment), 
            start = list(C = c(415, 15), k = c(1.1, 0.5)),
            weights = varExp(-0.8, form = ~ time),
            control = gnlsControl(nlsTol = 0.1))

I use an exponential variance structure (without dependence on treatment) here, but you could try some alternatives. Note that I had to strongly increase nlsTol to achieve a successful fit. Use the result with caution (but it looks pretty good):

summary(fit)
#Generalized nonlinear least squares fit
#  Model: value ~ C * exp(-k * time) 
#  Data: DF 
#       AIC      BIC    logLik
#  2485.028 2508.176 -1236.514
#
#Variance function:
# Structure: Exponential of variance covariate
# Formula: ~time 
# Parameter estimates:
#     expon 
#-0.8035687 
#
#Coefficients:
#                 Value Std.Error  t-value p-value
#C.(Intercept) 413.5077 15.976589 25.88210  0.0000
#C.treatment2    9.7849 24.062902  0.40664  0.6845
#k.(Intercept)   1.0932  0.021132 51.73149  0.0000
#k.treatment2    0.4104  0.061657  6.65565  0.0000
#
# Correlation: 
#              C.(In) C.trt2 k.(In)
#C.treatment2  -0.664              
#k.(Intercept)  0.629 -0.418       
#k.treatment2  -0.216  0.456 -0.343
#
#Standardized residuals:
#        Min          Q1         Med          Q3         Max 
#-2.49675931 -0.55858325 -0.02101141  0.53929573  4.36616094 
#
#Residual standard error: 92.79678 
#Degrees of freedom: 350 total; 346 residual

plot(fit)

residual plot

Now plot the result:

newdata <- expand.grid(time = seq(0, 6, length.out = 100), treatment = factor(1:2))
newdata$value <- predict(fit, newdata = newdata)

p + geom_line(data = newdata)

resulting plot

In a next step one could try removing the dependence of C on treatment from the model. That might help to achieve better convergence ...

Related Solutions

Exponential Decay – Fitting with Negative Y Values

Use a selfstarting function:

ggplot(dt, aes(x = x, y = y)) + 
  geom_point() +
  stat_smooth(method = "nls", formula = y ~ SSasymp(x, Asym, R0, lrc), se = FALSE)

fit <- nls(y ~ SSasymp(x, Asym, R0, lrc), data = dt)
summary(fit)
#Formula: y ~ SSasymp(x, Asym, R0, lrc)
#
#Parameters:
#       Estimate Std. Error  t value Pr(>|t|)    
#Asym -0.0001302  0.0004693   -0.277    0.782    
#R0   77.9103278  2.1432998   36.351   <2e-16 ***
#lrc  -4.0862443  0.0051816 -788.604   <2e-16 ***
#---
#Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
#Residual standard error: 0.007307 on 698 degrees of freedom
#
#Number of iterations to convergence: 0 
#Achieved convergence tolerance: 9.189e-08

exp(coef(fit)[["lrc"]]) #lambda
#[1] 0.01680222

However, I would seriously consider if your domain knowledge doesn't justify setting the asymptote to zero. I believe it does and the above model doesn't disagree (see the standard error / p-value of the coefficient).

ggplot(dt, aes(x = x, y = y)) + 
  geom_point() +
  stat_smooth(method = "nls", formula = y ~ a * exp(-S * x), 
              method.args = list(start = list(a = 78, S = 0.02)), se = FALSE, #starting values obtained from fit above
              color = "dark red")

Solved – Double exponential decay function fitting

If you're not wedded to the particular functional form you specified, it seems that a Ricker model ($y = a x \exp(b x) + \epsilon$) would be a computationally easier choice. Because the model is equivalent to $y=\exp(\log(a) + \log(x) + bx) + \epsilon$, it can be fitted with a log-link Gaussian model using an offset term (need to exclude the PAR==0 data point from the fit, or we could modify it to be some tiny non-zero value):

dd <- data.frame(PAR= c(0,6,18,35,61,93,121,195,268), 
                 ETR = c(0.06, 1.63, 4.88, 7.70, 9.47, 9.47, 8.07, 6.55, 3.60))
glm(ETR ~ offset(log(PAR)) + PAR,
    family=gaussian(link="log"),
    subset=PAR>0,
    data=dd)

Here's what it looks like when done within ggplot2:

library(ggplot2); theme_set(theme_bw())
ggplot(dd, aes(x=PAR,y=ETR))+
    geom_point()+
    geom_smooth(data=subset(dd,PAR>0),
                method="glm",
                formula=y~x + offset(log(x)),
                method.args=list(family=gaussian(link="log")))

If you want to make it a little bit more flexible you can fit $y=a x^c \exp(bx) + \epsilon$ by including $\log(x)$ as a full model term rather than an offset, i.e. ETR ~ log(PAR) + PAR ...

Best Answer

Related Solutions

Exponential Decay – Fitting with Negative Y Values

Solved – Double exponential decay function fitting

Related Question