According to the OP,
The 'scientific logic' of mass action (which is essentially what this
is) would point to a linear or exponential relationship, and it's
definitely non-linear
We can start by using ggplot to test out the best model. We'll do it purely visually, by plotting the two different options.
First up, we need to get the data in to a data frame. We'll use the usual method:
DP = data.frame(Activity = c(100,67.7,6.29,16.55),
Conc = c(0, 83, 166, 416))
Now, we'll plot a linear fit to the data using ggplot's geom_smooth()
function:
require(ggplot2)
p.dp.1 <- ggplot(data = DP,
aes(x = Conc,
y = Activity)) +
ylab("Activity") +
xlab("Concentration") +
geom_point() +
theme_bw(base_size = 8) +
geom_smooth(method = "lm",
aes(colour = "Linear")) +
scale_color_manual(name = "Fits",
breaks = c("Linear"),
values = c("blue"))
Which gives us this fit:
We can see the standard error (the grey-shaded area) is pretty large.
Because the OP suggested that this might be an exponential relationship, we'll now try adding a fit using an exponential. An exponential curve can be linearized by taking logs of both sides, and then doing a linear fit to the data, which would be very simple with ggplot. However, we have a problem; log(0) is -Inf, so we can't simply take the logs of both sides and do a linear fit. Instead, we have to use glm()
to do the fit, and pass it through geom_smooth()
. We can also take advantage of ggplot's ability to build up plots bit-by-bit, and just tack it on to the previous plot:
p.dp.2 <- p.dp.1 +
geom_smooth(method = "glm",
family = gaussian(link="log"),
aes(colour = "Exponential")) +
scale_color_manual(name = "Fits",
breaks = c("Linear","Exponential"),
values = c("red","blue"))
Which gives us this:
Where we can see that the standard error (the grey bound) is much narrower than the linear fit. So, we have some justification of using the exponential, although it would still be nice to have a chemical model of activity versus concentration to base this on.
We now need the concentration at activity = 50. We can plot this using geom_abline()
:
p.dp.3 <- p.dp.2 +
geom_abline(intercept = 50, slope =0,
aes(colour = "Activity = 50")) +
scale_color_manual(name = "Fits",
breaks = c("Linear","Exponential","Activity = 50"),
values = c("black","red","blue"))
which gives us this:
Our best estimate for 50% activity is the intersection of the black line of activity = 50 and the best-fit line using the exponential model. We can see that this point is slightly to the right of the data point at Conc = 83, and to the left of the line of concentration = 100.
To get the actual intersect, we create the model for activity as a function of concentration outside of ggplot, and approximate the fit using approx()
. The model is set up like this:
dp.model <- glm(formula = Activity ~ Conc,
family=gaussian(link="log"),
data = DP)
Following this example we can get the concentration when activity is 50:
conc.est <- approx(x = dp.model$fitted.values,
y = dp.model$data$Conc, xout=50)$y
and then the concentration is...
> conc.est
89.29575
So, using an exponential model for activity as a function of concentration, we can estimate that concentration = 89.3 when activity = 50. We can see from the figures that the standard error associated with this estimate is huge, though. Unfortunately I can't figure out how to get the error bands for this estimate.
Best Answer
I think smoothing splines with small degrees of freedom would do the trick. Here's an example in R:
The R code: