I have some data which I would like to smooth so that the smoothed points are monotonically decreasing. My data sharply decreases and then begins to plateau. Here's an example using R:
library(ggplot2)
df <- data.frame(x = 1:10, y = c(100, 41, 22, 10, 6, 7, 2, 1, 3, 1))
ggplot(df, aes(x = x, y = y)) + geom_line()
What's a good smoothing technique I could use? Also, it'd be nice if I can force the 1st smoothed point to be close to my observed point.
Best Answer
You can do this using penalised splines with monotonicity constraints via the mono.con() and pcls() functions in the mgcv package. There's a little fiddling about to do because these functions aren't as user friendly as gam(), but the steps are shown below, based mostly on the example from ?pcls, modified to suit the sample data you gave:
Now we need to fill in the object that gets passed to pcls() containing details of the penalised constrained model we want to fit:
Now we can finally do the fitting:
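A minimal sketch of these three steps (set-up, filling in the object, fitting), following the pattern of the example in ?pcls. The basis size k = 5, the "cr" basis, and the object names unc, sm, con and G are assumptions made for illustration and are not necessarily what was used originally.

```r
library(mgcv)

df <- data.frame(x = 1:10, y = c(100, 41, 22, 10, 6, 7, 2, 1, 3, 1))

## Assumed basis dimension (number of knots for the cubic regression spline)
k <- 5

## Unconstrained fit; used only to obtain a smoothing parameter
unc <- gam(y ~ s(x, k = k, bs = "cr"), data = df)

## Build the cubic regression spline basis and its penalty
sm <- smoothCon(s(x, k = k, bs = "cr"), df, knots = NULL)[[1]]

## Linear inequality constraints that enforce monotonicity;
## up = FALSE requests a decreasing function
con <- mono.con(sm$xp, up = FALSE)

## Object expected by pcls(), as described in ?pcls
G <- list(
  X   = sm$X,             # model matrix of basis functions
  C   = matrix(0, 0, 0),  # no equality constraints
  sp  = unc$sp,           # smoothing parameter from the unconstrained fit
  p   = -sm$xp,           # feasible start: negated knots are decreasing
  y   = df$y,             # response
  w   = rep(1, nrow(df)), # unit weights
  Ain = con$A,            # inequality constraint matrix
  bin = con$b,            # inequality constraint vector
  S   = sm$S,             # list of penalty matrices
  off = 0                 # offset of the penalty within the coefficient vector
)

## Penalised, constrained least squares fit
p <- pcls(G)
```

For a "cr" basis the coefficients are the spline values at the knots, so a decreasing fit should show up as a non-increasing p.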
p contains a vector of coefficients for the basis functions corresponding to the spline. To visualize the fitted spline, we can predict from the model at 100 locations over the range of x. We use 100 values so as to get a nice smooth line on the plot.
To generate predicted values we use Predict.matrix(), which generates a matrix that, when multiplied by the coefficients p, yields predicted values from the fitted model:
This produces:
I'll leave it up to you to get the data into a tidy form for plotting with ggplot...
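The prediction step described above could look roughly like the sketch below. So that the block runs on its own, the constrained fit is repeated inside fit_decreasing(), a hypothetical helper introduced here for illustration; newx and yhat are likewise assumed names.

```r
library(mgcv)

df <- data.frame(x = 1:10, y = c(100, 41, 22, 10, 6, 7, 2, 1, 3, 1))

## Hypothetical helper: monotone decreasing penalised spline fit via pcls()
fit_decreasing <- function(df, k = 5) {
  unc <- gam(y ~ s(x, k = k, bs = "cr"), data = df)
  sm  <- smoothCon(s(x, k = k, bs = "cr"), df, knots = NULL)[[1]]
  con <- mono.con(sm$xp, up = FALSE)
  G <- list(X = sm$X, C = matrix(0, 0, 0), sp = unc$sp, p = -sm$xp,
            y = df$y, w = rep(1, nrow(df)),
            Ain = con$A, bin = con$b, S = sm$S, off = 0)
  list(sm = sm, p = pcls(G))
}

fit <- fit_decreasing(df)

## 100 evenly spaced locations over the range of x
newx <- data.frame(x = seq(min(df$x), max(df$x), length.out = 100))

## Basis evaluated at the new locations, times the coefficients
newx$yhat <- drop(Predict.matrix(fit$sm, newx) %*% fit$p)

## Base graphics: observed points with the fitted spline overlaid
plot(y ~ x, data = df, pch = 16)
lines(yhat ~ x, data = newx, col = "red")
```

Because newx is already a data frame with one row per prediction, it should be usable directly as the data for a ggplot line layer alongside df.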
You can force a closer fit (to partially answer your question about having the smoother fit the first data point) by increasing the dimension of the basis function of x. For example, setting k equal to 8 (k <- 8) and rerunning the code above we get:
You can't push k much higher for these data, and you have to be careful about overfitting; all pcls() is doing is solving the penalised least squares problem given the constraints and the supplied basis functions. It's not performing smoothness selection for you (not that I know of, anyway).
If you want interpolation, then see the base R function ?splinefun, which has Hermite splines and cubic splines with monotonicity constraints. In this case you can't use it, however, as the data are not strictly monotonic.
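A quick way to see why monotone interpolation does not apply to the sample data is to check the sign of the successive differences, and to try the "hyman" method of splinefun(), which checks that y is monotone. The names dy, is_monotone and res are illustrative only.

```r
df <- data.frame(x = 1:10, y = c(100, 41, 22, 10, 6, 7, 2, 1, 3, 1))

## Successive differences: monotone data would have all of one sign
dy <- diff(df$y)
is_monotone <- all(dy <= 0) || all(dy >= 0)

## Positions where y increases (6 to 7, and 1 to 3)
which(dy > 0)

## Attempt monotone cubic interpolation; capture the error instead of stopping
res <- tryCatch(
  splinefun(df$x, df$y, method = "hyman"),
  error = function(e) e
)
inherits(res, "error")
```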