It's actually efficient and accurate to smooth the response with a moving-window mean: this can be done on the entire dataset with a fast Fourier transform in a fraction of a second. For plotting purposes, consider subsampling both the raw data and the smooth. You can further smooth the subsampled smooth. This will be more reliable than just smoothing the subsampled data.
Control over the strength of smoothing is achieved in several ways, adding flexibility to this approach:
A larger window increases the amount of smoothing.
Values in the window can be weighted to create a continuous smooth.
The lowess parameters for smoothing the subsampled smooth can be adjusted.
Example
First let's generate some interesting data. They are stored in two parallel arrays, times and x (the binary response).
set.seed(17)
n <- 300000
times <- cumsum(sort(rgamma(n, 2)))  # Irregularly spaced observation times
times <- times/max(times) * 25       # Rescaled to span the interval [0, 25]
x <- 1/(1 + exp(-seq(-1,1,length.out=n)^2/2 - rnorm(n, -1/2, 1))) > 1/2  # Binary response
Here is the running mean applied to the full dataset. A fairly sizable window half-width (of $1172$) is used; this can be increased for stronger smoothing. The kernel has a Gaussian shape to make the smooth reasonably continuous. The algorithm is fully exposed: here you see the kernel explicitly constructed and convolved with the data to produce the smoothed array y.
k <- min(ceiling(n/256), n/2) # Window half-width
kernel <- c(dnorm(seq(0, 3, length.out=k)))  # Half of a Gaussian kernel
kernel <- c(kernel, rep(0, n - 2*length(kernel) + 1), rev(kernel[-1]))  # Wrap around for circular convolution
kernel <- kernel / sum(kernel)  # Normalize to unit sum
y <- Re(convolve(x, kernel))    # FFT-based running weighted mean
Let's subsample the data at intervals of a fraction of the kernel half-width to ensure nothing gets overlooked:
j <- floor(seq(1, n, k/3)) # Indexes to subsample
In the example j has only $768$ elements representing all $300,000$ original values.
The rest of the code plots the subsampled raw data, the subsampled smooth (in gray), a lowess smooth of the subsampled smooth (in red), and a lowess smooth of the subsampled data (in blue). The last, although very easy to compute, will be much more variable than the recommended approach because it is based on a tiny fraction of the data.
plot(times[j], x[j], col="#00000040", xlab="x", ylab="y")
a <- times[j]; b <- y[j] # Subsampled times and subsampled smooth
lines(a, b, col="Gray")  # The subsampled running mean
f <- 1/6 # Strength of the lowess smooths
lines(lowess(a, b, f=f), col="Red", lwd=2)     # Lowess smooth of the subsampled smooth
lines(lowess(times[j], x[j], f=f), col="Blue") # Lowess smooth of the subsampled raw data
The red line (lowess smooth of the subsampled windowed mean) is a very accurate representation of the function used to generate the data. The blue line (lowess smooth of the subsampled data) exhibits spurious variability.
Note that if you're finding power across both effect sizes and sample sizes, you'd have a power surface rather than a power curve.
We've also got estimates of rejection rates from simulation, so rather than interpolating exact values there will be some smoothing of noisy estimates.
The rejection counts at any combination of effect size and sample size will be binomial, so we could fit GLMs or GAMs to do the smoothing (though often an adequate fit can be obtained via weighted least squares, for example).
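As a rough sketch of that idea (everything below is a hypothetical stand-in, not output from the example above; the grid, counts, and column names are illustrative):

# Hypothetical simulation summary: rejection counts over a grid of sample sizes and effect sizes.
set.seed(17)
sims <- expand.grid(n = c(25, 50, 100, 200), delta = seq(0, 1, by = 0.1))
sims$nsim <- 2000
sims$rej <- rbinom(nrow(sims), sims$nsim, pnorm(qnorm(0.025) + sqrt(sims$n) * sims$delta))  # stand-in "truth"
# Binomial GLM smoothing of the rejection proportions across the whole grid.
fit.glm <- glm(cbind(rej, nsim - rej) ~ poly(delta, 2) * sqrt(n), family = binomial, data = sims)
sims$power.glm <- fitted(fit.glm)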
For most statistics in common use, as the sample size gets large, $\sqrt{n}\cdot \delta$ (where $\delta$ is the effect size required to attain a given power) tends to be nearly constant, which simplifies the task of smoothing in the sample-size direction.
For one-tailed tests, in a number of common cases the normal quantile of the power is nearly linear in effect size; this suggests fitting a binomial model with a probit link, possibly combined with natural cubic splines or a locally linear smooth. (In some other situations the logistic quantile of the power may be closer to linear, in which case local linearity on the logit scale would be a good choice.)
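Continuing with the hypothetical sims grid above, one sketch of that suggestion (the spline degrees of freedom are an arbitrary choice):

library(splines)
# Probit link; a natural cubic spline allows mild departures from linearity, and using
# sqrt(n) * delta as the predictor exploits the near-constancy noted above to smooth
# across sample sizes at the same time.
sims$u <- sqrt(sims$n) * sims$delta
fit.probit <- glm(cbind(rej, nsim - rej) ~ ns(u, df = 4),
                  family = binomial(link = "probit"), data = sims)
sims$power.probit <- fitted(fit.probit)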
For two-tailed tests you'll sometimes have near-linearity away from the null, but the relationship may be nearly quadratic close to an effect size of 0.
If you have the time to run a lot of simulations at each of a large number of different $n$ and $\delta$, it's less important how you smooth; sometimes you can just use sample proportions and linear interpolation, and in at least some situations this may be enough.
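For example, when every cell of the (hypothetical) sims grid above has plenty of simulations, the raw proportions with linear interpolation may already be adequate:

# One slice of the grid: raw rejection proportions at n = 100, linearly interpolated.
s100 <- subset(sims, n == 100)
p.hat <- s100$rej / s100$nsim
interp <- approx(s100$delta, p.hat, xout = seq(0, 1, by = 0.01))
plot(interp, type = "l", xlab = "Effect size", ylab = "Estimated power")
points(s100$delta, p.hat)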
Best Answer
From the perspective of using R to find the inflections in the smoothed curve, you just need to find those places in the smoothed y values where the change in y switches sign.
Then you can add points to the graph where these inflections occur.
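A minimal sketch of that approach (the noisy curve and the lowess fit below are hypothetical; substitute your own x values and smoothed y values):

# Hypothetical data and smooth.
set.seed(1)
xs <- seq(0, 10, length.out = 500)
ys <- sin(xs) + rnorm(length(xs), sd = 0.3)
sm <- lowess(xs, ys, f = 0.1)
dy <- diff(sm$y)                        # Changes in the smoothed y values
flip <- which(diff(sign(dy)) != 0) + 1  # Indices where the change switches sign
plot(xs, ys, col = "#00000040")
lines(sm, lwd = 2)
points(sm$x[flip], sm$y[flip], pch = 19, col = "Red")  # Mark those points on the graph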
From the perspective of finding statistically meaningful inflection points, I agree with @nico that you should look into change-point analysis, sometimes also referred to as segmented regression.
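If you want the change points estimated and tested formally, here is a hedged sketch using the segmented package (the data, break locations, and initial guesses psi are hypothetical):

# install.packages("segmented")
library(segmented)
# Hypothetical piecewise-linear data with breaks near x = 3 and x = 7.
set.seed(2)
d <- data.frame(x = seq(0, 10, length.out = 200))
d$y <- 1 + 0.5*d$x - 1.5*pmax(d$x - 3, 0) + 2*pmax(d$x - 7, 0) + rnorm(nrow(d), sd = 0.3)
fit0 <- lm(y ~ x, data = d)
fit1 <- segmented(fit0, seg.Z = ~ x, psi = c(2.5, 6.5))  # psi: rough initial breakpoint guesses
fit1$psi                                                 # Estimated breakpoints
plot(d$x, d$y, col = "#00000040")
lines(d$x, fitted(fit1), lwd = 2, col = "Red")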