MATLAB: What is the best way to smooth and compute the derivatives of noisy data

curve fittingdata smoothingderivative calculatormaximum slope

Hello all,
I have a set of experimental data (temperature vs time), and have no idea of the function which satisfies the data behaviour. As a part of analysis, I need to estimate the first derivative using central differences and obtain the maximum gradient.
Because my data is too noisy, I need to filter it before taking the derivative. To keep the precision of data and minimize any distortion, I tried to remove the outliers from my data using a Savitzky–Golay filter. It shows great results, but my data is not quite smoothed as it can be seen in a picture of Savitzky–Golay filter. To compute the exact inflection point further curve fitting is required, as it can be seen in the first derivative picture.
Any suggestions or better approaches , would be highly appreciated as I’m new to Matlab and signal processing. I heard about the cubic spline for better smoothness, but I have no idea about it and not sure if it can solve my problem.
Many thanks in advance

Best Answer

The "best" way? That will be completely dependent on your data, on what you know about your data, and on what you are willing to assume about the process that generated the data.
Looking at your data, since I don't have your data, nor do I know anything about the data or the process, I can only guess. Thus, can I assume that what looks like noise really is noise? I had one group of clients who were not willing to make that assumption. Any smoothing at all was a bad thing, because there was potentially information in the high frequencty stuff that they needed to know about. Luckily for them, the noise was not overwhelming.
In general though, numerical differentiation is a noise amplifying process. So you need to kill off the noise to whatever extent possible. You can do so by use of a spline fit (IF you use the right spline model. An interpolating spline would be generally a bad idea for you), or you could use a regression model of some sort, IF you have a viable model for this process.
Again, I don't have your data. So, at best, I can only make some data up for an example. So, let me see, this should look at least a little like what you have shown:
x = linspace(0.5,2.5,100);
y = 19 + exp((x - 0.5)/2)/10 + randn(size(x))/100;
plot(x,y,'.')
untitled.jpg
The picture you show has a fair amount of data, yet it is not that complex of a relationship. As such, a low order polynomial model might be sufficient. Then you could just differentiate the polynomial. For example...
P3 = polyfit(x,y,3)
plot(x,y,'b.',x,polyval(P3,x),'r-')
untitled.jpg
P3d = polyder(P3)
plot(x,polyval(P3d,x),'-')
untitled.jpg
A problem with such an approach is it does not build information that you may have about the underlying functional relationship. That is, do you know that temperature will ALWAYS increase with time? If so, then the first derivative must always be positive, yet a polynomial fit gives you no such assurance. In fact, you might notice that the true underlying function behind my data has an everywhere increasing derivative, yet a cubic polynomial model had a negative second derivative. Worse, what if you use too high an order for a polynomial? That is easy to do.
P7 = polyfit(x,y,7);
plot(x,y,'b.',x,polyval(P7,x),'r-')
untitled.jpg
The same fact applies to a Savitsky-Golay filter. A Savitsky-Golay derivative estimation on noisy data will need to have a long window width, and low order implicit polynomial model. However, remember that Savitsky-Golay does not produce a differentiable or continuous function/derivative estimate. So, using my own movingslope code, that builds polynomial model on a moving window, then returns the derivative, we see:
plot(x,movingslope(y,20,1),'.')
untitled.jpg
Anyway, if we look at the bottom end of the curve, the 7th degree polynomial started to get a little squirrely. So, I might try to use a regression spline, constrained to have the expected shape. Lets see,
slm = slmengine(x,y,'plot','on','increasing','on','concaveup','on','jerk','positive');
So here, a cubic spline, constrained to have first, second, and third derivatives entirely positive over the domain.
untitled.jpg
Better yet, of course, is to use a nonlinear model, IF you know a good choice of model in advance.
But really, what matters is as I said in the beginning. What do you know? Because once you can provide information to the modeling process, you will GREATLY improve the derivative estimation process. Tools like Savitsky-Golay assume almost nothing about the system. A polynomial model assumes relatively little, but somewhat more.