MATLAB: How to do extrapolation of a curve

extrapolation. prediction.

Hi, I'm new to Matlab, never used it before. I have a problem. How can I extrapolate a curve in Matlab to predict values?

I have these X-values:

X (time) 10 20 30 40 50 60 70 80 90 100 110 120

I have these Y-values:

Y (cumulative mass) 18,57 40,10 81,15 92,96 99,44 104,59 108,71 113,16 118,23 122,60 126,63 130,49

I would like to extrapolate this curve to see when cumulative mass stops growing. Is this possible to achieve in Matlab?

Best Answer

If you really want to find where this curve levels off, you need to get better data, or get a better understanding of the process that this data came from, because it shows no sign at all of leveling off.

In general, as Star points out, you need some physical model that explains the process. Extrapolation is a risky business with or without a model. For example, consider models of population, ocean temperatures, global temperatures, etc. As much as people try, you need a good model of your process, one that explains it well. Otherwise, you will get random predictions, or at best, completely arbitrary ones. Often those predictions can be heavily biased by what those who will model the process want to see.

Why did I bring up that point? Because the curve that you "want" to see is inconsistent with the data you have. Note that the data has a nearly constant slope over the range [50,120]. This is completely inconsistent with your expectation that the curve will level out. It also makes it nearly impossible to predict where or when that curve will roll over. (Using a negative exponential as Star suggests is a bad idea, because the model itself builds in a shape that forces the curve to roll over just a little above the end of your data. That model also lacks the behavior you expect in the early part of the curve.)
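The claim about the slope is easy to check with base MATLAB. What follows is only a rough sketch: it assumes the commas in the question's Y-values are decimal separators, and the variable names are made up for illustration.

```matlab
% Data from the question, assuming the commas are decimal separators.
X = 10:10:120;
Y = [18.57 40.10 81.15 92.96 99.44 104.59 108.71 113.16 118.23 122.60 126.63 130.49];

% Finite-difference slopes between successive points.
slopes = diff(Y) ./ diff(X);

% Straight-line fit restricted to the range [50,120].
k = X >= 50;
p = polyfit(X(k), Y(k), 1);
resid = Y(k) - polyval(p, X(k));

fprintf('slope on [50,120]: %.4f, largest residual: %.3f\n', p(1), max(abs(resid)));
```

If the straight line leaves only small residuals over that range, there is little in the data itself to say where a plateau would be.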

So I'll use a tool that will let me fit your data, as well as plot it, and compute a derivative estimate, as well as allow me to extrapolate, all without actually making too many strong assumptions on the shape of the curve. That tool is my SLM toolbox.

So first, I'll use the tool as essentially an interpolating spline.

slm = slmengine(X,Y,'increas','on','plot','on','knots',X);

As you see, it fits the curve, but offers no predictive value. What does the first derivative of that curve look like?

Between 50 and 120, the first derivative is as close to flat as I could imagine. Perhaps I might try to convince myself that the derivative is slightly decreasing, which might allow us to infer some point where the curve rolls over. So in the next plot, I've shown the second derivative of the spline function, along with horizontal reference lines so you can see that the curve is indeed as straight as it looked. The second derivative plot shows not even a remote indication that the curve is rolling over. If it were going to roll over, the second derivative would be negative at the top end.

Instead, you can see it is DEAD SOLID ZERO up there.

So if you choose to use any nonlinear exponential model that predicts this curve is rolling over, you are simply going to predict a shape based on the model you choose. You will indeed force the curve to roll over, but any prediction of a top end is completely bogus, at least anything based on some arbitrary exponential model.

A nice thing about SLM is we can use it to extrapolate intelligently. Or, at least semi-intelligently. Suppose we decide that if this curve is going to roll over, it should be flat by at least X=250. That is twice as far as your data goes. So I'll add an extra knot at X=250, and force the curve to be flat at that point.

slm = slmengine(X,Y,'increas','on','plot','on','knots',[X,250],'rightslope',0);

As you can see, here it predicts the curve tops out at roughly 150. The problem is, suppose I had pushed the end point to X=1000? I mean, if it is close enough to be constant by 250, it should still be constant at 1000.

slm = slmengine(X,Y,'increas','on','plot','on','knots',[X,250 1000],'rightslope',0);

As you can see in that plot, the curve leveled off just under 300 for Y. So which one should I believe? The answer is that neither prediction has any basis in reality.
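The same sensitivity shows up without SLM. As a rough sketch using only base MATLAB (this is not the SLM computation above, and it assumes the commas in the question's data are decimal separators), polynomials of degree 1, 2 and 3 can be fit to the nearly linear part of the data and pushed out to X=250. The choice of polynomials and of X=250 is purely illustrative.

```matlab
% Data from the question, assuming the commas are decimal separators.
X = 10:10:120;
Y = [18.57 40.10 81.15 92.96 99.44 104.59 108.71 113.16 118.23 122.60 126.63 130.49];

k = X >= 50;      % the nearly linear part of the data
xq = 250;         % arbitrary extrapolation point, far outside the data

pred = zeros(1, 3);
for n = 1:3
    % Centering and scaling (third output mu) keeps the fit well conditioned.
    [p, ~, mu] = polyfit(X(k), Y(k), n);
    pred(n) = polyval(p, xq, [], mu);
end

% One extrapolated value per polynomial degree.
disp(pred)
```

All three polynomials describe the data between 50 and 120 about equally well, yet they need not agree at all at X=250. The extrapolated number reflects the model chosen at least as much as it reflects the data.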

I have a favorite quote by Mark Twain, on the dangers of extrapolation.

“In the space of one hundred and seventy six years the Lower Mississippi has shortened itself two hundred and forty-two miles. That is an average of a trifle over a mile and a third per year. Therefore, any calm person, who is not blind or idiotic, can see that in the Old Oölitic Silurian Period, just a million years ago next November, the Lower Mississippi was upwards of one million three hundred thousand miles long, and stuck out over the Gulf of Mexico like a fishing-pole. And by the same token any person can see that seven hundred and forty-two years from now the Lower Mississippi will be only a mile and three-quarters long, and Cairo [Illinois] and New Orleans will have joined their streets together and be plodding comfortably along under a single mayor and a mutual board of aldermen. There is something fascinating about science. One gets such wholesale returns of conjecture out of such a trifling investment of fact.”

Mark Twain, "Life on the Mississippi", 1883

So, be careful. If you use an arbitrary exponential model to fit this curve, you will get random nonsense. A more realistic model might be a sigmoid shape of some sort, but there are many such curves, all of which have subtly different shapes. Even so, don't expect that curve to have any predictive value, since the curve fitting tool has no way to know where the curve will roll over.

I'll suggest that you need to revisit the process that generated the data. Is it really expected to roll over? Don't kid yourself.

Are you sure there is not some asymptotic behavior that approaches a straight line? That I could easily believe. For example, suppose the model is a more believable one, at least believable in context of this data? Suppose the model was of the form...

f(X) = a + b*X + c./(1+exp(-(X-X0)/d))

Here, I've built a model that will be asymptotic to a straight line, by using an underlying sigmoidal shape, and adding a term that represents a line. We can fit that model easily enough using my fminspleas, also found on the file exchange.

mdl = {1,@(c,X) X,@(c,X) 1./(1+exp(-(X-c(1))./c(2)))};
[X0_d,abc] = fminspleas(mdl,[40,10],X,Y)
X0_d =
       23.322       3.7777
abc =
       12.141
      0.45509
       64.544

See that the slope of the linear asymptote is 0.45509, and the inflection point on the curve should be at roughly 23.322.
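Both statements can be checked directly from the printed coefficients. The following is a minimal sketch, assuming the rounded values shown above and assuming the commas in the question's data are decimal separators. The names f, asym and d2 are made up for illustration.

```matlab
% Coefficients as printed above (rounded to the displayed digits).
a = 12.141; b = 0.45509; c = 64.544;
X0 = 23.322; d = 3.7777;

f = @(X) a + b*X + c./(1 + exp(-(X - X0)/d));

% Far to the right the sigmoid term saturates at c, leaving a straight line.
asym = @(X) (a + c) + b*X;
gap = f(1000) - asym(1000);            % should be negligible

% The linear term has no curvature, so the inflection point sits at X0.
% A central second difference should change sign there.
h = 1e-2;
d2 = @(X) (f(X + h) - 2*f(X) + f(X - h)) / h^2;
signs = sign(d2([X0 - 1, X0 + 1]));    % concave up before X0, concave down after

% Residuals against the data from the question.
Xd = 10:10:120;
Yd = [18.57 40.10 81.15 92.96 99.44 104.59 108.71 113.16 118.23 122.60 126.63 130.49];
res = Yd - f(Xd);
```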

xhat = 10:200;
yhat = abc(1) + abc(2)*xhat + abc(3)./(1+exp(-(xhat - X0_d(1))/X0_d(2)));
plot(X,Y,'ro',xhat,yhat,'b-')
grid on

Now that curve I can believe. As you can see, it nicely fits the bottom end, and it also fits that linear asymptote. All you need to do now is think about why the data came out like it did.
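For readers who do not have fminspleas, the idea of splitting the parameters into a nonlinear part (X0, d) and a linear part (a, b, c) can be sketched with base MATLAB. This is not the fminspleas code, only an illustration under stated assumptions: the commas in the question's data are taken as decimal separators, and the names design, linpars and sse are hypothetical. fminsearch searches over X0 and d only, and a, b, c come from a linear least squares solve at every step.

```matlab
% Data from the question, assuming the commas are decimal separators.
X = (10:10:120)';
Y = [18.57 40.10 81.15 92.96 99.44 104.59 108.71 113.16 118.23 122.60 126.63 130.49]';

% Design matrix for the linear parameters [a; b; c], given p = [X0, d].
design = @(p) [ones(size(X)), X, 1./(1 + exp(-(X - p(1))./p(2)))];

% For fixed p, the best [a; b; c] is a linear least squares solve.
linpars = @(p) design(p) \ Y;

% Objective seen by the optimizer: residual sum of squares after that solve.
sse = @(p) sum((Y - design(p)*linpars(p)).^2);

p0 = [40, 10];                 % same starting values as in the fit above
pfit = fminsearch(sse, p0);    % estimates of [X0, d]
abcfit = linpars(pfit);        % estimates of [a; b; c]
```

Whether this lands on exactly the values printed above depends on the optimizer settings and on where the search converges, so compare the results rather than assume they match.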

Related Solutions

MATLAB: Find tangent to set of 2D points

Your curve is somewhat noisy. I'll point out that my interparc tool from the file exchange can be made to give estimates of the derivatives of the curve at any point along the curve. So it would return two numbers at a point that are essentially dx/dt and dy/dt. The ratio of dy/dt to dx/dt is the slope of the tangent line at that point.

The above works even for non-functional closed curves as you have, since interparc works in a parametric form. It computes a simple cumulative linear chordal arclength along the curve, then uses spline models x(t), y(t).

The problem is, your data is somewhat noisy, based on the curve we are shown. So that tangent line would be pointing all over the place.

Anyway, if you were to use interparc in that way, then it would be just as simple to do the work yourself. You do not provide your data. But, suppose you have data that is in the form of two vectors x and y, sorted in order along such a path in the (x,y) plane?

t = cumsum([0;sqrt(diff(x(:)).^2 + diff(y(:)).^2)]);

So, t is the cumulative piecewise linear distance between each point, one to the next along your curve.

splx = spline(t,x);
sply = spline(t,y);

Differentiate each spline, then evaluate the ratio of the derivatives at the original points. That is the slope of the tangent line, at that point. If you have the curve fitting toolbox, you can use fnder to differentiate the splines.

tanslope = ppval(fnder(sply),t)./ppval(fnder(splx),t);

If you do not have the curve fitting toolbox, the derivative function is trivial to generate for the pp form that spline returns.
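One way to do that by hand is sketched below, using only unmkpp and mkpp from base MATLAB. The circle is hypothetical test data standing in for the real curve, and the variable names are made up for illustration.

```matlab
% Hypothetical test data: points ordered around a unit circle.
theta = linspace(0, 2*pi, 49)';
x = cos(theta);
y = sin(theta);

% Cumulative chordal arclength along the curve.
t = cumsum([0; sqrt(diff(x).^2 + diff(y).^2)]);
splx = spline(t, x);
sply = spline(t, y);

% Each row of the coefficient array holds one cubic piece, highest power first.
[bx, cx, ~, kx] = unmkpp(splx);
[by, cy, ~, ky] = unmkpp(sply);

% d/ds of c1*s^3 + c2*s^2 + c3*s + c4 is 3*c1*s^2 + 2*c2*s + c3,
% so scale each coefficient by its power and drop the constant term.
dsplx = mkpp(bx, cx(:, 1:kx-1) .* repmat(kx-1:-1:1, size(cx, 1), 1));
dsply = mkpp(by, cy(:, 1:ky-1) .* repmat(ky-1:-1:1, size(cy, 1), 1));

% Slope of the tangent line at each original point.
tanslope = ppval(dsply, t) ./ ppval(dsplx, t);
```

Where dx/dt is zero (a vertical tangent) the ratio blows up, so it may be safer to keep dx/dt and dy/dt as a direction vector instead of dividing.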

Again, the issue is that you may wish to produce a smoother estimate of the tangent slope. If you provide your data, and you need such a smoothed tangent line, just attach the data as a .mat file to a comment.

MATLAB: Extrapolation!!

Shiver. You have data that lies between 14.5 and 15.9, and you wish to extrapolate down to x = 9???????????? Have you learned too little from the works of Mark Twain?

OK, if you insist on doing this extrapolation down to a virtually arbitrary value at x = 9, then use a tool that can do it in style, and will have the properties you desire for that curve. My SLM tools will do exactly what you need.

slm = slmengine(x,y,'plot','on','knots', ...
   [9 14.5 15 15.5 16],'increasing','on','concaveup','on');

Find SLM on the file exchange:

http://www.mathworks.com/matlabcentral/fileexchange/24443-slm-shape-language-modeling
