MATLAB: How to do extrapolation of a curve

extrapolation. prediction.

Hi, I'm new to Matlab, never used it before. I have a problem. How can I extrapolate a curve in Matlab to predict values?
I have these X-values:
X (time) 10 20 30 40 50 60 70 80 90 100 110 120
I have these Y-values:
Y (cumulative mass) 18,57 40,10 81,15 92,96 99,44 104,59 108,71 113,16 118,23 122,60 126,63 130,49
I would like to extrapolate this curve to see when cumulative mass stops growing. Is this possible to achieve in Matlab?

Best Answer

If you really want to find where this curve levels off, you need to get better data, or get a better understanding of the process that this data came from, because it shows no sign at all of leveling off.
In general, as Star points out, you need some physical model that explains the process. Extrapolation is a risky business with or without a model. For example, consider models of population, ocean temperatures, global temperatures, etc. As much as people try, you need a good model of your process, one that explains it well. Otherwise, you will get random predictions, or at best, completely arbitrary ones. Often those predictions can be heavily biased by what those who will model the process want to see.
Why did I bring up that point? Because the curve that you "want" to see is inconsistent with the data you have. Note that the data has a nearly constant slope over the range [50,120]. This is completely inconsistent with your expectation that the curve will level out. It also makes it nearly impossible to predict where or when that curve will roll over. (Using a negative exponential as Star suggests is a bad idea, because the model itself builds in a shape that forces the curve to roll over just a little above the end of your data. That model also lacks the behavior you expect in the early part of the curve.)
So I'll use a tool that lets me fit and plot your data, compute a derivative estimate, and extrapolate, all without making too many strong assumptions about the shape of the curve. That tool is my SLM toolbox.
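The nearly constant slope over [50,120] can be checked directly from the posted numbers. The following is a minimal sketch in base MATLAB, assuming the decimal commas in the question are decimal points; the variable names are only illustrative.

```matlab
% Data from the question (decimal commas rewritten as decimal points)
X = 10:10:120;
Y = [18.57 40.10 81.15 92.96 99.44 104.59 108.71 113.16 118.23 122.60 126.63 130.49];

% Finite-difference slope on each interval between neighboring samples
slopes = diff(Y) ./ diff(X);

% Keep only the intervals that start at X = 50 or later
tail = X(1:end-1) >= 50;
tailSlopes = slopes(tail);

% Straight-line least-squares fit to the same part of the data
p = polyfit(X(X >= 50), Y(X >= 50), 1);

fprintf('interval slopes on [50,120]: %s\n', mat2str(tailSlopes, 3));
fprintf('mean slope %.3f, fitted line slope %.3f\n', mean(tailSlopes), p(1));
```

For this data the interval slopes stay between roughly 0.39 and 0.52, with no steady drift toward zero.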
So first, I'll use the tool as essentially an interpolating spline.
slm = slmengine(X,Y,'increas','on','plot','on','knots',X);
As you see, it fits the curve, but offers no predictive value. What does the first derivative of that curve look like?
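For readers without SLM, a rough stand-in for a monotone interpolating curve and its first derivative can be sketched in base MATLAB. This swaps in pchip, which is a different method from slmengine and will not reproduce its plots exactly; the by-hand differentiation of the piecewise polynomial is an illustrative assumption.

```matlab
% Data from the question (decimal commas rewritten as decimal points)
X = 10:10:120;
Y = [18.57 40.10 81.15 92.96 99.44 104.59 108.71 113.16 118.23 122.60 126.63 130.49];

% Shape-preserving piecewise cubic interpolant (base MATLAB, not SLM)
pp = pchip(X, Y);

% Differentiate the piecewise polynomial by hand. Each piece is
% c1*s^3 + c2*s^2 + c3*s + c4 with s measured from the left break.
[breaks, coefs, npieces, order] = unmkpp(pp);
dcoefs = coefs(:, 1:order-1) .* repmat(order-1:-1:1, npieces, 1);
dpp = mkpp(breaks, dcoefs);

% Plot the interpolant and its first derivative
xfine = linspace(X(1), X(end), 400);
subplot(2,1,1); plot(X, Y, 'ro', xfine, ppval(pp, xfine), 'b-'); grid on
ylabel('cumulative mass')
subplot(2,1,2); plot(xfine, ppval(dpp, xfine), 'b-'); grid on
xlabel('time'); ylabel('dY/dX')
```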
Between 50 and 120, it is as close to flat as I could imagine. Perhaps I might try to convince myself that the derivative is slightly decreasing, which might allow us to infer some point where it might roll over. So in the next plot, I've shown the second derivative of the spline function, along with horizontal reference lines so you can see that it is indeed as straight as it looked. The second derivative plot shows not even a remote indication that the curve is rolling over. If it were going to roll over, the second derivative would be negative at the top end.
Instead, you can see it is DEAD SOLID ZERO up there.
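The same point can be checked without any toolbox by looking at second differences of the raw data. This is only a crude sketch, assuming uniform spacing in X; it is a finite-difference estimate, not the SLM second derivative.

```matlab
% Data from the question (decimal commas rewritten as decimal points)
X = 10:10:120;
Y = [18.57 40.10 81.15 92.96 99.44 104.59 108.71 113.16 118.23 122.60 126.63 130.49];

h  = X(2) - X(1);           % uniform spacing between samples
d2 = diff(Y, 2) / h^2;      % centered second difference at X(2:end-1)
xc = X(2:end-1);            % locations the estimates belong to

% Compare the upper part of the curve with the early part
tail = xc >= 60;
disp('X and second-difference estimate, upper part of the curve:');
disp([xc(tail); d2(tail)]);
fprintf('largest |d2| for X >= 60: %.4f\n', max(abs(d2(tail))));
fprintf('largest |d2| for X <  60: %.4f\n', max(abs(d2(~tail))));
```

On this data the estimates for X >= 60 change sign from point to point and are small next to the values near X = 20 and X = 30.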
So if you choose to use any nonlinear exponential model that predicts this curve is rolling over, you are simply going to predict a shape based on the model you choose. You will indeed force the curve to roll over, but any prediction of a top end is completely bogus, at least anything based on some arbitrary exponential model.
A nice thing about SLM is we can use it to extrapolate intelligently. Or, at least semi-intelligently. Suppose we decide that if this curve is going to roll over, it should be flat by at least X=250. That is roughly twice as far as your data goes. So I'll add an extra knot at X=250, and force the curve to be flat at that point.
slm = slmengine(X,Y,'increas','on','plot','on','knots',[X,250],'rightslope',0);
As you can see, here it predicts the curve tops out at roughly 150. The problem is, suppose I had pushed the end point to X=1000? I mean, if it is close enough to be constant by 250, it should still be constant at 1000.
slm = slmengine(X,Y,'increas','on','plot','on','knots',[X,250 1000],'rightslope',0);
As you can see in that plot, the curve leveled off just under 300 for Y. So which one should I believe? The answer is that neither prediction has any basis in reality.
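How strongly the predicted plateau depends on what is assumed can also be illustrated without SLM. The sketch below is a hypothetical check, not the fits shown above: it assumes a saturating model Y = A - B*exp(-X/tau), fixes the plateau A at several arbitrarily chosen levels, and fits the rest to the data with X >= 50.

```matlab
% Data from the question (decimal commas rewritten as decimal points)
X = 10:10:120;
Y = [18.57 40.10 81.15 92.96 99.44 104.59 108.71 113.16 118.23 122.60 126.63 130.49];

% Use only the upper part of the curve
use = X >= 50;
xt = X(use);
yt = Y(use);

plateaus = [150 200 300 1000];   % hypothetical plateau levels, chosen arbitrarily
rmsErr = zeros(size(plateaus));
tau    = zeros(size(plateaus));

for k = 1:numel(plateaus)
    A = plateaus(k);
    % With A fixed, log(A - Y) = log(B) - X/tau is a straight line in X
    p = polyfit(xt, log(A - yt), 1);
    yfit = A - exp(polyval(p, xt));
    tau(k) = -1 / p(1);
    rmsErr(k) = sqrt(mean((yt - yfit).^2));
    fprintf('A = %6.0f   tau = %8.1f   RMS error = %.3f\n', A, tau(k), rmsErr(k));
end
```

If the data carried real information about a plateau, one value of A would fit clearly better than both its smaller and larger neighbors. Comparing the printed errors shows whether that is the case here.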
I have a favorite quote by Mark Twain, on the dangers of extrapolation.
“In the space of one hundred and seventy six years the Lower Mississippi has shortened itself two hundred and forty-two miles. That is an average of a trifle over a mile and a third per year. Therefore, any calm person, who is not blind or idiotic, can see that in the Old Oölitic Silurian Period, just a million years ago next November, the Lower Mississippi was upwards of one million three hundred thousand miles long, and stuck out over the Gulf of Mexico like a fishing-pole. And by the same token any person can see that seven hundred and forty-two years from now the Lower Mississippi will be only a mile and three-quarters long, and Cairo [Illinois] and New Orleans will have joined their streets together and be plodding comfortably along under a single mayor and a mutual board of aldermen. There is something fascinating about science. One gets such wholesale returns of conjecture out of such a trifling investment of fact.”
Mark Twain, "Life on the Mississippi", 1883
So, be careful. If you use an arbitrary exponential model to fit this curve, you will get random nonsense. A more realistic model might be a sigmoid shape of some sort, but there are many such curves, all of which have subtly different shapes. Even so, don't expect that curve to have any predictive value, since the curve fitting tool has no way to know where the curve will roll over.
I'll suggest that you need to revisit the process that generated the data. Is it really expected to roll over? Don't kid yourself.
Are you sure there is not some asymptotic behavior that approaches a straight line? That I could easily believe. For example, suppose the model is a more believable one, at least believable in context of this data? Suppose the model was of the form...
f(X) = a + b*X + c./(1+exp(-(X-X0)/d))
Here, I've built a model that will be asymptotic to a straight line, by using an underlying sigmoidal shape, and adding a term that represents a line. We can fit that model easily enough using my fminspleas, also found on the file exchange.
mdl = {1,@(c,X) X,@(c,X) 1./(1+exp(-(X-c(1))./c(2)))};
[X0_d,abc] = fminspleas(mdl,[40,10],X,Y)
X0_d =
23.322 3.7777
abc =
12.141
0.45509
64.544
See that the slope of the linear asymptote is 0.45509, and the inflection point on the curve should be at roughly 23.322.
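Because the sigmoid term tends to c for large X, the linear asymptote of this model is (a + c) + b*X. A short sketch that plugs in the coefficients printed above (rounded as shown, so the results are approximate) and checks both the asymptote and the residuals at the data points:

```matlab
% Data from the question (decimal commas rewritten as decimal points)
X = 10:10:120;
Y = [18.57 40.10 81.15 92.96 99.44 104.59 108.71 113.16 118.23 122.60 126.63 130.49];

% Coefficients copied from the printed fminspleas output
X0 = 23.322;  d = 3.7777;               % nonlinear parameters
a  = 12.141;  b = 0.45509;  c = 64.544; % linear parameters

model     = @(x) a + b*x + c ./ (1 + exp(-(x - X0)/d));
asymptote = @(x) (a + c) + b*x;         % sigmoid term tends to c for large x

resid  = Y - model(X);                  % misfit at the data points
gap120 = asymptote(120) - model(120);   % distance from the asymptote at X = 120

fprintf('asymptote: Y = %.3f + %.5f*X\n', a + c, b);
fprintf('largest residual %.2f, gap to asymptote at X = 120: %.2g\n', ...
        max(abs(resid)), gap120);
```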
xhat = 10:200;
yhat = abc(1) + abc(2)*xhat + abc(3)./(1+exp(-(xhat - X0_d(1))/X0_d(2)));
plot(X,Y,'ro',xhat,yhat,'b-')
grid on
Now that curve I can believe. As you can see, not only does it nicely fit the bottom end, it also fits that linear asymptote. All you need to do now is think about why the data came out like it did.
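If fminspleas is not installed, a similar fit can be sketched in base MATLAB. This is not fminspleas itself: it only assumes the same general idea of separating the parameters, so fminsearch looks for X0 and d while a, b, c come from linear least squares at each step. The helper names basis and sse are hypothetical.

```matlab
% Data from the question (decimal commas rewritten as decimal points)
X = 10:10:120;
Y = [18.57 40.10 81.15 92.96 99.44 104.59 108.71 113.16 118.23 122.60 126.63 130.49];

x = X(:);
y = Y(:);

% Columns: constant, linear term, sigmoid with p = [X0, d]
basis = @(p) [ones(size(x)), x, 1 ./ (1 + exp(-(x - p(1)) ./ p(2)))];

% For given X0 and d, solve for [a; b; c] by linear least squares
% and return the remaining sum of squared errors
sse = @(p) norm(y - basis(p) * (basis(p) \ y))^2;

p0    = [40, 10];                 % same starting values as in the call above
pbest = fminsearch(sse, p0);      % search over the nonlinear parameters only
abc   = basis(pbest) \ y;         % linear parameters at the solution

fprintf('X0 = %.3f, d = %.4f\n', pbest(1), pbest(2));
fprintf('a = %.3f, b = %.5f, c = %.3f\n', abc(1), abc(2), abc(3));
```

The printed values can be compared against the fminspleas output above. Since fminsearch is a local search, a different starting point may end somewhere else.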