MATLAB: Curve fitting error:nlinfit rank deficient

nlinfit

this code is for T1 fitting for phantom

tr=[32 64 128 256 512 1020 2050];

fun1=@(p,x) p(1).*(1-exp(-x./p(2)))+p(3);

si1=[246.672672672673 395.729729729730 605.585585585586 890.261261261261 1221.54654654655 1529.04204204204 1719.57357357357];

p0= [0,si1(end),50];

options = statset('Funvalcheck','off');

beta0 = p0;

[beta1,R,J,CovB,MSE] = nlinfit(tr,si1,fun1,beta0);%nonlinear fitting

and the answer is Rank deficient, rank = 2, tol = 5.904049e-15.

what's wrong?

Best Answer

Whats wrong? Mainly really poor starting values. Not enough data is another good reason, and a model that is not easily fit on data that varies over multiple orders of magnitude.

Other problems? A better algorithm is often a good choice. You might also consider using a weighted nonlinear regression.

Here is my fit, using a better algorithm for this class of problem.

funlist = {@(c,xdata) 1 - exp(-xdata/c), 1};
[INLP,ILP] = fminspleas(funlist,1719,tr,si1)
INLP =
       427.15
ILP =
       1528.4
        175.6
plot(tr,si1,'o')
trev = 32:2050;
hold on
plot(trev,ILP(1)*(1-exp(-trev/INLP)) + ILP(2),'r-')
grid on

The fit here seems reasonable. Still not nearly sufficient data to do the fit well.

I used my own fminspleas (it lives on the file exchange, for user download.) It is better for this kind of problem because it is more robust to poor starting values.

So what is wrong here? Why did nlinfit fail? Look at the initial parameters you chose,

[0, 1719, 50] % start point

[1528.4, 427.8, 175.6] % estimated values

So when I used fminspleas, it has the ability to think about this as a ONE variable nonlinear optimization. p(1) and p(3) are intrinsically linear parameters, that can be estimated conditionally on the value of p(2). So fminspleas is able to succeed even when nlinfit gave up the ghost. But see how far away it had to move for the nonlinear parameter from your starting values. nlinfit has it much harder, since you gave it crap to start with in the estimation.

Could you have found more intelligent starting values? Probably so. Given this model...

fun1=@(p,x) p(1).*(1-exp(-x./p(2)))+p(3);

we see that at x == 0, fun1 will return exactly p(3).

As a lmit, as x--> inf, the function will approach p(1) + p(3).

If we look at the plot of the data, I might have guessed p(3) should have been roughly 100, or maybe even a little higher.. You tried 50. Regardless, that was not a terrible choice for p(3).

But if I had to guess a value for the asymptote as x approaches inf. I might have guessed roughly 1800, just from the plot.

So if we could have predicted

p(3) ~ 100

p(1) + p(3) ~ 1800

then your starting values were clearly way off. p(1) should be something like 1700, just looking at the data.

How about the starting value for p(2)? Lets see. An exponential rate parameter is easy to estimate. What is the point where a negative exponential has died off by say 90%? That is, can we find the value where

exp(-u) = 0.1

Easy.

-log(.1)
ans =
       2.3026

So, if I had to guess a start point for p(2), since the exponential has died by about 90% at x==1500, then we can find an very rough initial value for p(2) as

p2init = -1500/log(0.1)
p2init =
       651.44

Lets try nlinfit now.

fun1=@(p,x) p(1).*(1-exp(-x./p(2)))+p(3);
p0= [1700,-1500/log(0.1),100];
options = statset('Funvalcheck','off');
beta0 = p0;
[beta1,R,J,CovB,MSE] = nlinfit(tr,si1,fun1,beta0);%nonlinear fitting
beta1
beta1 =
       1528.4       427.15        175.6

Amazing. It worked, and did not even complain. ;-)

Better (more) data would be useful. And the model does not seem to fit terribly well, with not really the correct shape. But it is way better than the sharp stick in the eye that you got when you tried it. The biggest lessons you need to take from this are to use a better algorithm to fit exponential models, or if you are going to use an algorithm that cannot handle poor starting vlaues, then choose more intelligent starting values. Otherwise, expect to see crap out the end far too often.

What could you do to improve this mess? For example, if I were your own consultant, I would tell you to get better data first. The process has not yet gone far enough to know where it will level off. For example, let me use a constrained regression spline model to fit your data really well. I'll let it tell me where the data seeem to want to level off,

slm = slmengine(tr,si1,'knots',[0 50 100 200 400 800 1600 3200 5000],'increasing','on','concavedown','on','leftminvalue',0,'plot','on');

I used my SLM toolbox for the fit, providing only information that seemed logical from what I can see. That is, we have a curve that is a monotone increasing function, always non-negative, but always a negatively curved function.

As you can see, the spline model fits quite well compared to the negaive exponential. But it does not seem to know where to level the curve off on the top end. To me, that tells me you need a bit more data on the top end. A little more data down near zero would help too, in order to better resolve the behavior at both ends, as well as to pin down the rate parameter more accurately. It is the data near each end of the curve that provides the most information about p(1) and p(3) in your model, and the data in the middle that are telling the estimation the value of the rate parameter.

Is there a better choice of model? Perhaps. But I don't really know if you had a good reason to choose that model, or if you just thought it might work well. It is ok, but has too much lack of fit for my eyes. One option might be...

ft = fittype('a + b/(1+ exp(-x/c))')
ft = 
     General model:
     ft(a,b,c,x) = a + b/(1+ exp(-x/c))
mdl = fit(tr',si1',ft,'startpoint',[-1000 2900 600])
mdl = 
     General model:
     mdl(x) = a + b/(1+ exp(-x/c))
     Coefficients (with 95% confidence bounds):

       a =       -1204  (-1513, -893.7)
       b =        2871  (2477, 3264)
       c =       284.1  (187.6, 380.6)
plot(tr,si1,'o'),hold on
plot(mdl)

Sigh. Not really any better in terms of fitting the roll over. Well, that was only one of several classic forms you might have tried. Here is another:

ft = fittype('a + b*erf(x/c)')
ft = 
     General model:
     ft(a,b,c,x) = a + b*erf(x/c)
mdl = fit(tr',si1',ft,'startpoint',[0 1800 1000])
mdl = 
     General model:
     mdl(x) = a + b*erf(x/c)
     Coefficients (with 95% confidence bounds):
       a =       244.2  (73.16, 415.3)
       b =        1405  (1177, 1633)
       c =       664.1  (415.2, 913)
plot(mdl)
hold on
plot(tr,si1,'o')

Even more so, just too much lack of fit. What it tells me is your data is not rolling over sufficiently well to pin down the shape of that curve.

Still, I'd just use a spline. AND get better/more data where it counts.

Related Solutions

MATLAB: Set sum of coefficients to equal 1 in biexponential fit

You are using the exp2 model form

ft = fittype('exp2')
ft = 
     General model Exp2:
     ft(a,b,c,d,x) = a*exp(b*x) + c*exp(d*x)

except that you wish to have the addirional constraint that a+c == 1.

Trivial. Just build the constraint into the model.

ft = fittype('a*exp(b*x) + (1-a)*exp(d*x)','indep','x')
ft = 
     General model:
     ft(a,b,d,x) = a*exp(b*x) + (1-a)*exp(d*x)

MATLAB: What is the best fit for a set of data

I think you need to spend some time learning about modeling. Do some reading. Lots of stuff online to be found. You would want to consider at some point if a regression style model is really appropriate, or if a simple spline interpolation would be satisfactory. Do you just want a pretty plot? Do you want something you can interpolate with, just to predict viscosity for different temperatures? So you should spend some time deciding what you want, what you need, what are your long term goals in the task at hand.

Anyway, alot of modeling ends up being common sense. So first, plot your data. Always plot everything. A simple rule that will serve you greatly in the future.

plot(T,V,'.'), grid on

Here, I would first recognize that your data varies by several orders of magnitude in Viscosity, but temperature is pretty well-behaved. I would ask questions like is this measured data? What do you expect about the noise structure in your data? Note that any traditional fiit using least squares in the residuals of viscosity will be STRONGLY influenced by only those first few points. Again, this is something you need to learn about modeling. But my point is that IF we look at the data,

[T,V]
ans =
          273        0.143
          283       0.0738
          293       0.0416
          303       0.0252
          313       0.0163
          323       0.0111
          333      0.00793
          343      0.00589
          353      0.00452
          363      0.00356
          373      0.00288
          383      0.00238
          393        0.002
          403      0.00171
          413      0.00148
          423      0.00129
          433      0.00114
          443      0.00102
          453     0.000913
          463     0.000825
          473     0.000749
          483     0.000683
          493     0.000625
          503     0.000574
          513     0.000528
          523     0.000488
          533     0.000451
          543     0.000418
          553     0.000387
          563      0.00036
          573     0.000334
          583     0.000311
          593     0.000289

I would want my curve to fit well down in that tail. Essentially, ypu want to consider if any proportional (relative) errors are appropriate. Most of the time, when your data varies by many orders of magnitude, that is what you will want to see. If so, then it makes complete sense to log the y variable, thus here we would work in terms of log(viscosity). Plot that. Look at it. Think about the curve you see.

semilogy(T,V,'.')
grid on

So here we see a relationship that is far more well-behaved. I don't see much noise in the data. So a simple interpolation would do well here. In fact, you need to consider if it may well be the best choice, since any simple model will have considerable lack of fit! So again, what do you need out of this in the end? A pretty picture? A function that you can write down, even if it misses passing through the data very well? What do you need?

Next, IF you would consider a simple model like

viscosity = A*temp^B

then a log-log plot would give a straight line fit. (Note that I ignored for the moment the constant offset term C that you posed.)

loglog(T,V,'.'), grid on

So it gave us something closer to a straight line. In fact, a little play would convince you that no value of C will make this curve a straight line. That is, there is no value of C such that

loglog(T,V - C ,'.')

would yield something close to a straight line. This is something you can prove rather easily. (Think about it.)

So the model you posed, of

viscosity = A*temp^B + C

is simply not a good choice.(You actually had it where you subtracted C, but that is simply a sign chance on the value of C: so completely irrelevant.)

Instead, try this plot, and think about what it means:

loglog(T - 250,V,'.'), grid on

I just picked a number out of thin air, assuming these were temperatures in degrees Kelvin.

Note how the relationship is now very near a straight line. That suggests a good choice of model might be something like

viscosity = A*(temp - t0)^B

where t0 is on the order of 250, so around -20 degrees C.

So we might decide to fit a model of that form instead. Again, you very much want to do that fit on the log of viscosity. If we log that model, we will have:

log(viscosity) = log(A) + log(temp - t0)*B

I'll assume the curve fitting toolbox here, and with some not unreasonable guesses for the parameters, we see this:

ft = fittype('log(A) + log(temp - t0)*B','independent','temp','coefficients',{'A','B','t0'})
     General model:
     ft(A,B,t0,temp) = log(A) + log(temp - t0)*B
     
mdl = fit(T,log(V),ft,'start',[100,-1,250])
mdl = 
     General model:
     mdl(temp) = log(A) + log(temp - t0)*B
     Coefficients (with 95% confidence bounds):
       A =       234.9  (123.2, 346.5)
       B =      -2.339  (-2.423, -2.255)
       t0 =       250.6  (247.8, 253.5)
       
plot(mdl)
hold on
plot(T,log(V),'o')
grid on
xlabel 'Temp, degrees K'
ylabel 'log(viscosity)'

So not terrible. Stil some lack of fit, just a wee bit. A spline interpolant will nail the shape of the curve exactly.

But only you know why you need to do this curve fit. Does a model offer you some understanding of the process? With some thought, you could probably do a little better job yet. What you should understand here is the model I derived came from simply looking at the data, then applying a little common sense to the modeling. And plot everything.

Best Answer

Related Solutions

MATLAB: Set sum of coefficients to equal 1 in biexponential fit

MATLAB: What is the best fit for a set of data

Related Question