MATLAB: Weighted fit with lsqcurvefit and (ideally) multistart

fittingGlobal Optimization ToolboxlsqcurvefitMATLABmodellingmultistartodeweighting

Hi,

I'm working on a problem that involves fitting experimental data to a model comprising a system of (connected) ODEs, in order to extrapolate optimum parameters (k) that give the best fit. For general background, the data refers to levels of metabolites in cells, and the model is a simple one of the metabolic pathways that connect them. (Can't say too much more for now!).

I've been using lsqcurvefit with multistart to fit this. It works pretty decently (let me know if not, it'll be a silly copying error!). However, I was wondering if it is possible to apply a weighting function to the fitting (either using this fitting method or otherwise)? Rationale – the absolute values of each c(n) have very different absolute magnitudes and error magnitudes, simply as they had to be multiplied by different constant factors to process them. Ideally, the weights would be roughly:

– c(1), c(2), c(6): 1

– c(3), c(7): 40

– all others: 4000

Without weighting, the fit for the larger c(1), c(2), c(3), c(6) and c(7) is great – but for the smaller other ones it is less good. (Or, rather, the fit is OK, but I get wildly different k-values each time for steps involving the smaller c(n)).

Any advice would be much appreciated. I am something of a novice – basically a biologist teaching myself matlab! – so please excuse any ignorance. Also happy to hear alternative approaches of course. I've copied the raw data to fit, the objective function, and my multistart fitting below in that order. I have them as separate files, so I haven't attempted to combined them into one…

Thanks a lot,

Matt

%experimental data to fit%
time_full = [0;1;2;3;4;5;6;7;8;9;10;24;48;96;144;192;240;288;336;384;432];
label_full = [95.898177	0	0	0	0	4	0.1	0.001055	0.000768
95.898177	0.116654171	0.000169089	0	0	3.883345829	0.099830911	0.001055	0.000768
95.898177	0.221311714	0.000215784	0	0	3.778688286	0.099784216	0.001055	0.000768
95.898177	0.361815956	0.001337332	0	0	3.638184044	0.098662668	0.001055	0.000768
95.898177	0.443043182	0.001897635	0	0	3.556956818	0.098102365	0.001055	0.000768
95.898177	0.511783075	0.002904908	2.73612E-06	0	3.488216925	0.097095092	0.001052264	0.000768
95.898177	0.596086415	0.003847949	2.19614E-05	0	3.403913585	0.096152051	0.001033039	0.000768
95.898177	0.70422927	0.004991988	3.41544E-05	0	3.29577073	0.095008012	0.001020846	0.000768
95.898177	0.736165548	0.005402772	4.34905E-05	0	3.263834452	0.094597228	0.00101151	0.000768
95.898177	0.863634039	0.006882534	4.92657E-05	0	3.136365961	0.093117466	0.001005734	0.000768
95.898177	0.961691531	0.007922809	8.34772E-05	0	3.038308469	0.092077191	0.000971523	0.000768
95.898177	1.694849679	0.02488041	0.000242074	0.000298399	2.305150321	0.07511959	0.000812926	0.000469601
95.898177	2.156329216	0.046290117	0.00048092	0.00018762	1.843670784	0.053709883	0.00057408	0.00058038
95.898177	2.28066375	0.058965709	0.000558424	0.000402402	1.71933625	0.041034291	0.000496576	0.000365598
95.898177	2.263872217	0.060281286	0.000637646	0.000436729	1.736127783	0.039718714	0.000417354	0.000331271
95.898177	2.280749095	0.059985812	0.000590469	0.000523726	1.719250905	0.040014188	0.000464531	0.000244274
95.898177	2.250775532	0.059775064	0.000604293	0.00049205	1.749224468	0.040224936	0.000450707	0.00027595
95.898177	2.248860841	0.059972783	0.000622058	0.000422935	1.751139159	0.040027217	0.000432942	0.000345065
95.898177	2.28310359	0.059096809	0.000547755	0.000441093	1.71689641	0.040903191	0.000507245	0.000326907
95.898177	2.324286746	0.058745232	0.000581056	0.000433505	1.675713254	0.041254768	0.000473944	0.000334495
95.898177	2.298780141	0.059541016	0.000586692	0.000465395	1.701219859	0.040458984	0.000468308	0.000302605]

%objective function that models system of interest. k-values 1 through 7 to be determined through fitting values for each c(n) to this model%
function C=kinetics_full(k,t)
        c0=[95.898177 0 0 0 0 4 0.1 0.001055 0.000768];
        [~,C]=ode45(@(t,c) DifEq(t,c,k),t,c0);
        
        
        
        function dC=DifEq(t,c,k)
    dcdt=zeros(9,1);
    dcdt(1)=-2*k(1).*c(1)+k(7).*(c(2)+c(3)+c(4)+c(5)+c(6)+c(7)+c(8)+c(9))+k(5).*(c(4)+c(8))+k(6).*(c(5)+c(9)); 
    dcdt(2)=1.1*k(1).*c(1)-k(2).*c(2)-k(7).*c(2); 
    dcdt(3)=k(2).*c(2)-k(3).*c(3)-k(7).*c(3); 
    dcdt(4)=k(3).*c(3)-k(4).*c(4)-k(5).*c(4)-k(7).*c(4);
    dcdt(5)=k(4).*c(4)-k(6).*c(5)-k(7).*c(5);
    dcdt(6)=0.9*k(1).*c(1)-k(2).*c(6)-k(7).*c(6); 
    dcdt(7)=k(2).*c(6)-k(3).*c(7)-k(7).*c(7);
    dcdt(8)=k(3).*c(7)-k(4).*c(8)-k(5).*c(8)-k(7).*c(8);
    dcdt(9)=k(4).*c(8)-k(6).*c(9)-k(7).*c(9);
    dcdt(1)=0;
       
            
            dC=dcdt;
           
            
        end
        
    end

%multistart fitting protocol%
problem = createOptimProblem ('lsqcurvefit','objective', @kinetics_full, 'x0', randn(7,1), ...
'lb', [0,0,0,0,0,0,0.03], 'ub', [10,10,10,10,10,10,0.07],'xdata',time_full,'ydata',label_full);
ms = MultiStart('PlotFcns',@gsplotbestf);
[xmulti,errormulti] = run(ms,problem,20);

Best Answer

I doubt that weighting is appropriate, and I would not recommend doing that.

I recognise that code excerpt, so I am attaching an updated version of that file using the ga (genetic algorithm) function to do the parameter estimation. It uses random initial values for the population, so I definitely advise running it several times, saving the parameter vector (‘theta’) and ending fitness value (‘fval’) for each run to determine the best fit among several runs. Likely the easiest way to do that would be to add an output to the function:

function [theta,fval] = Igor_Moura_GA

so you can then call it in a for loop, and write the ‘theta’ and ‘fval’ values to a file so you do not lose them if something should go wrong.

Also, it may be best to include the initial conditions in the parameter vector, rather than assuming that the hard-coded initial conditions are the best. Append the initial conditions parameters onto the end of the parameter vector, so in that original code, with 6 parameters and 4 initial conditions, the new parameter vector would have a length of 10 and the ‘kinetics’ function would then be written:

function C = kinetics(theta,t)
c0 = theta(7:10);
[T,Cv] = ode45(@DifEq,t,c0);
    function dC=DifEq(t,c)
    dcdt=zeros(4,1);
    dcdt(1)=-theta(1).*c(1)-theta(2).*c(1);
    dcdt(2)= theta(1).*c(1)+theta(4).*c(3)-theta(3).*c(2)-theta(5).*c(2);
    dcdt(3)= theta(2).*c(1)+theta(3).*c(2)-theta(4).*c(3)+theta(6).*c(4);
    dcdt(4)= theta(5).*c(2)-theta(6).*c(4);
    dC=dcdt;
    end
C=Cv;
end

Be sure to change the ‘Parms’ (parameter number) value to reflect the augmented parameter vector length. My code will make the appropriate adjustments.

Related Solutions

MATLAB: Weighted fit of ode’s to data

Unless you have a good reason to weightcertain varriables, (for example you are using values from the literature that are given in means and stardard deviations or standard errors, so you would use inverse-variance weighting, for example), I would not weight them.

This uses your variables without any weighting. I used ga as the optimisation function with your data, and added the initial conditions vector ‘x0’ as the last five elements of the parameter vector. It converged in 323 generations and estimated these parameters (where the parameters are the ‘theta’ vector) in 04:56:27.933. My delay in responding is that this took nearly 5 hours to converge (such as it is).

The estimated parameters:

	Rate Constants:
		Theta(1) =  0.76200
		Theta(2) = 21.81800
		Theta(3) = 13.67813
		Theta(4) = 25.60900
		Theta(5) = 36.61325
		Theta(6) = 15.24400
		Theta(7) = 31.94681

The estimated initial conditions:

		Theta(8) = 96.02097
		Theta(9) =  0.01250
		Theta(10) =  0.00425
		Theta(11) =  0.04367
		Theta(12) =  0.00500

The fitness value (norm of the residuals) is 3.8365.

It fit two of the variables relatively well, however it did not fit three of them at all well:

Weighted fit of ode's to data - 2019 07 12.png

The essential ga call I used is:

ftns = @(theta) norm(c-kinetics(theta,t));
PopSz = 500;
Parms = 12;
opts = optimoptions('ga', 'PopulationSize',PopSz, 'InitialPopulationMatrix',randi(1E+4,PopSz,Parms)*1E-3, 'MaxGenerations',2E3, 'PlotFcn',@gaplotbestf, 'PlotInterval',1);
[theta,fval,exitflag,output] = ga(ftns, Parms, [],[],[],[],zeros(Parms,1),Inf(Parms,1),[],[],opts)

I always estimate the initial conditions along with the other parameters, so the only change I made to your ‘kinetics’ ODE function was to replace the ‘c0’ you coded with:

c0 = k(8:12);

I do not know what system you are modeling, so I cannot check if you coded your system of differential equations correctly. (I would expect a much better optimisation result.)

Please check your ‘kinetics’ system. If it is coded correctly, this result is the best we can likely hope for, so I will not re-run this. If it needs some ‘tweaking’, I would be interested in doing this again with the ‘tweaked’ system.

MATLAB: Parameter estimation in ODE

You are trying to fit 5 columns of your differential equation to 4 columns of data. That will throw the error you got.

Change the ‘C’ assignment in the ‘kinetics’ objective function to:

C=Cv(:,1:4);

and it works.

So ‘klinetics’ is now:

function C=kinetics(theta,t)
c0=[1;0;0;0;0];
[T,Cv]=ode45(@DifEq,t,c0);
%

    function dC=DifEq(t,c)
    dcdt=zeros(5,1);
    dcdt(1)=-theta(1).*c(1)-theta(2).*c(1);
    dcdt(2)= theta(1).*c(1)+theta(4).*c(3)-theta(3).*c(2)-theta(5).*c(2);
    dcdt(3)= theta(2).*c(1)+theta(3).*c(2)-theta(4).*c(3)+theta(6).*c(4);
    dcdt(4)= theta(5).*c(2)-theta(6).*c(4);
    dcdt(5)=theta(1).*c(2)-theta(5);
    dC=dcdt;
    end
C=Cv(:,1:4);
end

If you then want to estimate Compartment 5, run your differential equation function again with all the estimated parameters to include it and plot it.

I created a version of ‘kinetics’ called ‘kinetics5’ to calculate and plot Compartment 5 as well. That entire additional code (with all the previously estimated parameters, so all this goes after the lsqcurvefit call) is:

function C=kinetics5(theta,t)
c0=[1;0;0;0;0];
[T,Cv]=ode45(@DifEq,t,c0);
%
    function dC=DifEq(t,c)
    dcdt=zeros(5,1);
    dcdt(1)=-theta(1).*c(1)-theta(2).*c(1);
    dcdt(2)= theta(1).*c(1)+theta(4).*c(3)-theta(3).*c(2)-theta(5).*c(2);
    dcdt(3)= theta(2).*c(1)+theta(3).*c(2)-theta(4).*c(3)+theta(6).*c(4);
    dcdt(4)= theta(5).*c(2)-theta(6).*c(4);
    dcdt(5)=theta(1).*c(2)-theta(5);
    dC=dcdt;
    end
C=Cv;
end
for k1 = 1:length(theta)
    fprintf(1, '\t\tTheta(%d) = %8.5f\n', k1, theta(k1))
end
tv = linspace(min(t), max(t));
Cfit = kinetics5(theta, tv);
figure(1)
plot(t, c, 'p')
hold on
hlp = plot(tv, Cfit);
hold off
grid
xlabel('Time')
ylabel('Concentration')
legend(hlp, 'C_1(t)', 'C_2(t)', 'C_3(t)', 'C_4(t)', 'C_5(t)', 'Location','SW')

That should do what you want.

Best Answer

Related Solutions

MATLAB: Weighted fit of ode’s to data

MATLAB: Parameter estimation in ODE

Related Question