To test SimBiology, I wrote a simple model of gene expression, whose analytical solutions are well known (Friedman, Phys. Rev. Lett. 97.16 (2006): 168302):
O -> O + M ; kmM -> M + P ; kpM -> null ; kdmP -> null ; kdp
If we run the stochastic simulation, e.g. using the 'ssa' solver, then the histogram of P should be a gamma distribution g(P; a, b) where a=km/kdp and b=kp/kdm.
Other software for stochastic simulation give correct results. But SimBiology produces results that are different than expected. The histogram of P is not g(P; a, b) but it seems to be g(P; a+1, b) for any choice of reaction rate constants.
(Fig. 1)
Parameters in the figure are the following:
km_ = 0.5;kp_ = 50.;kdm_ = 7.;kdp_ = 0.2;
(For comparison, I used the two simulation programs: StochPy and Mesokin ( http://pepe.ichf.edu.pl/tabaka/Witryna/Marcin_Tabaka.html ) )
Is this a bug or did I define something incorrectly in my script?
Below is the fragment of my code:
%%Create a model of an unregulated gene
model = sbiomodel('Unregulated gene');model.set('Name', 'Unregulated gene');%%Species names
speciesNames = {'O', 'M', 'P'};%%Reactions:
Transcription = addreaction(model, 'O -> O + M');Translation = addreaction(model, 'M -> M + P');mRNA_degradation = addreaction(model, 'M -> null');Protein_degradation = addreaction(model, 'P -> null'); %%Set Reactions to be MassAction
kl_Transcription = addkineticlaw(Transcription, 'MassAction');kl_Translation = addkineticlaw(Translation, 'MassAction');kl_mRNA_degradation = addkineticlaw(mRNA_degradation, 'MassAction');kl_Protein_degradation = addkineticlaw(Protein_degradation, 'MassAction'); %%Add Rate Constants for Each Reaction
p_km = addparameter(kl_Transcription, 'km', 'Value', km_);p_kp = addparameter(kl_Translation, 'kp', 'Value', kp_);p_kdm = addparameter(kl_mRNA_degradation, 'kdm', 'Value', kdm_); p_kdp = addparameter(kl_Protein_degradation, 'kdp', 'Value', kdp_); %%Set the Kinetic Law Constants for Each Kinetic Law.
kl_Transcription.ParameterVariableNames = {'km'};kl_Translation.ParameterVariableNames = {'kp'};kl_mRNA_degradation.ParameterVariableNames = {'kdm'};kl_Protein_degradation.ParameterVariableNames = {'kdp'}; %%Specify Initial Amounts of Each Species
O = 1; % Species names
M = 2;P = 3;model.species(O).InitialAmount = 1; % O : Species amounts
model.species(M).InitialAmount = 0; % M
model.species(P).InitialAmount = 0; % P
%%Simulation time
Tfinal = tfinal; % from command line
%%Get the Active Configuration Set for the Model.
cs = getconfigset(model,'active');%%Simulate Model Using SSA Stochastic Solver : Single trajectory
tfinal_ssa = Tfinal;tlog = 1;cs.SolverType =method; %'expltau'; %'ssa';
cs.StopTime = tfinal_ssa;solver = cs.SolverOptions;solver.LogDecimation = tlog;cs.CompileOptions.DimensionalAnalysis = false;simdata_ssa = sbiosimulate(model);[t_ssa1, x_ssa1] = selectbyname(simdata_ssa, speciesNames);
EDIT: One more important information: These histograms were collected from the single simulation trajectory x_ssa1, as the process is ergodic, so this should be equivalent to histogramming the end points of many shorter trajectories.
But perhaps a single trajectory is here recorded in such a way that the histogram is biased?
——————————————————
EDIT 2: The problem occurs when histograms are built from a single trajectory (simply, all data points from that trajectory are histogrammed). The problem does not occur in the case of histograms over an ensemble of trajectories (many trajectories are run, only end points of these trajectories are collected and histograms are built from these end points).
This is rather confusing because in reality the system is ergodic and histograms along a single (sufficiently long) trajectory should be same as ensemble histograms.
As I have shown in Fig. 1, other simulators (Mesokin, StochPy) generate correct histograms along a single trajectory.
(Fig. 2)
I added the code for ensemble simulation:
%%Simulate Model Using SSA Stochastic Solver : Ensemble
numRuns = numRuns_ens; % as input parameter
tfinal_ssa_ens = Tfinal_ens; % as input parametertlog_ens = 1; % any value different than 1 gives wrong histograms
% probably because some data are omitted
cs.SolverType = method; %'expltau'; %'ssa';cs.StopTime = tfinal_ssa_ens;solver = cs.SolverOptions;solver.LogDecimation = tlog_ens;cs.CompileOptions.DimensionalAnalysis = false;simdata_ssa_ens = sbioensemblerun(model, numRuns);% Collect last values of 'P' from each trajectory,
% i.e., those at time Tfinal_ens:
for i = 1:numRuns ends(i) = simdata_ssa_ens(i).Data(end,end); % last row, last column
end
I run it with the parameters:
Tfinal = 100000;Tfinal_ens = 50;numRuns_ens = 20000;km_ = 0.5;kp_ = 50.;kdm_ = 7.;kdp_= 0.2;
Here is how I plot histograms:
%%Plot histograms
figure;mh_ens = histogram(ends); % from ensemble
mh_ens.BinWidth = 1;mh_ens.Normalization = 'pdf'; mh_ens.DisplayStyle = 'stairs';mh_ens.EdgeColor = [0.7 0 0];hold on;mh=histogram(x_ssa1(:,P)); % along a single trajectory
mh.BinWidth = 1;mh.Normalization = 'pdf';mh.DisplayStyle = 'stairs'; mh.EdgeColor = [0 0.7 0];hold on;fplot(@(x) g(x,km_/kdp_,kp_/kdm_),mh.BinLimits,'Color', 'r') ylim([0 inf])hold on;fplot(@(x) g(x,km_/kdp_+1,kp_/kdm_),mh.BinLimits, 'Color', 'g') %%%%%Why a + 1 is OK ?????
ylim([0 inf])xlabel('P');ylabel('Probability density function');title('Histograms');legend('-DynamicLegend', 'P, ensemble, LogDecimation=1', 'P, single traj','theory, \gamma(x;a,b) (correct)', '\gamma(x;a+1,b)');
A strange thing is that log decimation defined by tlog_ens must be set to 1. If we substitute in the code any value different than 1, it gives a wrong histogram, probably because some data are omitted.
Best Answer