MATLAB: SimBiology: Stochastic model of gene expression gives incorrect distributions? If histogram is collected from a single trajectory, the trajectory needs to be resampled to equal intervals.

friedman modelgamma distributionreaction ratesSimBiologystochastic model of gene expression

To test SimBiology, I wrote a simple model of gene expression, whose analytical solutions are well known (Friedman, Phys. Rev. Lett. 97.16 (2006): 168302):

O -> O + M ; km
M -> M + P ; kp
M -> null ; kdm
P -> null ; kdp

If we run the stochastic simulation, e.g. using the 'ssa' solver, then the histogram of P should be a gamma distribution g(P; a, b) where a=km/kdp and b=kp/kdm.

Other software for stochastic simulation give correct results. But SimBiology produces results that are different than expected. The histogram of P is not g(P; a, b) but it seems to be g(P; a+1, b) for any choice of reaction rate constants.

(Fig. 1)

Parameters in the figure are the following:

km_ = 0.5;
kp_ = 50.;
kdm_ = 7.;
kdp_ = 0.2;

(For comparison, I used the two simulation programs: StochPy and Mesokin ( http://pepe.ichf.edu.pl/tabaka/Witryna/Marcin_Tabaka.html ) )

Is this a bug or did I define something incorrectly in my script?

Below is the fragment of my code:

%%Create a model of an unregulated gene
model = sbiomodel('Unregulated gene');
model.set('Name', 'Unregulated gene');
%%Species names
speciesNames = {'O', 'M', 'P'};
%%Reactions:
Transcription = addreaction(model, 'O -> O + M');
Translation = addreaction(model, 'M -> M + P');
mRNA_degradation = addreaction(model, 'M -> null');
Protein_degradation = addreaction(model, 'P -> null');  
%%Set Reactions to be MassAction
kl_Transcription = addkineticlaw(Transcription, 'MassAction');
kl_Translation = addkineticlaw(Translation, 'MassAction');
kl_mRNA_degradation = addkineticlaw(mRNA_degradation, 'MassAction');
kl_Protein_degradation = addkineticlaw(Protein_degradation, 'MassAction'); 
%%Add Rate Constants for Each Reaction
p_km   = addparameter(kl_Transcription,       'km',   'Value', km_);
p_kp   = addparameter(kl_Translation,         'kp',   'Value', kp_);
p_kdm  = addparameter(kl_mRNA_degradation,    'kdm',  'Value', kdm_); 
p_kdp  = addparameter(kl_Protein_degradation, 'kdp',  'Value', kdp_);  
%%Set the Kinetic Law Constants for Each Kinetic Law.
kl_Transcription.ParameterVariableNames = {'km'};
kl_Translation.ParameterVariableNames = {'kp'};
kl_mRNA_degradation.ParameterVariableNames = {'kdm'};
kl_Protein_degradation.ParameterVariableNames = {'kdp'}; 
%%Specify Initial Amounts of Each Species
O = 1; % Species names
M = 2;
P = 3;
model.species(O).InitialAmount = 1;         % O : Species amounts
model.species(M).InitialAmount = 0;         % M
model.species(P).InitialAmount = 0;         % P
%%Simulation  time
Tfinal = tfinal; % from command line
%%Get the Active Configuration Set for the Model.
cs = getconfigset(model,'active');
%%Simulate Model Using SSA Stochastic Solver : Single trajectory
tfinal_ssa = Tfinal;
tlog = 1;
cs.SolverType =method; %'expltau'; %'ssa';

cs.StopTime = tfinal_ssa;
solver = cs.SolverOptions;
solver.LogDecimation = tlog;
cs.CompileOptions.DimensionalAnalysis = false;
simdata_ssa = sbiosimulate(model);
[t_ssa1, x_ssa1] = selectbyname(simdata_ssa, speciesNames);

EDIT: One more important information: These histograms were collected from the single simulation trajectory x_ssa1, as the process is ergodic, so this should be equivalent to histogramming the end points of many shorter trajectories.

But perhaps a single trajectory is here recorded in such a way that the histogram is biased?

——————————————————

EDIT 2: The problem occurs when histograms are built from a single trajectory (simply, all data points from that trajectory are histogrammed). The problem does not occur in the case of histograms over an ensemble of trajectories (many trajectories are run, only end points of these trajectories are collected and histograms are built from these end points).

This is rather confusing because in reality the system is ergodic and histograms along a single (sufficiently long) trajectory should be same as ensemble histograms.

As I have shown in Fig. 1, other simulators (Mesokin, StochPy) generate correct histograms along a single trajectory.

(Fig. 2)

I added the code for ensemble simulation:

%%Simulate Model Using SSA Stochastic Solver : Ensemble
numRuns = numRuns_ens; % as input parameter

tfinal_ssa_ens = Tfinal_ens; % as input parameter
tlog_ens =  1; % any value different than 1 gives wrong histograms 
               % probably because some data are omitted
cs.SolverType = method; %'expltau'; %'ssa';
cs.StopTime = tfinal_ssa_ens;
solver = cs.SolverOptions;
solver.LogDecimation = tlog_ens;
cs.CompileOptions.DimensionalAnalysis = false;
simdata_ssa_ens = sbioensemblerun(model, numRuns);
% Collect last values of 'P' from each trajectory, 
% i.e., those at time Tfinal_ens:
for i = 1:numRuns
    ends(i) = simdata_ssa_ens(i).Data(end,end); % last row, last column
end

I run it with the parameters:

Tfinal = 100000;
Tfinal_ens = 50;
numRuns_ens = 20000;
km_ = 0.5;
kp_ = 50.;
kdm_ = 7.;
kdp_=  0.2;

Here is how I plot histograms:

%%Plot histograms 
figure;
mh_ens = histogram(ends); % from ensemble
mh_ens.BinWidth = 1;
mh_ens.Normalization = 'pdf'; 
mh_ens.DisplayStyle = 'stairs';
mh_ens.EdgeColor = [0.7 0 0];
hold on;
mh=histogram(x_ssa1(:,P)); % along a single trajectory 
mh.BinWidth = 1;
mh.Normalization = 'pdf';
mh.DisplayStyle = 'stairs'; 
mh.EdgeColor = [0 0.7 0];
hold on;
fplot(@(x) g(x,km_/kdp_,kp_/kdm_),mh.BinLimits,'Color', 'r')  
ylim([0 inf])
hold on;
fplot(@(x) g(x,km_/kdp_+1,kp_/kdm_),mh.BinLimits, 'Color', 'g')  
%%%%%Why a + 1 is OK ?????
ylim([0 inf])
xlabel('P');
ylabel('Probability density function');
title('Histograms');
legend('-DynamicLegend', 'P, ensemble, LogDecimation=1', 'P, single traj','theory, \gamma(x;a,b) (correct)', '\gamma(x;a+1,b)');

A strange thing is that log decimation defined by tlog_ens must be set to 1. If we substitute in the code any value different than 1, it gives a wrong histogram, probably because some data are omitted.

Best Answer

OK, I have found the answer to my own question. Perhaps this will help someone. One can make histograms out of a single trajectory, but it must be first resampled into equal time intervals. This is done by the function resample().

Below I show the histograms for M and P species. As said above, the theoretical curve for P is gamma distribution g(x,km_/kdp_,kp_/kdm_). For M species it is Poisson distribution f(x,km_/kdm_). Before resampling the histograms are wrong, but after resampling they agree with the theoretical functions.

Tfinal = 100000;
km_ = 0.5;
kp_ = 50.;
kdm_ = 7.;
kdp_=  0.2; 
%%Resampling of the single trajectory
simdata_ssa2 = resample(simdata_ssa, [0:Tfinal], 'zoh'); 
[t_ssa2, x_ssa2] = selectbyname(simdata_ssa2, speciesNames);
%%Theoretical function, P distribution, gamma
      function [y] = g(x, a_, b_)
          y = (x^(a_-1.) ) * exp(-x/b_) / (b_^a_ * gamma(a_)); 
          % gamma distribution
      end
%%Theoretical function, M distribution, Poisson
      function [y] = f(x, lambda)
          y =  exp(-lambda)*lambda^x/ factorial(x); % Poisson distribution
      end
%%Plot histograms, P
figure;
mh=histogram(x_ssa1(:,P)); % along a single trajectory 



mh.BinWidth = 1;
mh.Normalization = 'pdf';
mh.DisplayStyle = 'stairs'; 
mh.EdgeColor = [0 0.7 0];
hold on;
mh2=histogram(x_ssa2(:,P)); % along a single trajectory 
mh2.BinWidth = 1;
mh2.Normalization = 'pdf';
mh2.DisplayStyle = 'stairs'; 
mh2.EdgeColor = [0 0 0];
hold on;
fplot(@(x) g(x,km_/kdp_,kp_/kdm_),mh.BinLimits,'Color', 'r')  
ylim([0 inf])
xlabel('P');
ylabel('Probability density function');
title('Histograms');
 legend('-DynamicLegend', 'P, single trajectory',...
  'P, single trajectory, resampled',...
    'theory, \gamma(x;a,b) (correct)');
%%Plot histograms, M
figure;
mh=histogram(x_ssa1(:,M)); % along a single trajectory 
mh.BinWidth = 1;
mh.Normalization = 'pdf';
mh.DisplayStyle = 'stairs'; 
mh.EdgeColor = [0 0.7 0];
hold on;
mh2=histogram(x_ssa2(:,M)); % along a single trajectory 
mh2.BinWidth = 1;
mh2.Normalization = 'pdf';
mh2.DisplayStyle = 'stairs'; 
mh2.EdgeColor = [0 0 0];
hold on;
fplot(@(x) f(floor(x),km_/kdm_),mh.BinLimits,'Color', 'r')  
ylim([0 inf])
xlabel('M');
ylabel('Probability density function');
title('Histograms');
legend('-DynamicLegend', 'M, single trajectory',...
'M, single trajectory,...
resampled','Poisson')

Best Answer

Related Solutions

MATLAB: Adding parameters while programmatically setting up a SimBiology model

MATLAB: Unable to find explicit solution to differential eq

Related Question