MATLAB: How to enhance the performance of for-loops and cell-arrays (related to statistical calculations)

Tags: cell arrays, for loop, MATLAB and Simulink Student Suite, matrix array, parallel computing, performance, statistics

The following code calculates some performance measures over different periods. No error messages occurred, but the processing time is very long.

F = 'runoff.txt'; % name of the file
D = 'C:\Users\heute\model\results\model_standalone\'; % absolute or relative path of base directory
S = dir(fullfile(D,'results*'));
X = [S.isdir] & ~ismember({S.name},{'.','..'});
N = {S(X).name};
L = cell(size(N));
C = cell(size(N));
for k = 1:numel(N)
    T = fullfile(D,N{k},F);
    fid = fopen(T,'rt');
    fmt = ['%s',repmat('%f',1,6)];
    opt = {'HeaderLines',1,'CollectOutput',true};
    Z = textscan(fid,fmt,opt{:});
    fclose(fid);
    L{k} = Z{1}; % timestamp
    C{k} = Z{2}; % data
    %        
    Qs = C{k}(:,6); % define the simulated runoff, as column 6 in each cell array

    % define the periods for computing performance measures
    sdatelim_neu = [datenum(2013,10,01,00,00,00) datenum(2016,10,01,00,00,00)];
    dt = 1/24;
    date = sdatelim_neu(1):dt:sdatelim_neu(2);
    date_runoff = transpose(date);
    %
    sdatelim1 = [datenum(2014,05,01,00,00,00) datenum(2014,10,01,00,00,00)];
    dt = 1/24;
    sdate_sdatelim1 = sdatelim1(1):dt:sdatelim1(2);
    %
    sdatelim2 = [datenum(2015,05,01,00,00,00) datenum(2015,10,01,00,00,00)];
    sdate_sdatelim2 = sdatelim2(1):dt:sdatelim2(2);
    %
    sdatelim3 = [datenum(2016,05,01,00,00,00) datenum(2016,10,01,00,00,00)];
    sdate_sdatelim3 = sdatelim3(1):dt:sdatelim3(2);
    %
    % loop over the different periods
    for s = 1:length(sdate_sdatelim1);
        for a = 1:length(sdate_sdatelim2);
            for b = 1:length(sdate_sdatelim3);
                j = find(date_runoff >= sdate_sdatelim1(s) & date_runoff < sdate_sdatelim1(k)+dt) & find(date_runoff >= sdate_sdatelim2(a) & date_runoff < sdate_sdatelim2(a)+dt) & find(date_runoff >= sdate_sdatelim3(b) & date_runoff < sdate_sdatelim3(b)+dt);
                f_1k = 1-cov(Qs(j) - Qo)/var(Qo); %NSE
                f_2k = sqrt(mean((Qs(j) - Qo).^2)); %RMSE
                f_3k = abs(mean(Qs(j)- Qo)); %BIAS
                %Qo is the observed runoff -> imported from file
                %
                % write into matrix YA -> for use in further analysis
                YA = [f_1k, f_2k, f_3k];
            end
        end
    end
end

As a test case, I ran this code for two input files (each of them has 26280 rows in column 6). In the end, however, several thousand input files should be processed.

How can I reduce the computing time?

Or is there an error in the for-loop over the different periods? Or is this line:

Qs = C{k}(:,6); % define the simulated runoff, as column 6 in each cell array

an inefficient command?

(I use MATLAB R2012a.)
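For a sense of scale, the size of the triple loop above can be estimated from the period definitions alone. This is a rough sketch, assuming each period is sampled hourly with both end points included, as in the question's code; the names nPerPeriod and nIter are illustrative only.

```matlab
dt = 1/24;                                   % hourly step, in days
% number of hourly time stamps from 1 May to 1 Oct 2014, both ends included
nPerPeriod = round((datenum(2014,10,1) - datenum(2014,5,1)) / dt) + 1;
% the loops over s, a and b are nested, so the body runs nPerPeriod^3 times
nIter = nPerPeriod^3;
fprintf('%d steps per period, %.3g loop-body evaluations per file\n', nPerPeriod, nIter);
```

With 3673 steps per period this comes to roughly 5e10 evaluations of the find expression and the three measures for every input file, which would go a long way towards explaining the run time.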

S = 'runoff.txt';
O = 'runoff_observed.txt';
D = 'C:\Users\heute\model\results\model_standalone\';
d = dir(fullfile(D,'results*'));                          % list of directories
fmtS = ['%{dd.MM.yyyy-HH:mm}D' repmat('%*f',1,5) '%f'];   % simulated format string
fmtO = ['%{dd.MM.yyyy HH:mm}D' '%f'];                     % observed format string
L = length(d);                                            % number of subdirs
YA = zeros(L,3);                                          % preallocate
for k = 1:L                                               % iterate over subdirs
    fid = fopen(fullfile(D,d(k).name,S),'rt');            % open simulated
    Z = textscan(fid,fmtS,'headerlines',2,'collectoutput',1); % read simulated
    fclose(fid);
    dtS = Z{:,1};                                         % timestamp simulated (datetime)
    Qs = Z{:,2};                                          % simulated data
    fid = fopen(fullfile(D,d(k).name,O),'rt');            % open observed
    Z = textscan(fid,fmtO,'headerlines',1,'collectoutput',1); % read observed
    fclose(fid);
    dtO = Z{:,1};                                         % timestamp observed (datetime)
    Qo = Z{:,2};                                          % observed data
    % define the periods for computing performance measures
    yr1 = 2014; yr2 = 2016;                               % years to compute over
    ix = isbetween(dtO,datetime(yr1,5,1),datetime(yr1,10,1)); % first year
    for yr = yr1+1:yr2                                    % subsequent years
        ix = ix | isbetween(dtO,datetime(yr,5,1),datetime(yr,10,1));
    end
    % f_1k, f_2k, f_3k: the measures from the question, evaluated over rows ix
    YA(k,:) = [f_1k, f_2k, f_3k];
end
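The code above leaves f_1k, f_2k and f_3k undefined. A minimal sketch of how they might be computed from a logical row mask follows. It assumes that the simulated and observed series share the same timestamps, so one mask ix applies to both, and it uses the standard Nash-Sutcliffe definition for the NSE, which differs from the cov-based expression in the question. The sample values are made up for illustration.

```matlab
% Hypothetical sample data standing in for the imported series
Qo = [1.0; 2.0; 3.0; 4.0; 5.0];        % observed runoff
Qs = [1.1; 1.9; 3.2; 3.8; 5.0];        % simulated runoff
ix = logical([1; 1; 0; 1; 1]);         % rows inside the evaluation periods

qo  = Qo(ix);                          % observed values in the periods
qs  = Qs(ix);                          % simulated values in the periods
res = qs - qo;                         % residuals

f_1k = 1 - sum(res.^2) / sum((qo - mean(qo)).^2);   % NSE (standard definition)
f_2k = sqrt(mean(res.^2));                          % RMSE
f_3k = abs(mean(res));                              % BIAS (absolute mean error)
YA   = [f_1k, f_2k, f_3k];                          % one row of results
```

Each measure is evaluated once per file on whole vectors, so no loop over individual time steps is needed.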

Best Answer

ADDENDUM

The code above has been cleaned up to incorporate the changes from the conversation below, except for opening the reference file; handle that as needed. It should then return the L records in the output array.

ERRATUM

NB: The year reference yr in the loop must not carry a (1) index, otherwise the subsequent years after the first are not selected. It was inadvertently left in when the line was copied.
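Note that datetime and isbetween were introduced in R2014b, so they are not available in the R2012a release mentioned in the question. A sketch of the same period selection with datenum and plain logical comparisons follows. It assumes an hourly time axis rebuilt from the question's overall limits; with real data, the timestamps parsed from the file would take its place. The tolerance tol is an assumption added to guard against floating-point round-off.

```matlab
% Hourly time axis of the model run, rebuilt from the question's limits
dt = 1/24;
nSteps = round((datenum(2016,10,1) - datenum(2013,10,1)) / dt);
date_runoff = datenum(2013,10,1) + (0:nSteps)' * dt;

% Logical mask of all rows from 1 May to 1 Oct of each year
ix  = false(size(date_runoff));
tol = dt/2;                              % half a time step, absorbs round-off
for yr = 2014:2016
    lo = datenum(yr,5,1);                % start of the period
    hi = datenum(yr,10,1);               % end of the period
    ix = ix | (date_runoff >= lo - tol & date_runoff <= hi + tol);
end
```

The mask depends only on the time axis, so if every file covers the same time steps it can be built once before the loop over files and reused.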
