I am working with long time series. My data-acquisition system records one point per second for months, except when it hangs for a few minutes or hours; during those intervals there is no data at all.
I want to down-sample this data to, say, one point per 60 seconds. However, because there can be "gaps" in the data, I am struggling to write efficient code for this!
I tried the following approach for a typical array (called "data") with 0.5 million rows and two columns: the first column is the time, the second column is the actual data.
start_time = data(1,1);
end_time   = data(end,1);
time_step  = 60/(3600*24);    % 60 s expressed in days (datenum units)
total_time = end_time - start_time;

% Prepare the downsampled data array.
downsampled_data = zeros(floor(total_time/time_step),2);

tic
for i = 1:size(downsampled_data,1)
    % For each time interval, find all points in this interval
    % and average over them.
    downsampled_data(i,1) = start_time + (i-0.5)*time_step;
    downsampled_data(i,2) = mean(data(data(:,1) > start_time + (i-1)*time_step & ...
                                      data(:,1) < start_time + i*time_step, 2));
end
toc
As a final step I would have to fish out those bins where there is no data… However, the above code takes about 60 seconds to run for 0.5 million points, and I need it to run in less than 10 minutes for an array of 5 million points. Can you guys think of a way of speeding it up?
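One direction I have been wondering about, but have not verified, is replacing the loop with a single `accumarray` pass: map every sample to a bin index, then accumulate per-bin sums and counts in one shot. A rough sketch of the idea (assuming uniform 60 s bins and times in datenum days, as above; empty bins come out as NaN from the 0/0 division):

```matlab
bin_width = 60/(3600*24);                       % 60 s in days
t0    = data(1,1);
nbins = floor((data(end,1) - t0)/bin_width);

idx  = floor((data(:,1) - t0)/bin_width) + 1;   % bin index for every sample
keep = idx >= 1 & idx <= nbins;                 % drop samples past the last full bin

counts = accumarray(idx(keep), 1,            [nbins 1]);  % samples per bin
sums   = accumarray(idx(keep), data(keep,2), [nbins 1]);  % sum of values per bin

centers = t0 + ((1:nbins)' - 0.5)*bin_width;    % bin-center times
means   = sums ./ counts;                       % NaN wherever a gap left a bin empty

downsampled_data = [centers, means];
```

The NaN rows would then directly mark the gaps I need to fish out, so that final step becomes a simple `isnan` filter.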
Best Answer