MATLAB: Time converted to a str

datevectime

I have Matlab R2012b and am importing a rather large .csv file that is created by another tool. The only column I care about from this file is the one with all of the timestamps in it so I've imported the column as text but it ends up as a cellstring. As I need to convert the timestamp to Seconds of Day to find out if there are any gaps in my data I have done the following:
filename = 'myfilename.csv';
delimiter = ',';
startRow = 2; %it always starts at row 2 as row 1 is the header name
FormatIn = 'HH:MM:SS.FFF'; % defines my time format for the datevec function
formatSpec = '%*s%*s%*s%*s%*s%*s%s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%[^\n\r]';
fileID = fopen(filename,'r');
textscan(fileID, '%[^\n\r]', startRow-1, 'ReturnOnError', false); %reads my file
dataArray = textscan(fileID, formatSpec, 'Delimiter', delimiter, 'EmptyValue' ,NaN,'ReturnOnError', false); %creates my data array
fclose(fileID);
Time = dataArray{1, 1}; %outputs the column I want from the .csv file
Time1 = num2str(cell2mat(Time)); % where Time is my input data
TimenoDay = Time1(5:end); % This is supposed to remove the 246 and space leaving me with 14:32:54.342 231
TimenoSpaces = TimenoDay(~isspace(TimenoDay)); % This is supposed to remove all spaces leaving me with 14:32:54.342231
[Y M D H MN S] = datevec(TimenoSpaces,formatIn); this is supposed to break the time into a vector array
SOD = (H*3600+MN*60+S); % this should convert all of my time stamps to Seconds of Day
It works fine till I get to the datevec function and it runs very slow, as in 30 minutes+. Now, it is ~ 5 million rows of data, but I did something similar yesterday and it only took a few minutes to run the datevec function with a similar amount of data. The only difference between the two scripts is that the one that want quick had a normal time format and I could import it as a Time format (14:42:36.243213) instead of as text with the script above. I think my problem is because of the Julian day and the extra spaces in the timestamp.
Does anyone have any ideas that could help me speed this up? Make my code better?
Edit#1: So I see why the datevec isn't working. when I run
Time1 = num2str(cell2mat(Time)
it returns a 20 character cell with all 2's in it. But if I do
Time1 = num2str(cell2mat(Time(1:1)))
and just look at the first cell of data it works fine.

Best Answer

Again, pasting a (partial even) row of text with the actual field of interest would relieve confusion regarding the form, but presuming the 'sss' is actually microseconds (since the .FFF includes down to the msec resolution) and is in the file, then I'd do something like
fmt=[repmat('%*s',1,'G'-'A') '%3f %2f:%2f:%f %*[^\n]'];
d=cell2mat(textscan(fid,fmt,'delimiter',','headerlines',1,'collectoutput',1));
Now you'll have have an array Nx4 that are the [day,hr,min,sec.fff] values and you can pass those directly to datevec with the leading columns yr,mo as zero or some arbitrary start date (it'll all get subtracted out in the end so is immaterial as to absolute value and the time fields are all stored in the fractional portion of the datenum value. Keeping the day will eliminate issues regarding clock rollover if any data were to have been collected over the bewitching midnight hour.
Then, simply looking for
diff(datenum([YR0 1 d])>Setpoint)
will give you the desired answer. The above should be pretty quick.
Again, all above is air-code w/o an example subset of data to 'spearmint with..."salt to suit" :)
ADDENDUM
OK, with the sample data file to check on formatting, the rest is pretty much as outlined. You don't have any calendar data other than DOY and Matlab datenum needs a reference year so just pick something arbitrary as you're only looking for differences so it all gets subtracted out again, anyway.
dn=datenum(2000,0,dat(:,1),dat(:,2),dat(:,3),dat(:,4)+dat(:,5)/1E6);
dt=diff(dn)*86400*1000;
Above takes the date information you have and creates a datenum sequence. As noted before, this will have the benefit of handling clock rollover, etc., automagically. Then the "delta-t" dt is the sample time differential in msec which you can test. For a sample subset of the file supplied, this looks like--
>> dt(10:40)
ans =
0.0905
0.4526
0.7343
0.6940
0.6840
0.6940
0.6940
0.6940
10.2494
0.3722
0.3722
0.6940
0.5532
0.6840
0.1308
0.3017
0.6940
0.5834
0.2112
0.1308
0.4124
14.8058
0.3722
0.3722
0.6940
0.5532
0.6840
0.1408
0.2917
0.1509
0.3923
>>
Seems like the acquisition must be on a time-sharing system where there's a periodic latency, maybe???
Anyway, your end object is now simply to find the locations that don't satisfy your criterion. There are several choices for how to report this; you can use a logical vector of locations that satisfy, find locations that don't, report if any don't or all do, etc., etc., etc., ...
>> isOK=all(dt<90)
isOK =
1
>> sum(dt>10)
ans =
4
>> find(dt>90)
ans =
Empty matrix: 0-by-1
>> find(dt>10)
ans =
18
31
53
61
>>
Above uses a couple different fixed setpoints; you'll make that a variable, of course. I used an example of 10 just to illustrate the case of a few failing. The logical-addressing logic vector solution returns a full-length logical array of T/F, of course. Which turns out easiest to use is all dependent upon end objectives.