MATLAB: Check a date in a formatted text file for a match, regexp help


In the attached file, dates are grouped into categories (positively disturbed, negatively disturbed, quiet). I am trying to write a program that takes a date, e.g. '12-Mar-2010', and looks up in this file which category that date belongs to (or reports that it is in none of the three). I'm stuck on reading the file because of its unusual format. I have tried textscan, fileread, and regexp with different format specs, but I am unable to get the data into an easy-to-process form (matrices, maybe) for comparison with a given date or dates. Any help is appreciated.

Best Answer

A fairly straightforward method:
%read whole file in one go
fid = fopen('q_d_daysctl.txt', 'rt'); %using 't' to autoconvert \r\n into \n. Unfortunately fileread does not do that
filecontent = fread(fid, '*char')';
fclose(fid);
%parse file using regexp. Because of the lineanchors option, requires \n as EOL, not \r\n.
%regex has eight tokens
%1: optional, text, space and ':' to match the line heading (positively disturbed, etc.)
%2-6: day number or '.' placeholder (the separating spaces are part of the match, not of the tokens)
%7: match month (the '.' is not part of the token but part of the match)
%8: optional, match 4 digit year and preceding space
rows = regexp(filecontent, '^([A-Za-z: ]+)?(\d+|\.) +(\d+|\.) +(\d+|\.) +(\d+|\.) +(\d+|\.) +([A-Z][a-z]{2})\.( \d{4})?$', 'tokens', 'lineanchors');
rows = vertcat(rows{:}); %convert into nx8 cell array
%fill holes, convert numbers, then convert to table
rows(:, 1) = fillmissing(strtrim(rows(:, 1)), 'previous');
rows(:, 2:6) = num2cell(str2double(rows(:, 2:6)));
rows(:, 8) = num2cell(fillmissing(str2double(rows(:, 8)), 'previous'));
datecat = cell2table(rows, 'VariableNames', [{'category'}, compose('day%d', 1:5), {'month', 'year'}])
%stack the five day columns into a single 'day' column for easy search
datecat = stack(datecat, 2:6, 'NewDataVariableName', 'day', 'IndexVariableName', 'ignore')
datecat(isnan(datecat.day), :) = []; %get rid of days that were '.' in the original file
For good measure, I'd convert the month, year, day columns into a single datetime column and the category column into a categorical, but the datecat table is easily searchable as it is.
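The conversion and lookup suggested above could look roughly like the sketch below. It is not part of the original answer. The small datecat table is a hypothetical stand-in with made-up rows (not values from the attached file), laid out like the stacked table produced by the code above, and the variable names query, match and result are illustrative. It assumes a MATLAB release that supports double-quoted strings (R2017a or later).

```matlab
% Hypothetical stand-in for the stacked datecat table built above
% (made-up rows, NOT values taken from the original file).
datecat = table({'quiet'; 'quiet'; 'positively disturbed'}, ...
                {'Mar'; 'Mar'; 'Apr'}, [2010; 2010; 2010], [12; 13; 5], ...
                'VariableNames', {'category', 'month', 'year', 'day'});

% Build "day-month-year" text for each row and parse it into one datetime column.
% The locale is fixed so that English month abbreviations parse on any system.
datetext = string(datecat.day) + "-" + string(datecat.month) + "-" + string(datecat.year);
datecat.date = datetime(datetext, 'InputFormat', 'd-MMM-yyyy', 'Locale', 'en_US');

% Store the category as a categorical for compact storage and easy comparison.
datecat.category = categorical(datecat.category);

% Look up the category of a given date.
query = datetime('12-Mar-2010', 'InputFormat', 'd-MMM-yyyy', 'Locale', 'en_US');
match = datecat.category(datecat.date == query);
if isempty(match)
    result = "not in any category";
else
    result = string(match(1));
end
disp(result)
```

To test several dates at once, ismember(queries, datecat.date) returns both a logical mask and the matching row indices, which can then be used to index datecat.category.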