MATLAB: Find repeated expression in array of strings, return logical.

logical indexing, regexp

I have data of the type
looking_for = ["apple", "melon"]
in
my_data = ["The apple is red", "The bee was yellow", "I am eating a melon", "The melon is sweet"]
with
timing = [2.5, 5, 10, 18]
I want to find when a regular expression was repeated consecutively and then return a logical index that pertains to the first observation of the repetition.
My approach:
1) Find out whether each string contains one of the regular expressions in looking_for, e.g. apple. I solve this using
idx = cellfun(@(x)( ~isempty(x) ), regexp(my_data, "apple"));
2) Then I transpose the index and multiply it by timing to get the relevant timings, and remove the zeros (not shown here):
apple_timing = transpose(idx).*timing;
This gives me a variable called apple_timing with a value of 2.5, which is exactly what I want.
I would like a bit of code that returns a variable called repeat_timing. In the case of melon, this would return 18, the timing of the first observed consecutive repeat of the regular expression melon.

Best Answer

Here is one solution based around cumsum:
% Data:
LF = {'apple', 'melon'};
MD = {'The apple is red','The bee was yellow','I am eating a melon','The melon is sweet'};
TV = [2.5, 5, 10, 18];
% Locate patterns:
fun = @(p)~cellfun('isempty',strfind(MD,p));
BM = cell2mat(cellfun(fun,LF(:),'uni',0));
CS = cumsum(BM,2);
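To make the two matrices concrete, here is roughly what they contain for the sample data. This is only a small check, with the definitions from above repeated so that it runs on its own. Each row corresponds to one pattern in LF and each column to one string in MD:

```matlab
% Definitions repeated from above so this snippet is self-contained:
LF = {'apple', 'melon'};
MD = {'The apple is red','The bee was yellow','I am eating a melon','The melon is sweet'};
fun = @(p)~cellfun('isempty',strfind(MD,p));
BM = cell2mat(cellfun(fun,LF(:),'uni',0));
CS = cumsum(BM,2);
% BM is a 2x4 logical matrix (rows = patterns, columns = strings):
%   1 0 0 0   <- 'apple'
%   0 0 1 1   <- 'melon'
% CS is the running count of matches along each row:
%   1 1 1 1
%   0 0 1 2
disp(BM)
disp(CS)
```

So CS==n & BM is true exactly where a pattern matches for the n-th time.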
You can use CS to identify the first, second, third, etc. occurrence of each pattern, and find the related timing value:
>> [R1,C1] = find(CS==1 & BM); % First occurrence.
>> LF{R1}
ans = apple
ans = melon
>> TV(C1)
ans =
2.5000 10.0000
>> [R2,C2] = find(CS==2 & BM); % Second occurrence.
>> LF{R2}
ans = melon
>> TV(C2)
ans = 18
You can easily automate this for an arbitrary number of matches. Here I locate the first, second, and third occurrences (there are no third occurrences in your sample data):
baz = @(n)find(CS==n & BM);
[row,col] = arrayfun(baz,1:3,'uni',0);
typ = cellfun(@(r)LF(r),row,'uni',0);
val = cellfun(@(c)TV(c),col,'uni',0);
giving:
>> typ{:}
ans =
'apple'
'melon'
ans =
'melon'
ans = {}
>> val{:}
ans =
2.5000 10.0000
ans = 18
ans = []
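Note that the second occurrence of a pattern is not necessarily a consecutive one: a pattern could match the first and the fourth string with non-matching strings in between. If the repeat has to be consecutive, as the question asks, one possible variation is to compare each column of BM with the column before it. This is only a sketch under that reading of the question; the names REP, Rr, Cr, repeat_pattern and repeat_timing are introduced here for illustration, and the data is repeated so that the snippet runs on its own:

```matlab
% Data and pattern matrix, repeated from above:
LF = {'apple', 'melon'};
MD = {'The apple is red','The bee was yellow','I am eating a melon','The melon is sweet'};
TV = [2.5, 5, 10, 18];
fun = @(p)~cellfun('isempty',strfind(MD,p));
BM = cell2mat(cellfun(fun,LF(:),'uni',0));
% A consecutive repeat: the pattern matches string k and also string k-1.
% The first column can never be a repeat, so it is padded with false.
REP = [false(size(BM,1),1), BM(:,2:end) & BM(:,1:end-1)];
% Keep only the first consecutive repeat of each pattern:
REP = REP & cumsum(REP,2)==1;
% Rows of REP are logical indices into MD and TV, one row per pattern.
[Rr,Cr] = find(REP);
repeat_pattern = LF(Rr);   % patterns that have a consecutive repeat
repeat_timing  = TV(Cr);   % timing of the first consecutive repeat
```

For the sample data this leaves only melon, with repeat_timing equal to 18. If a pattern matched three strings in a row, the cumsum filter would keep just the first of the two repeats.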