MATLAB: Find locations of repeated values


I have a function that takes a set of data and checks whether any value in it repeats for more than 300 seconds:
function FindRepetition(TruckVariableName)
setpref('Internet','SMTP_Server','lamb.corning.com');
data1 = (TruckVariableName);
x = length(TruckVariableName);
data = reshape(data1, 1, x);
datarep = ~diff(data) & data(2:x) ~= 0; % logical: 1 = repeats the previous sample, 0 = differs; repeated zeros are excluded
% True where the difference between consecutive samples is zero and the
% sample itself is nonzero. data(2:x) has x-1 elements, the same as
% diff(data); the two sides of & must have matching dimensions.
datarepstr = num2str(datarep); %convert to string
s = regexprep(datarepstr,' ',''); %remove spaces
[startindex,runs] = regexp(s,'1+','start','match'); %find all runs and the point where they start
l = cellfun('length',runs); %find the length of each run
y = l > 300;
if any(y) %if any run is longer than 5 minutes, display message
%sendmail('johnsonlj2@corning.com', '2011 KENWORTH ISX15','A data fault has been detected - Prolonged data repetition');
disp('--An error has occurred - Prolonged data repetition.');
disp('Errors occurred at');
end
end
I want to find WHERE those repeated values start in that set of data. I tried disp(find(y));, but that returns positions within y (the list of runs), not positions in the original data set. Does anyone know how I can find the locations in data1 where the data repeats for more than 300 seconds?

Best Answer

I think that you can use two approaches. I'll illustrate with a simple example: say we have the following data
>> data = [7 8 8 8 8 6 6 7 8 7 7 7] ;
and we want to get blocks of repeating values with at least 3 elements.
1. Building on your REGEXP method, you would indeed look for the start positions of runs of 1s that are at least a given length.
>> rep = ~diff(data) % Add other components if needed.
rep =
0 1 1 1 0 1 0 0 0 1 1
>> repStr = sprintf('%d', rep)
repStr =
01110100011
>> start = regexp(repStr, '1{2,}', 'start') % 3 similar values -> 2 repetitions.
start =
2 10
2. Without conversion to string and REGEXP:
>> buffer = [true, diff(data)~=0]
buffer =
1 1 0 0 0 1 0 1 1 1 0 0
>> groupStart = find(buffer)
groupStart =
1 2 6 8 9 10
>> groupId = cumsum(buffer)
groupId =
1 2 2 2 2 3 3 4 5 6 6 6
>> groupSize = accumarray(groupId.', ones(size(groupId))).'
groupSize =
1 4 2 1 1 3
>> start = groupStart(groupSize > 2)
start =
2 10
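To connect this back to the question, here is a minimal sketch of the second approach that also skips runs of zeros and reports how long each run is. The example vector and the variable names minLen, startIdx and runLen are assumptions made for illustration, and the run lengths are computed with diff on the start indices rather than accumarray (both give the same sizes).

```matlab
% Sketch: start index and length of long runs of a repeated, nonzero value.
% 'data' and 'minLen' are example values (assumptions), not the truck data.
data   = [7 8 8 8 8 0 0 0 8 7 7 7];
minLen = 3;                                   % minimum number of equal samples in a run

data = data(:).';                             % force a row vector
isNewGroup = [true, diff(data) ~= 0];         % true where a new run begins
groupStart = find(isNewGroup);                % index of the first sample of each run
groupSize  = diff([groupStart, numel(data) + 1]);   % number of samples in each run

% Keep runs that are long enough and are not runs of zeros.
keep     = (groupSize >= minLen) & (data(groupStart) ~= 0);
startIdx = groupStart(keep);                  % positions in data where the runs start
runLen   = groupSize(keep);                   % length of each of those runs

disp(startIdx);
disp(runLen);
```

For the example vector this gives startIdx = [2 10] and runLen = [4 3]; the run of three zeros starting at index 6 is skipped. In the function from the question, a run of more than 300 ones in datarep corresponds to at least 302 equal samples, so minLen = 302 would be the matching threshold, assuming one sample per second as the question implies. Note also that the startindex values returned by regexp in that function are already positions in the data, so startindex(y) should give the same locations.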
EDIT: note that the 2nd method is more than 5 times faster than the 1st on large datasets.
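One way to check the speed difference on your own machine is sketched below. The data size, the random values and the variable names are assumptions, and the timings will vary with hardware and MATLAB version, so treat the printed numbers as indicative only.

```matlab
% Sketch: time both approaches on synthetic data (size and values are assumptions).
rng(0);                                       % reproducible random data
data   = randi(3, 1, 1e5);                    % 1e5 samples drawn from {1, 2, 3}
minLen = 3;                                   % minimum number of equal samples in a run

% Method 1: convert the repetition flags to a string and use REGEXP.
tic
rep    = ~diff(data);
repStr = sprintf('%d', rep);
start1 = regexp(repStr, sprintf('1{%d,}', minLen - 1), 'start');
t1 = toc;

% Method 2: group runs with CUMSUM and count their sizes with ACCUMARRAY.
tic
buffer     = [true, diff(data) ~= 0];
groupStart = find(buffer);
groupSize  = accumarray(cumsum(buffer).', 1).';
start2     = groupStart(groupSize >= minLen);
t2 = toc;

assert(isequal(start1, start2));              % both methods must agree
fprintf('REGEXP: %.4f s, ACCUMARRAY: %.4f s\n', t1, t2);
```

A single tic/toc pair is a rough measurement; repeating each method several times, or using timeit on a function wrapper, would give steadier numbers.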