MATLAB: I have a string of DNA bases and I need to count the number of times I have two identical bases at a certain distance from each other.

dnarepetition

I have a string of DNA bases and I need to count the number of times two identical bases occur at a certain distance from each other. For example, the number of occurrences of 'AA', 'AXA', 'AXXA' and so on, where X stands for any base. Would love some help finding the right function.

Best Answer

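numel(regexp(yourchararray, 'A[^A]{0,2}A'))
Here 2 is the maximum number of bases between the two A's (and 0 the minimum). The pattern matches an A, followed by 0 to 2 characters that are not A, followed by an A. By default regexp returns the starting index of each match, so numel counts the matches; sum would add up those indices instead. Note that matches do not overlap, so in 'ACATA' only the first 'ACA' is counted.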
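
If the goal is to count pairs at one exact distance (including overlapping ones, and with any base in between), a possible alternative to regexp is to compare the char array with a shifted copy of itself. This is only a sketch: the names seq, d, samePair, nAnyBase and nA and the example sequence are made up for illustration and do not come from the question.

```matlab
% Hypothetical example sequence; replace with your own char array.
seq = 'AACATAGCG';
d = 2;  % offset between the two positions: 1 for 'AA', 2 for 'AXA', 3 for 'AXXA'

% Compare each base with the base d positions further along.
samePair = seq(1:end-d) == seq(1+d:end);

% Number of positions i where seq(i) == seq(i+d), for any base.
nAnyBase = nnz(samePair);

% Same count restricted to a single base, here 'A'.
nA = nnz(samePair & (seq(1:end-d) == 'A'));
```

For this example sequence, nAnyBase is 3 (two A pairs and one G pair) and nA is 2. Looping d over 1:3 would give the counts for 'AA', 'AXA' and 'AXXA' in turn.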