Hello,
I want to find a set of substrings (between 19 and 24 characters long, 'ACGT' mix = DNA sequences) in a bigger string (template DNA) and replace them with '*' for the length of the substring. I have following code.
%"template" is a 8x1 cell array with original DNA sequence data (araound 1800 chars each). To minimize the example I just go through the first cell.
%"substring" is e.g. a 50x2 cell array, with column 1 = substring and olumn 2 = length of the substring.
%"substituted_seq" is a 8x1 cell array with the replaced sequence (substrings substituted by '*')
%
substituted_seq{1,1} = strrep(template{1,1},substring{1,1},'*'); for j=1:size(substring,1) substituted_seq{1,1} = strrep(substituted_seq{1,1},substring{j,1},'*'); end
The first problem I have is, that these substrings are overlapping with each other. So when I replace the first substring with '*' and search for the next one (which is overlapping the first) this code will not replace it anymore.
Second: I also couldn't figure out, how to replace a substing of e.g. 'ACGTCG' with the same number of '*' (in this example '******').
I would be very grateful for any help. Thanks!
Best Answer