MATLAB: Loop through the DNA array and record all of the locations of the triplets (codons): ‘AAA’, ‘ATC’ and ‘CGG’.


My code so far runs, but I don't think it is correct. I am supposed to loop through the cell array and record the location of each codon, while skipping any codon that contains a character from the preceding codon. For example, if part of the sequence contains [A,T,C,C,G,G], then the section with CCG should be skipped. I'm just not sure what the best way to do that would be.
Here is what I have so far:
fid = fopen('sequence_long.txt','r')
A = textscan(fid,'%3s');
DNA = A{1};
fclose(fid);
i = 1;
%loops through array and counts codon occurrences
%finds the index location of individual codons
while i < length(DNA)
    i = i + 1;
    if strcmp(DNA(i),'AAA')
        num_AAA = nnz(strcmp(DNA,'AAA'));
        loc_AAA = find(strcmp(DNA,'AAA'));
    elseif strcmp(DNA(i),'ATC')
        num_ATC = nnz(strcmp(DNA,'ATC'));
        loc_ATC = find(strcmp(DNA,'ATC'));
    elseif strcmp(DNA(i),'CGG')
        num_CGG = nnz(strcmp(DNA,'CGG'));
        loc_CGG = find(strcmp(DNA,'CGG'));
    end
end
fprintf('The number of AAA values is: %.f',num_AAA)
fprintf('The index location of AAA values: %.f\n',loc_AAA(1:10))
fprintf('The number of ATC values is: %.f',num_ATC)
fprintf('The index location of ATC values: %.f\n',loc_ATC(1:10))
fprintf('The number of CGG values is: %.f',num_CGG)
fprintf('The index location of CGG values: %.f\n',loc_CGG(1:10))

Best Answer

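One workaround is to iterate over the sequence one character at a time and, whenever a codon is found, jump ahead by three positions so that the next comparison starts after the matched codon.
The code below illustrates this on a short example sequence. Note that it treats DNA as a character vector rather than the cell array returned by textscan.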
DNA = 'AAATCATCGGCGGATC'; % Example sequence
i = 1;
% Start indices of each codon
loc_AAA = [];
loc_ATC = [];
loc_CGG = [];
% Number of occurrences of each codon
num_AAA = 0;
num_ATC = 0;
num_CGG = 0;
while i <= length(DNA)-2
    if DNA(i)=='A' && DNA(i+1)=='A' && DNA(i+2)=='A'
        loc_AAA = [loc_AAA i];
        num_AAA = num_AAA + 1;
        i = i + 3; % Jump past the matched codon
    elseif DNA(i)=='A' && DNA(i+1)=='T' && DNA(i+2)=='C'
        loc_ATC = [loc_ATC i];
        num_ATC = num_ATC + 1;
        i = i + 3;
    elseif DNA(i)=='C' && DNA(i+1)=='G' && DNA(i+2)=='G'
        loc_CGG = [loc_CGG i];
        num_CGG = num_CGG + 1;
        i = i + 3;
    else
        i = i + 1; % No codon here: move on by one character
    end
end
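The same skip-ahead idea can be written more compactly by comparing each three-character window against a list of codons. The following is only a sketch of that variation, not a replacement for the loop above. The names codons, locs, starts and matches are illustrative and do not appear in the original code. The regexp call is included as a cross-check, on the assumption that a non-overlapping, left-to-right scan is what the assignment intends.

```matlab
% Sketch only: the names codons, locs, starts and matches are illustrative.
DNA    = 'AAATCATCGGCGGATC';       % same example sequence as in the answer
codons = {'AAA', 'ATC', 'CGG'};    % target codons from the question
locs   = cell(size(codons));       % locs{k} collects start indices of codons{k}

i = 1;
while i <= length(DNA) - 2
    % Compare the 3-character window against every target codon at once
    k = find(strcmp(DNA(i:i+2), codons), 1);
    if isempty(k)
        i = i + 1;                 % no match: slide the window by one character
    else
        locs{k}(end+1) = i;        % record where the matched codon starts
        i = i + 3;                 % jump past it so matches never overlap
    end
end

% Cross-check: regexp also returns non-overlapping matches, left to right
[starts, matches] = regexp(DNA, 'AAA|ATC|CGG', 'start', 'match');

for k = 1:numel(codons)
    fprintf('%s: %d match(es), starting at %s\n', ...
        codons{k}, numel(locs{k}), mat2str(locs{k}));
end
```

For this example sequence the loop finds AAA at index 1, ATC at indices 6 and 14, and CGG at index 11. The ATC starting at index 3 and the CGG starting at index 8 are skipped because each shares a character with the codon matched just before it. If the sequence was read with textscan as in the question, something like DNA = [A{1}{:}]; should join the pieces into a single character vector first, assuming the file holds nothing but the sequence.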