MATLAB: Why does the score returned by NWALIGN not behave intuitively when I use the ambiguous symbols ‘W’ or ‘V’ in the Bioinformatics Toolbox 3.0 (R2007b)?

The score returned by NWALIGN does not contain intuitive values when I use ambiguous symbols ‘W’ or ‘V’ in the Bioinformatics Toolbox 3.0 (R2007b).

For example, I wish to align two short nucleotide sequences, "AGCT" and "AACT". Aligning them with NWALIGN gives a score of 7.67. Aligning "AACT" with a copy of itself ("AACT") produces a score of 9.33, the maximum score this alignment can have. Now, if I replace the "G" in "AGCT" with the ambiguous symbol "W", which can be either "A" or "T", the alignment should give a rather high score, since "A" can match "W". Instead I obtain a score of 6.67, which is lower than the score for the real mismatch. Shouldn't a match between "W" and "A" score better than a match between "G" and "A"? Moreover, why does the sequence "AVCT" ("V" stands for "A", "C", or "G", that is, any nucleotide except "T") score 7.67, which is higher than the score for "AWCT"?

The following code illustrates the example:

Score = nwalign('AGCT','AACT')
Score = nwalign('AACT','AACT')
Score = nwalign('AWCT','AACT')
Score = nwalign('AVCT','AACT')

Best Answer

The expectations listed above are correct for nucleotide sequences. The scores differ because, by default, the alignment functions assume that the input sequences are amino acids and therefore score the alignments with the BLOSUM50 matrix. Under that alphabet, "W" and "V" are read as tryptophan and valine, not as ambiguous nucleotide symbols. If you change the alphabet to nucleotides, the default scoring matrix is NUC44, which accounts for scores between ambiguous nucleotide symbols appropriately. This is shown below:

Score = nwalign('AACT','AACT','alpha','nt')
>> Score = 5.5463

Score = nwalign('AGCT','AACT','alpha','nt')
>> Score = 3.0505

Score = nwalign('AWCT','AACT','alpha','nt')
>> Score = 4.4371

Score = nwalign('AVCT','AACT','alpha','nt')
>> Score = 3.8824
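
The default (amino acid) scores reported in the question can be reproduced by hand, which makes it easier to see that "W" and "V" were being scored as residues. The sketch below does not call the toolbox. The BLOSUM50 entries are typed in by hand, and the 1/3 scale factor is an assumption chosen because it reproduces the reported scores. It also assumes the best alignment of these four-letter sequences has no gaps.

```matlab
% Hand-typed BLOSUM50 entries for the residue pairs in the question.
% These values and the scale factor are assumptions of this sketch.
sAA = 5;     % A vs A
sCC = 13;    % C vs C
sTT = 5;     % T vs T
sGA = 0;     % G vs A (glycine vs alanine)
sWA = -3;    % W vs A (tryptophan vs alanine)
sVA = 0;     % V vs A (valine vs alanine)
scale = 1/3; % assumed scale, consistent with the reported scores

% Gapless alignment against 'AACT': only the second position changes.
scoreAACT = scale * (sAA + sAA + sCC + sTT);
scoreAGCT = scale * (sAA + sGA + sCC + sTT);
scoreAWCT = scale * (sAA + sWA + sCC + sTT);
scoreAVCT = scale * (sAA + sVA + sCC + sTT);

fprintf('AACT: %.2f\nAGCT: %.2f\nAWCT: %.2f\nAVCT: %.2f\n', ...
        scoreAACT, scoreAGCT, scoreAWCT, scoreAVCT);
% Prints 9.33, 7.67, 6.67 and 7.67, matching the scores in the question.
```

Under this reading, "AWCT" scores lowest simply because tryptophan against alanine carries a negative substitution score, while glycine and valine against alanine both score zero.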

