[rnaFile, message] = fopen(outputFile, 'w');
While not your problem, should check for the output file opening successfully as well as input...
[base, num] = fread(dnaFile, 1, 'char');
The above reads one character, but returns it as a double, not a character.
Use
[base, num]=fread(dnaFile,'*char');
to read whole file into a character array.
while num > 0
if base == 'C'
...
The above starts an infinite loop as num=1 and there's nothing inside the loop that ever changes num so it stays there forever...oh, although it will eventually error out on EOF on the read...scanned too quickly first.
You can loop, but Matlab is vectorized; may as well make use of it.
out=repmat('_',size(base));
out(base=='C')='G';
out(base=='G')='C';
out(base=='T')='A';
out(base=='A')='U';
freqG=sum(out=='G');
freqC=sum(out=='C');
freqA=sum(out=='A');
freqU=sum(out=='U');
totalNum=sum([freqG freqC freqA freqU]);
freqG=sum(out=='G')/totalNum;
freqC=sum(out=='C')/totalNum;
freqA=sum(out=='A')/totalNum;
freqU=sum(out=='U')/totalNum;
ADDENDUM:
Couldn't see an issue otomh so did a trial with just made up sequence...
>> dna=['A', 'C', 'G', 'T'];
>> dna=repmat(dna,1,10);
>> dna=dna(randperm(length(dna)));
>> dna(randperm(length(dna),3))='_';
>> dna
dna =
CTTGCCTCC_GGA_ATGAATGCAACACAGTT_GGTGCATC
The algorithm above starts here:
>> rna=repmat('_',size(base));
>> rna(dna=='C')='G';
>> rna(dna=='G')='C';
>> rna(dna=='T')='A';
>> rna(dna=='A')='U';
>> [dna;rna]
ans =
CTTGCCTCC_GGA_ATGAATGCAACACAGTT_GGTGCATC
GAACGGAGG_CCU_UACUUACGUUGUGUCAA_CCACGUAG
Looks like what problem statement asked for...compute frequency
Did this in "more Matlab-y" way; order is alphabetic to satisfy histc
>> RNA='ACGU';
>> freq=histc(rna,RNA)
freq =
9 9 10 9
>> freq=freq/sum(freq)
freq =
0.2432 0.2432 0.2703 0.2432
>>
>> [RNA.' num2str(freq.','%.4f')]
ans =
A 0.2432
C 0.2432
G 0.2703
U 0.2432
>>
If order is important revert to previous or a "cute" way would be to use the categorical datatype--
>> rnac=categorical(cellstr(rna),{'G';'C';'A';'U';'_'});
>> summary(rnac)
G 10
C 9
A 9
U 9
_ 3
>>
ADDENDUM 2:
And, the yet more Matlab-y way vectorizes the translation via lookup...
>> DNA=['C','G','T','A','_'];
>> RNA=['G','C','A','U','_'];
>> RNA(arrayfun(@(c) find(c==DNA),dna))
ans =
GAACGGAGG_CCU_UACUUACGUUGUGUCAA_CCACGUAG
Best Answer