[Math] DNA sequence – probability

probability

Your DNA code is composed of a series of four nucleotides: adenine, guanine, thymine and cytosine (A, G, T and C, respectively).

a) What is the probability an individual has the following nucleotide sequence: “TATATA” at any particular position? You may assume independence.

My answer: The probability of getting "TATATA" is (1/4)(1/4)(1/4)(1/4)(1/4)(1/4).

b) What is the probability that an individual has k T’s in their DNA code at any particular position? (k can be any integer and you may assume independence). Here we're looking for the probability of k consecutive Ts.

My answer: The probability of k consecutive "T"s (in fact any k specific nucleotides) is (1/4)^k.
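As a quick arithmetic check of both answers, here is a minimal Python sketch. It assumes, as the problem allows, that the four nucleotides are equally likely and independent; the function name `prob_specific_sequence` is just an illustrative choice.

```python
from fractions import Fraction


def prob_specific_sequence(seq: str, alphabet_size: int = 4) -> Fraction:
    """Probability that a fixed stretch of len(seq) positions spells exactly seq.

    Assumes each position is independent and uniform over the alphabet.
    """
    # Each position matches its required letter with probability 1/alphabet_size.
    return Fraction(1, alphabet_size) ** len(seq)


print(prob_specific_sequence("TATATA"))  # 1/4096
print(prob_specific_sequence("T" * 3))   # 1/64, i.e. (1/4)^k with k = 3
```

Using `Fraction` keeps the result exact, so (1/4)^6 shows up as 1/4096 rather than a rounded decimal.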

Thoughts please.

Best Answer

It depends on the length of the DNA.
For example, if the DNA is only 3 units long, the answer to your first question is zero, because a 6-nucleotide sequence cannot fit.

Let the DNA be a closed chain of $N$ units, so that there are $N$ possible starting positions.

Part (a): Work with the complement.
$$P(\text{sequence occurs somewhere})=1-P(\text{sequence does not occur anywhere})$$
Now, for a single starting position $i$,
\begin{align}P(\text{sequence does not occur at position } i)&=1-P(\text{sequence occurs at position } i)\\ P(\text{sequence does not occur at position } i)&=1-\frac{1}{4^6}\end{align}
Multiplying the probabilities that the sequence does not occur at positions $1, 2, 3, \ldots, N$ (this treats the positions as independent, which is only an approximation because overlapping windows share nucleotides), we get
$$P(\text{sequence does not occur anywhere})=\left(1-\frac{1}{4^6}\right)^N$$
Therefore,
$$P(\text{sequence occurs somewhere})=1-\left(1-\frac{1}{4^6}\right)^N$$

For Part (b) replace $4^6$ with $4^k$ in the above.
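To get a feel for the numbers, the formula can be evaluated with a short Python sketch. It takes the formula at face value, including the step of multiplying the per-position probabilities as if the positions were independent; the function name `prob_occurs_somewhere` and the guard for chains shorter than the sequence are illustrative assumptions, the latter following the remark that a 3-unit DNA gives probability zero.

```python
def prob_occurs_somewhere(n_positions: int, k: int = 6) -> float:
    """Evaluate 1 - (1 - 1/4**k)**N for a closed chain of N units.

    Assumes equally likely, independent nucleotides and treats the N
    starting positions as independent, as the derivation above does.
    """
    if n_positions < k:
        # Assumption: a chain shorter than the sequence cannot contain it.
        return 0.0
    p_hit = 0.25 ** k  # chance the sequence starts at one given position
    return 1.0 - (1.0 - p_hit) ** n_positions


# Part (a) uses k = 6; part (b) only changes k.
for n in (3, 100, 4096, 100_000):
    print(n, prob_occurs_somewhere(n, k=6))
```

For small $N$ the result is close to $N/4^6$, and it approaches 1 as $N$ grows well beyond $4^6 = 4096$.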