Please explain the following for me:
Example generated using an online calculator:
sequence = aaabbcddddefgghhijjk
sequence length = 20
unique characters in sequence = 11
frequencies of unique characters:
a = 0.15, b = 0.1, c = 0.05, d = 0.2, e = 0.05, f = 0.05, g = 0.1, h = 0.1, i = 0.05, j = 0.1, k = 0.05
for which we get entropy as:
$$
H(X) = -[(0.15\log_2 0.15)+(0.1\log_2 0.1)+(0.05\log_2 0.05)+(0.2\log_2 0.2)+
(0.05\log_2 0.05)+(0.05\log_2 0.05)+(0.1\log_2 0.1)+(0.1\log_2 0.1)+(0.05\log_2 0.05)+(0.1\log_2 0.1)+(0.05\log_2 0.05)]
$$
$$
H(X) = -[(-0.411)+(-0.332)+(-0.216)+(-0.464)+(-0.216)+(-0.216)+(-0.332)+(-0.332)+(-0.216)+ (-0.332)+(-0.216)]
$$
$$
H(X) = -[-3.28418]
$$
$$
H(X) = 3.28418
$$
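To reproduce the figure above without the online calculator, here is a minimal Python sketch. It assumes the calculator treats each character's relative frequency as its probability and uses base-2 logarithms; the function name `shannon_entropy` is just an illustrative choice, not anything taken from the calculator.

```python
from collections import Counter
from math import log2


def shannon_entropy(sequence: str) -> float:
    """Shannon entropy, in bits per symbol, of the character frequencies in `sequence`."""
    counts = Counter(sequence)
    total = len(sequence)
    # Sum -p * log2(p) over the relative frequency p of each distinct character.
    return -sum((n / total) * log2(n / total) for n in counts.values())


sequence = "aaabbcddddefgghhijjk"
print(round(shannon_entropy(sequence), 5))  # 3.28418
```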
If the metric entropy is the ratio of H(X) to the sequence length:
$$
\text{Metric entropy} = \frac{3.28418}{20} = 0.16421
$$
What is the ratio of entropy and the number of unique characters? In the case of this example:
$$
\frac{3.28418}{11} = 0.2986
$$
Could this be considered relative entropy?
Best Answer
In case those who voted on this question would like an answer, here's what I've learned since posting it:
The relative entropy I was asking about is what's commonly referred to as normalised entropy, as the term "relative entropy" is also used for Kullback–Leibler divergence.
Normalised entropy is the ratio between the observed entropy and the theoretical maximum entropy for a given system. So to normalise the observed entropy, we first need to calculate the maximum entropy for the given set of unique characters in the example as follows: $$ H_{max} = \log_2(11) = 3.45943 $$
Now we get normalised entropy as: $$ \frac{3.28418}{3.45943} = 0.94934 $$
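Written in general form, the normalisation above looks like the following. The symbols n (the number of symbols in the chosen alphabet), p_i (the relative frequency of the i-th symbol) and H_{norm} are notation introduced here for illustration only:

$$
H_{norm}(X) = \frac{H(X)}{H_{max}} = \frac{-\sum_{i=1}^{n} p_i \log_2 p_i}{\log_2 n}, \qquad 0 \le H_{norm}(X) \le 1
$$

The value is 1 when all n symbols are equally frequent and 0 when only a single symbol ever occurs.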
This is the randomness of the generated sequence relative to the number of unique characters made available. If we were interested in the randomness of this sequence relative to the full lowercase English alphabet, we would get:
$$ \frac{3.28418}{\log_2(26)} = \frac{3.28418}{4.70044} = 0.69870 $$
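Both normalisations can be checked with a short Python sketch. The function name `normalised_entropy` and its `alphabet_size` parameter are hypothetical names chosen for this example. Returning 0.0 for an alphabet of fewer than two symbols is also my own assumption, since the ratio is undefined when the maximum entropy is zero.

```python
from collections import Counter
from math import log2
from typing import Optional


def normalised_entropy(sequence: str, alphabet_size: Optional[int] = None) -> float:
    """Observed Shannon entropy divided by the maximum entropy log2(alphabet size)."""
    counts = Counter(sequence)
    total = len(sequence)
    entropy = -sum((n / total) * log2(n / total) for n in counts.values())
    # Default to the number of distinct characters actually observed.
    size = len(counts) if alphabet_size is None else alphabet_size
    if size < 2:
        # Assumption: log2(1) = 0 makes the ratio undefined, so report 0.0.
        return 0.0
    return entropy / log2(size)


sequence = "aaabbcddddefgghhijjk"
print(round(normalised_entropy(sequence), 5))      # 0.94934
print(round(normalised_entropy(sequence, 26), 5))  # 0.6987
```

Passing the alphabet size explicitly keeps the choice of reference system visible: the same sequence looks close to maximally random against its own 11 characters, but noticeably less so against all 26 lowercase letters.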