Given a channel output $Y_i=y_i$, we compute $P_{X_i|Y_i}(x_i|y_i)$ for $x_i=0$ and $x_i=1$ via Bayes' rule, i.e.,
$$
P_{X_i|Y_i}(x_i|y_i)= \frac{P_{Y_i|X_i}(y_i|x_i)P_{X_i}(x_i)}{P_{Y_i}(y_i)}.
$$
Since $P_{X_i}(0)=P_{X_i}(1)=\frac{1}{2}$ (the common assumption in this context), we have that
$$P_{X_i|Y_i}(x_i|y_i)=c\cdot P_{Y_i|X_i}(y_i|x_i)$$
for $x_i=0, 1$, where $c$ is some constant.
Since $P_{X_i|Y_i}(0|y_i)+P_{X_i|Y_i}(1|y_i)=1$ and the likelihood $P_{Y_i|X_i}(y_i|x_i)$ is specified by the channel (you can see more details in the last part of my reply), we can find $c$ without computing $P_{Y_i}(y_i)$.
However, a smarter choice of $c$ is
$$c=(P_{Y_i|X_i}(y_i|0)+P_{Y_i|X_i}(y_i|1))^{-1}.$$
From here, you should be able to sense why the posterior probability $P_{X_i|Y_i}(x_i|y_i)$ is called the normalized likelihood. To end the first part, I would like to point out this fact:
$$
\frac{P_{X_i|Y_i}(0|y_i)}{P_{X_i|Y_i}(1|y_i)}=\frac{P_{Y_i|X_i}(y_i|0)}{P_{Y_i|X_i}(y_i|1)}.
$$
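To make this concrete, here is a small numerical sketch (my own toy example, not from the question): a binary-input channel with three output symbols, where the two likelihoods for a given $y_i$ do not sum to one, so the normalizer actually does something, and the posterior ratio equals the likelihood ratio as stated above.

```python
# Toy binary-input channel with three output symbols (hypothetical numbers).
# P[x][y] = P_{Y_i|X_i}(y|x), i.e., the channel transition probabilities.
P = {0: [0.7, 0.2, 0.1],
     1: [0.1, 0.2, 0.7]}

def posterior(y):
    """P_{X_i|Y_i}(x|y) for x in {0, 1}, assuming the uniform prior on X_i."""
    lik0, lik1 = P[0][y], P[1][y]
    c = 1.0 / (lik0 + lik1)          # the normalizer chosen above
    return c * lik0, c * lik1        # sums to 1 by construction

for y in range(3):
    p0, p1 = posterior(y)
    print(y, (p0, p1), "posterior ratio:", p0 / p1, "likelihood ratio:", P[0][y] / P[1][y])
```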
Regarding your second question:
> The author also writes: "from the point of view of decoding, all that matters is the likelihood ratio". While this seems intuitively plausible to me, is there a simple way to see that this is true in general from the formula of Bayes' theorem?
Looking at the decoding problem, our goal is to find a codeword $\mathbf{x}$ in your codebook $\mathcal{C}$ such that $P_{\mathbf{X}|\mathbf{Y}}(\mathbf{x}|\mathbf{y})$ is maximized. For simplicity, I consider memoryless binary-input channels. As codewords are equally likely (by assumption), the decoding task for any given channel output $\mathbf{y}$ is
$$
\arg\max_{\mathbf{x}\in\mathcal{C}}P_{\mathbf{X}|\mathbf{Y}}(\mathbf{x}|\mathbf{y})
=\arg\max_{\mathbf{x}\in\mathcal{C}}P_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x})=\arg\max_{\mathbf{x}\in\mathcal{C}}\prod_{i=1}^n P_{Y_i|X_i}(y_i|x_i).
$$
To solve the above optimization problem, we may compare $P_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x})$ for all pairs of codewords $\mathbf{x}_1$ and $\mathbf{x}_2$: we know $\mathbf{x}_1$ is more likely to have been transmitted than $\mathbf{x}_2$ if
$$
\prod_{i=1}^n\frac{P_{Y_i|X_i}(y_i|x_{1,i})}{P_{Y_i|X_i}(y_i|x_{2,i})}\ge 1.
$$
From this comparison, you should realize why the decoding only relies on the likelihood ratios.
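If a concrete toy example helps, here is a rough sketch of that rule in Python, assuming a BSC with crossover probability $\epsilon$ and a made-up 3-bit codebook; working in the log domain turns the product of per-bit likelihood ratios into a comparison of sums.

```python
# Toy ML decoder over a small codebook (hypothetical codebook, assumed BSC).
import math

eps = 0.1                                # assumed crossover probability
C = [(0, 0, 0), (1, 1, 1), (1, 0, 1)]    # made-up codebook

def log_lik(y, x):
    """log P_{Y_i|X_i}(y|x) for the BSC."""
    return math.log(1 - eps) if y == x else math.log(eps)

def ml_decode(y):
    """argmax over codewords of prod_i P(y_i|x_i), computed as a sum of logs."""
    return max(C, key=lambda cw: sum(log_lik(yi, xi) for yi, xi in zip(y, cw)))

print(ml_decode((1, 1, 0)))   # -> (1, 1, 1)
```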
Your last question should no longer be a question. Indeed, the equality $$P_{Y_i|X_i}(y_i|0)+P_{Y_i|X_i}(y_i|1)=1$$ is in general not true, and we really don't mean this (I know the name "normalized likelihood" might confuse you). So, forget about this equality.
Next, how do we compute the likelihood $P_{Y_i|X_i}(y_i|x_i)$? Well, it is given to you once the channel model is determined since $P_{Y_i|X_i}(y_i|x_i)$ is exactly the channel transition probability.
For a given channel output $Y_i=y_i$ of an AWGNC where $Y_i=X_i+N_i$ and $N_i\sim\mathcal{N}(0, \sigma^2)$, we have that
$$P_{Y_i|X_i}(y_i|x_i)=\frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left(-\frac{(y_i-x_i)^2}{2\sigma^2}\right),$$
i.e., the Gaussian density with mean $x_i$ and variance $\sigma^2$ evaluated at $y_i$ (see the small sketch below). Other channel models would need to be specified more precisely before we can discuss them further. I hope the above helps. :)
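A quick numerical sketch of that likelihood computation (my own example; $\sigma$ and $y_i$ are chosen arbitrarily, and the inputs are taken as $x_i\in\{0,1\}$ as above):

```python
# Minimal sketch: AWGN likelihoods and the resulting normalized likelihoods
# (posteriors under a uniform prior). sigma and y are arbitrary choices.
import math

sigma = 0.5

def awgn_likelihood(y, x):
    """P_{Y_i|X_i}(y|x): Gaussian density with mean x, variance sigma^2, at y."""
    return math.exp(-(y - x) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

y = 0.8
lik0, lik1 = awgn_likelihood(y, 0.0), awgn_likelihood(y, 1.0)
print(lik0 + lik1)               # generally NOT equal to 1
c = 1.0 / (lik0 + lik1)          # the normalizer from the first part
print(c * lik0, c * lik1)        # posteriors for x_i = 0 and x_i = 1
```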
> I have doubts about the proof presented in the lecture, since it seems to assume that any lost symbol can be re-transmitted and will then not be lost a second time.
No, it doesn't assume that. You send a symbol until the decoder tells you it was received correctly. Hence, for a channel error probability $p$, the number of tries (channel uses) for each symbol follows a geometric distribution, and the average number of tries is $1/(1-p)$.
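A quick sanity check of the $1/(1-p)$ figure, if you want one (my own sketch; the value of $p$ is arbitrary):

```python
# Simulate "retransmit until it gets through" and compare the average number
# of tries with the geometric-distribution mean 1/(1-p).
import random

p = 0.3                 # probability that a given transmission is lost
trials = 100_000
total_tries = 0
for _ in range(trials):
    tries = 1
    while random.random() < p:   # symbol lost, send it again
        tries += 1
    total_tries += tries

print(total_tries / trials, "vs.", 1 / (1 - p))
```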
> I've seen claims in the literature saying that feedback does not help for any channel. I'm not sure how the feedback channel is defined exactly for this claim to hold.
That holds only for memoryless channels. And "it does not help" only in the sense that the capacity is the same. This is all explained in the Cover & Thomas textbook, Section 7.12.
> Is the feedback channel itself also noisy?
It's assumed that the feedback channel is perfect (noiseless). If the above property is true with this assumption, then it's also true if the feedback is noisy.
> I think that, with added noiseless feedback, the binary symmetric channel (BSC) and the BEC should have the same capacity, since it no longer matters if the bits are lost or flipped.
Not true. For a BEC, the receiver knows when a channel error (an erasure) occurred; for the BSC, it doesn't.
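To make the gap concrete, the standard capacity formulas, $1-p$ for the BEC and $1-H_2(p)$ for the BSC, already differ for the same parameter $p$ (a small sketch of my own, not part of the original discussion):

```python
# Compare BEC and BSC capacities for a few values of p.
import math

def h2(p):
    """Binary entropy function H_2(p) in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def c_bec(p):   # capacity of a BEC with erasure probability p
    return 1 - p

def c_bsc(p):   # capacity of a BSC with crossover probability p
    return 1 - h2(p)

for p in (0.05, 0.1, 0.2):
    print(f"p={p}: C_BEC={c_bec(p):.3f}, C_BSC={c_bsc(p):.3f}")
```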
Fibonacci codes are used as alternatives to dense codes for large word-based textual data. They are in particular a good choice for compressing a set of small integers, and fast decoding as well as compressed search may be important tools for such applications.
Fibonacci codes are robust even against insertions and deletions, which means they are robust in terms of recovering from such errors. If the codes are to be used over a noisy communication channel, their resilience to bit insertions, deletions, and bit flips is of high importance.
You may want to read the paper titled "Fibonacci Coding Within the Burrows-Wheeler Compression Scheme" and some of its references.
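In case a concrete example helps, here is a rough sketch of a Fibonacci (Zeckendorf-based) encoder for positive integers, written from the standard definition of the code rather than from the cited paper:

```python
def fibonacci_encode(n):
    """Return the Fibonacci codeword of a positive integer n as a bit string.

    Every codeword ends in '11', which is what makes the code
    self-synchronizing and hence robust to insertions/deletions."""
    assert n >= 1
    fibs = [1, 2]                        # F(2), F(3), ...
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    fibs.pop()                           # drop the first Fibonacci number > n
    bits = []
    remainder = n
    for f in reversed(fibs):             # greedy Zeckendorf representation
        if f <= remainder:
            bits.append('1')
            remainder -= f
        else:
            bits.append('0')
    return ''.join(reversed(bits)) + '1' # bit for F(2) first, then the stop bit

for n in (1, 2, 3, 4, 11):
    print(n, fibonacci_encode(n))        # 11, 011, 0011, 1011, 001011
```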