This is only an answer to your first question.
> How can they replace only one entry with $q$ and say that this is the entropy of $q$?
In the paper $h(q)$ is not computed this way. Instead, the inequality of Lemma 4.2 is used to prove that $h(p) \le \log(n)$, and that
$h(p) \lt \log(n)$ if $p$ is not the uniform distribution with $p_1=p_2=\ldots=p_n=\frac{1}{n}$.
Lemma 4.2:
$$-\sum_{i=1}^{n}p_i \log{p_i} \le -\sum_{i=1}^{n}p_i \log{q_i} \tag{1} $$
Equality holds iff $$p_i=q_i, i=1,\ldots , n \tag{2}$$
$\square$
We know that the entropy is defined by
$$h(p)=-\sum_{i=1}^{n}p_i \log{p_i} \tag{3} $$
This can be used to rewrite the inequality of the Lemma as
$$ h(p)\le -\sum_{i=1}^{n}p_i \log{q_i} \tag{4} $$
This is valid for every discrete distribution $q$, in particular for the uniform distribution with
$$q_i=\frac{1}{n} ,i=1,\ldots,n \tag{4a} $$
Substituting $\frac{1}{n}$ for $q_i$ gives
$$ h(p)\le \sum_{i=1}^{n}p_i \log{n} = (\log{n}) \cdot \sum_{i=1}^{n}p_i = \log{n} \tag{5} $$
But $\log(n)$ is also $h(q)$ if $q$ is the uniform distribution. This can be checked simply by using the definition of the entropy:
$$h(q)=-\sum_{i=1}^{n}q_i \log{q_i}=-\sum_{i=1}^{n}\frac{1}{n} \log{\frac{1}{n}} = \log{n} \sum_{i=1}^{n}\frac{1}{n} = \log{n} \tag{6} $$
So it follows that for the uniform distribution $q$
$$h(p) \le \log{n} = h(q) \tag{7} $$
Because of $(6)$ and $(2)$, equality holds exactly when $p$ is the uniform distribution, too.
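As a quick numerical sanity check of $(7)$ (my own sketch, not part of the paper's argument, and using natural logarithms), one can draw a few random distributions $p$ and confirm $h(p) \le \log n$, with the uniform distribution attaining the bound:

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy(p):
    """Shannon entropy -sum_i p_i log p_i (natural logarithm)."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

n = 5
# Three random distributions on n points, plus the uniform one.
for p in [rng.dirichlet(np.ones(n)) for _ in range(3)] + [np.full(n, 1.0 / n)]:
    print(f"h(p) = {entropy(p):.4f}  <=  log(n) = {np.log(n):.4f}")
```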
Edit:
Theorem 5.1 states that the continuous probability density on $[a,b]$ with $\mu = \frac{a+b}{2}$ that maximizes entropy is the uniform distribution $q(x)=\frac{1}{b-a}, x \in [a,b]$. This complies with the principle of indifference for continuous variables found here.
On the whole real line there is no uniform probability density. On the whole real line there is also no continuous probability density with highest entropy, because there are continuous probability densities with arbitrarily high entropy: e.g. the Gaussian distribution has entropy $\frac{1}{2}(1+\log(2 \pi \sigma^2))$, so increasing $\sigma$ increases the entropy.
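A minimal numerical illustration of this unboundedness (my own, not from the paper):

```python
import numpy as np

# Differential entropy of a Gaussian, 0.5 * (1 + log(2 * pi * sigma^2)),
# for increasing sigma: it is unbounded above.
for sigma in [1, 10, 100, 1000]:
    print(sigma, 0.5 * (1 + np.log(2 * np.pi * sigma**2)))
```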
Because there is no maximal entropy for continuous densities over $\mathbb{R}$, we must add constraints, e.g. the constraint that $\sigma$ is fixed and that $\mu$ is fixed. The fact that there is a given finite $\sigma^2$ and $\mu$ makes it intuitively clear to me that values nearer to $\mu$ must have higher probability. If you don't fix $\mu$, you get no unique solution: the Gaussian distribution with any real $\mu$ is a solution. This is some kind of "uniformness": every $\mu$ can be used for a solution.
Notice that it is crucial to fix $\sigma$ and $\mu$ and to demand $p(x)>0$ for all $x \in \mathbb{R}$. If you fix other values or change the domain of the density function from $\mathbb{R}$ to something else, e.g. $\mathbb{R}^+$, you get other solutions: the exponential distribution, the truncated exponential distribution, the Laplace distribution, the lognormal distribution (Theorems 3.3, 5.1, 5.2, 5.3).
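As a small illustration of the fixed-$\mu$, fixed-$\sigma$ case (again my own sketch, not one of the referenced theorems), compare the Gaussian with a Laplace density of the same mean and variance; the Gaussian has the larger differential entropy:

```python
import numpy as np

sigma = 1.0  # common standard deviation, mean 0 in both cases

# Gaussian N(0, sigma^2): entropy 0.5 * log(2 * pi * e * sigma^2)
h_gauss = 0.5 * np.log(2 * np.pi * np.e * sigma**2)

# Laplace with the same variance has scale b = sigma / sqrt(2)
# and entropy 1 + log(2 * b).
b = sigma / np.sqrt(2)
h_laplace = 1 + np.log(2 * b)

print(h_gauss, h_laplace)  # approximately 1.42 vs. 1.35
```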
You have to figure out $F_X(x)$, the CDF of $X$ (similarly for $Y$, etc.)
for all real numbers $x$. One simple calculation is to find the
maximum possible value $x_{\max}$ of $X$ and the minimum possible
value $x_{\min}$ of $X$ and set $F_X(x) = 1$ for all $x \geq x_{\max}$
and $F_X(x) = 0$ for all $x < x_{\min}$.
Next, choose your favorite real number $x \in (x_{\min}, x_{\max})$ and
write
$$F_X(x) = P\{X\leq x\} = P\{U \in A\}$$ where
$A$ is a set of real numbers that you need to figure out all by yourself.
Remember that $A$ has the property that $X \leq x$ exactly when $U \in A$,
and of course, $A$ will depend on the choice of $x$.
Now, compute $P\{U \in A\}$ using the known pdf of $U$ and your knowledge
of the set $A$. Integration might be required for this.
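For concreteness, here is a made-up example (your actual transformation may well be different): suppose $U$ is uniform on $(0,1)$ and $X = U^2$. Then for $0 < x < 1$,
$$\{X \le x\} = \{U^2 \le x\} = \{U \le \sqrt{x}\}, \qquad \text{so } A = (0, \sqrt{x}],$$
$$F_X(x) = P\{U \in A\} = \int_0^{\sqrt{x}} 1 \, du = \sqrt{x}.$$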
Repeat for your next favorite real number, and then the next most
favorite, and so on. After a while, you might have an "Aha!" moment
where you realize that for all real numbers $x$ in some interval
$(\alpha, \beta)$, you will find that $F_X(x) = \gamma(x)$ for some function
$\gamma(\cdot)$. Now pick a number in $(x_{\min}, x_{\max}) - (\alpha,\beta)$
and keep going. You will thus come up with a complete description of
the function $F_X(x)$. Warning: It is only in very rare cases that
$F_X(x)$ can be expressed by a single "formula" valid for all $x$.
Now, differentiate $F_X(x)$ to find the density $f_X(x)$.
Repeat all this for the other variables.
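If you want to check a hand computation, the whole recipe for the hypothetical example above ($U$ uniform on $(0,1)$, $X = U^2$) can be replayed symbolically, for instance with SymPy:

```python
import sympy as sp

u, x = sp.symbols("u x", positive=True)

# Hypothetical example (not your actual problem): U ~ Uniform(0, 1), X = U**2.
f_U = sp.Integer(1)                           # pdf of U on (0, 1)

# For 0 < x < 1 the event {X <= x} equals {U <= sqrt(x)}, i.e. A = (0, sqrt(x)].
F_X = sp.integrate(f_U, (u, 0, sp.sqrt(x)))   # P{U in A} = sqrt(x)
f_X = sp.diff(F_X, x)                         # density on (0, 1)

# Outside (0, 1): F_X(x) = 0 for x <= 0 and F_X(x) = 1 for x >= 1.
print(F_X, f_X)                               # sqrt(x), 1/(2*sqrt(x))
```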
Trust me: it gets easier with practice. But you've got to do it yourself,
and struggle with finding sets $A$ and checking to make sure you have
accounted for all $x$, etc.; just blindly copying down what your instructor or
TA writes on the blackboard or the leader of your "study group" writes
on his homework solutions will not work. Learning probability theory (indeed,
learning any branch of mathematics) is not a spectator sport; you have to
struggle with it yourself.
The pdf of $Y$ is obtained by taking the joint pdf of $(X,Y)$ and marginalizing $X$ out. That is:
$$f_Y(y)=\int_{-\infty}^\infty f_{X,Y}(x,y) dx.$$
The joint pdf of $(X,Y)$ is the product of the conditional pdf $f_{Y|X}(y|x)$ and the pdf of $X$, $f_X$. (If this seems weird to you, it is basically analogous to the familiar identity $P(A \cap B)=P(A \mid B) P(B)$.) That is:
$$f_{X,Y}(x,y)=f_{Y|X}(y|x) f_X(x).$$
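Putting the two displayed formulas together (nothing new, just substituting one into the other):
$$f_Y(y)=\int_{-\infty}^\infty f_{Y|X}(y|x)\, f_X(x)\, dx.$$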
You have these two pdfs, so with this and some calculus you can do part 1. Once you have the joint pdf you can compute the covariance with some more calculus, so you can do part 2.
The one thing you seem to be having trouble reading is the conditional pdf. The problem is trying to tell you that $f_{Y|X}(y|x)$, for each fixed $x$, is the pdf of a normal r.v. with mean $x$ and variance $1$.
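Since the exact distribution of $X$ in your problem isn't shown here, the following is only a numerical sketch under the hypothetical assumption $X \sim N(0,1)$. It evaluates the marginal pdf of $Y$ by the integral above and the covariance by numerical integration; under that assumption $Y \sim N(0,2)$ and $\mathrm{Cov}(X,Y)=1$, which the output should roughly confirm.

```python
import numpy as np
from scipy import integrate, stats

# Hypothetical assumption (not taken from the original problem):
# X ~ N(0, 1) and, for each fixed x, Y | X = x ~ N(x, 1).
f_X = stats.norm(0, 1).pdf
def f_Y_given_X(y, x):
    return stats.norm(x, 1).pdf(y)

# Part 1: marginal pdf of Y, f_Y(y) = integral of f_{Y|X}(y|x) * f_X(x) dx.
def f_Y(y):
    val, _ = integrate.quad(lambda x: f_Y_given_X(y, x) * f_X(x), -np.inf, np.inf)
    return val

# Under these assumptions Y ~ N(0, 2); compare the two values at y = 1.
print(f_Y(1.0), stats.norm(0, np.sqrt(2)).pdf(1.0))

# Part 2: Cov(X, Y) = E[XY] - E[X]E[Y].  Here E[X] = E[Y] = 0 and
# E[XY] = E[X * E[Y | X]] = E[X^2] = 1, so the result should be close to 1.
cov, _ = integrate.dblquad(
    lambda y, x: x * y * f_Y_given_X(y, x) * f_X(x),
    -8, 8, lambda x: -8, lambda x: 8)   # finite limits approximate the real line
print(cov)
```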