This answers only your first question:

> How can they replace only one entry with $q$ and say that this is the entropy of $q$?
In the paper $h(q)$ is not computed this way. The inequality of Lemma 4.2 is used to prove that $h(p) \le \log{n}$, and that
$h(p) \lt \log{n}$ if $p$ is not the uniform distribution with $p_1=p_2=\ldots=p_n=\frac{1}{n}$.
Lemma 4.2:
$$-\sum_{i=1}^{n}p_i \log{p_i} \le -\sum_{i=1}^{n}p_i \log{q_i} \tag{1} $$
Equality holds iff $$p_i=q_i, i=1,\ldots , n \tag{2}$$
$\square$
We know that the entropy is defined by
$$h(p)=-\sum_{i=1}^{n}p_i \log{p_i} \tag{3} $$
This can be used to reformulate the inequality of the Lemma as
$$ h(p)\le -\sum_{i=1}^{n}p_i \log{q_i} \tag{4} $$
This holds for every discrete distribution $q$, in particular for the uniform distribution with
$$q_i=\frac{1}{n} ,i=1,\ldots,n \tag{4a} $$
Substituting $\frac{1}{n}$ for $q_i$ gives
$$ h(p)\le \sum_{i=1}^{n}p_i \log{n} = (\log{n}) \cdot \sum_{i=1}^{n}p_i = \log{n} \tag{5} $$
But $\log{n}$ is also $h(q)$ if $q$ is the uniform distribution. This can be checked simply by using the definition of the entropy:
$$h(q)=-\sum_{i=1}^{n}q_i \log{q_i}=-\sum_{i=1}^{n}\frac{1}{n} \log{\frac{1}{n}} = \log{n} \sum_{i=1}^{n}\frac{1}{n} = \log{n} \tag{6} $$
So it follows that for the uniform distribution $q$
$$h(p) \le \log{n} = h(q) \tag{7} $$
Because of $(6)$ and $(2)$, equality holds exactly if $p$ is the uniform distribution, too.
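A quick numerical sanity check of the argument above (a minimal Python sketch of my own; the distribution `p` is just an arbitrary example):

```python
import numpy as np

def entropy(p):
    """Shannon entropy -sum p_i log p_i (natural log), ignoring zero entries."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def cross_entropy(p, q):
    """-sum p_i log q_i, the right-hand side of Lemma 4.2."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return -np.sum(p * np.log(q))

n = 4
p = np.array([0.1, 0.2, 0.3, 0.4])   # arbitrary non-uniform distribution
q = np.full(n, 1.0 / n)              # uniform distribution

print(entropy(p))            # h(p) ~ 1.2799
print(cross_entropy(p, q))   # -sum p_i log(1/n) = log(4) ~ 1.3863
print(entropy(q))            # log(4) ~ 1.3863, so h(p) < h(q) here
```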
Edit:
Theorem 5.1 states that the continuous probability density on $[a,b]$ with $\mu = \frac{a+b}{2}$ that maximizes entropy is the uniform distribution $q(x)=\frac{1}{b-a}, x \in [a,b]$. This complies with the principle of indifference for continuous variables found here.
On the whole real line there is no uniform probability density. There is also no continuous probability density on the whole real line with highest entropy, because there are continuous probability densities with arbitrarily high entropy; e.g. the Gaussian distribution has entropy $\frac{1}{2}(1+\log(2 \pi \sigma^2))$: if we increase $\sigma$, the entropy increases.
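A small illustration of this (my own Python sketch, comparing the closed form above with a brute-force evaluation of $-\int p(x)\log{p(x)}\,dx$ for growing $\sigma$):

```python
import numpy as np

def gaussian_entropy_formula(sigma):
    # closed-form differential entropy of a Gaussian, in nats
    return 0.5 * (1.0 + np.log(2.0 * np.pi * sigma**2))

def gaussian_entropy_numeric(sigma, mu=0.0):
    # brute-force Riemann sum of -p(x) log p(x) over a wide grid
    x = np.linspace(mu - 20 * sigma, mu + 20 * sigma, 200_001)
    dx = x[1] - x[0]
    p = np.exp(-(x - mu) ** 2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    return -np.sum(p * np.log(p)) * dx

for sigma in [0.5, 1.0, 2.0, 10.0]:
    print(sigma, gaussian_entropy_formula(sigma), gaussian_entropy_numeric(sigma))
# the entropy grows without bound as sigma increases
```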
Because there is no maximal entropy for continuous densities over $R$, we must add constraints, e.g. that $\sigma$ and $\mu$ are fixed. The fact that there is a given finite $\sigma^2$ and $\mu$ makes it intuitively clear to me that values nearer to $\mu$ must have higher probability. If you don't fix $\mu$, you get no unique solution: the Gaussian distribution for every real $\mu$ is a solution. This is some kind of "uniformness": every $\mu$ can be used for a solution.
Notice that it is crucial to fix $\sigma$ and $\mu$ and to demand $p(x)>0, \forall x \in R$. If you fix other values, or change the domain of the density function from $R$ to another one, e.g. $R^+$, you get other solutions: the exponential distribution, the truncated exponential distribution, the Laplace distribution, the lognormal distribution (Theorems 3.3, 5.1, 5.2, 5.3).
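For instance (a worked example of my own, not one of the paper's theorems): on $R^+$ with only the mean $\mu$ fixed, the entropy-maximizing density is the exponential one, and its entropy is determined by that single constraint:
$$q(x)=\frac{1}{\mu}e^{-x/\mu},\; x \ge 0, \qquad h(q)=-\int_0^\infty q(x)\log{q(x)}\,dx=1+\log{\mu}$$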
Let $\{U_i\}_{i=1}^n$ be $n$ i.i.d. uniformly distributed variables, and let $Z=\sum_{i=1}^{n}{U_i}$.
The moment generating function of $Z$ is $$\phi_Z(u)=E(e^{uZ})$$
$$\phi_Z(u)=E(e^{u\sum_{i=1}^{n}{U_i}})$$
$$=E(\prod_{i=1}^{n}e^{u{U_i}})$$
By independence, we have
$$E(\prod_{i=1}^{n}e^{u{U_i}})=\prod_{i=1}^{n}E(e^{u{U_i}})$$
Because the $U_i$'s have the same distribution, we have
$$\phi_Z(u)=E(e^{u{U_1}})^n$$ or
$$\phi_Z(u)=\phi_{U_1}(u)^n$$ where $\phi_{U_1}$ is the moment generating function of a $U(0,1)$ variable, given by
$$\phi_{U_1}(u)=\frac{e^u-1}{u}$$ if $u \neq 0$, and $1$ otherwise.
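A short simulation sketch (mine, not part of the original answer) comparing the closed form $\left(\frac{e^u-1}{u}\right)^n$ with a Monte Carlo estimate of $E(e^{uZ})$:

```python
import numpy as np

rng = np.random.default_rng(0)

def mgf_uniform_sum(u, n):
    """Closed-form MGF of Z = U_1 + ... + U_n with U_i ~ U(0, 1)."""
    if u == 0.0:
        return 1.0
    return ((np.exp(u) - 1.0) / u) ** n

n, u = 5, 0.7
Z = rng.random((200_000, n)).sum(axis=1)   # Monte Carlo samples of Z
print(mgf_uniform_sum(u, n))               # exact value
print(np.exp(u * Z).mean())                # simulation estimate, should be close
```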
Best Answer
The simplest way to derive the distribution of the sum of two independent Uniform random variates is geometrical: it requires working out some area calculations in 2-D.
However, the derivation of the distribution of $\sum_{i=1}^{n}X_i$ for $n>2$, with each $X_i$ independently distributed as $U(0,1)$, is generally tedious. Geometrically it is difficult to visualise for higher values of $n$, but a convolution approach can be used (which I will apply here, more or less recursively).
I start assuming you know the distribution of $A=X_1+X_2$ given by the below pdf: $$f_A(a) = \begin{cases} a & \text{if $0 \le a \le 1$}\\ 2-a & \text{if $1 \le a \le 2$}\\ 0 & \text{elsewhere}\end{cases}$$
For $n=3$, define $B=X_1+X_2+X_3=A+X_3$. Note $0\le B\le3$. Now, by convolution of pdfs, the pdf of B: $f_B(b)=\int_{-\infty}^\infty f_{X_3}(x_3)f_A(b-x_3)\,dx_3$
Note-1: $f_A(b-x_3)=b-x_3$ for $0\le b-x_3\le1$, i.e., $b-1\le x_3\le b$; Also, $0 \le x_3 \le 1$. Combining these two gives $\max(b-1,0) \le x_3\le \min(b,1)$
Note-2: $f_A(b-x_3)=2-b+x_3$ for $1\le b-x_3\le2$, i.e., $b-2\le x_3\le b-1$; Also, $0 \le x_3 \le 1$. Combining these two gives $\max(b-2,0) \le x_3\le \min(b-1,1)$
Looking at the bounds of $x_3$, it is reasonable to break the range of $b$ (i.e., $[0,3]$) into $0\le b\le1$, $1\le b\le2$ and $2\le b\le3$, and to consider the form of the pdf of $B$ within each range separately.
Case: $0\le b\le1$: Range in Note-1 reduces to $0\le x_3\le b$; while that in Note-2 doesn't reduce to a feasible bound for $x_3$. Thus the pdf of $B$ reduces to
$f_B(b)=\int_0^b (b-x_3)\,dx_3=\frac{b^2}{2}$
Case: $1\le b\le2$: Range in Note-1 reduces to $b-1\le x_3\le 1$; while that in Note-2 reduces to $0\le x_3\le b-1$. Thus the pdf of $B$ reduces to
$f_B(b)=\int_{b-1}^1 (b-x_3)\,dx_3+\int_0^{b-1}(2-b+x_3)\,dx_3=\frac{-2b^2+6b-3}{2}$
Case: $2\le b\le3$: Range in Note-1 doesn't reduce to a feasible bound for $x_3$; while that in Note-2 reduces to $b-2\le x_3\le 1$. Thus the pdf of $B$ reduces to
$f_B(b)=\int_{b-2}^1(2-b+x_3)\,dx_3=\frac{(3-b)^2}{2}$
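If you want to check these three cases numerically, here is a quick sketch of my own (assuming $X_i \sim U(0,1)$ as above), comparing the derived piecewise pdf with an empirical density of simulated sums:

```python
import numpy as np

def f_B(b):
    """Piecewise pdf of B = X1 + X2 + X3 derived above."""
    b = np.asarray(b, dtype=float)
    return np.where((b >= 0) & (b <= 1), b**2 / 2,
           np.where((b > 1) & (b <= 2), (-2 * b**2 + 6 * b - 3) / 2,
           np.where((b > 2) & (b <= 3), (3 - b)**2 / 2, 0.0)))

rng = np.random.default_rng(1)
samples = rng.random((500_000, 3)).sum(axis=1)   # simulated values of B

# compare the derived pdf with a normalized histogram at the bin centers
hist, edges = np.histogram(samples, bins=30, range=(0, 3), density=True)
centers = (edges[:-1] + edges[1:]) / 2
print(np.max(np.abs(hist - f_B(centers))))       # should be small
```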
Can you now try similar logic for $n=4$ and $n=5$?
Although it may not be readily intuitive that the pdf of the sum of Uniform variates should look approximately Normal for large $n$, the pdf is a piecewise polynomial function of degree $n-1$, and if you plot it you end up with something very close to the pdf of a Normal distribution (I tried in R). It goes to show how powerful the CLT is!
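For completeness, here is a small sketch of the convolution done numerically for general $n$, compared against the Normal pdf with matching mean $n/2$ and variance $n/12$ (my own Python illustration of the remark above; the original plot was made in R):

```python
import numpy as np

def uniform_sum_pdf(n, grid_points=4001):
    """pdf of X1 + ... + Xn, Xi ~ U(0,1), by repeated numerical convolution."""
    x = np.linspace(0, n, grid_points)
    dx = x[1] - x[0]
    base = np.where(x <= 1, 1.0, 0.0)      # discretized pdf of a single U(0,1)
    base /= np.sum(base) * dx              # normalize so it integrates to 1
    pdf = base.copy()
    for _ in range(n - 1):
        pdf = np.convolve(pdf, base) * dx  # each convolution adds one U(0,1)
        pdf = pdf[:grid_points]            # keep the support [0, n]
    return x, pdf

n = 12
x, pdf = uniform_sum_pdf(n)
mu, var = n / 2, n / 12
normal = np.exp(-(x - mu)**2 / (2 * var)) / np.sqrt(2 * np.pi * var)
print(np.max(np.abs(pdf - normal)))        # already small for n = 12
```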