You established the indeterminate form $0/0$; from this you cannot casually conclude $p\to\infty$. In the last line of my comment that you cite, I used the trick $e^{W(t)}=t/W(t)$ in order to simplify the expression - but if you want to look at it asymptotically, you can also analyze the prior form: $$p(y)=\exp(-\alpha+W(e^{\alpha}\beta)),$$ where $\alpha=1-\lambda-\gamma y^2$ and $\beta=\eta e^{-y^2/2}/\sqrt{2\pi}$. Now if $\gamma\in[-1/2,0)$, we get the form $p=e^{-\infty+0}$, so we find that $p\to0$. For $\gamma<-1/2$, you'll have to find a way to show
$$W(ae^{-(1/2+\gamma)y^2})+\gamma y^2\to-\infty \text{ as } |y|\to\infty.$$
Above, $a>0$ is arbitrary (and equals $\eta e^{1-\lambda}/\sqrt{2\pi}$ in our case). Note that $W(x)>1$ for $x>e$, so we have $$e^W\le W e^W =x,$$
hence $W(x)\le\log x$ for sufficiently large $x$. Applying this with $x=ae^{-(1/2+\gamma)y^2}$ (which does grow when $\gamma<-1/2$) gives $$W(ae^{-(1/2+\gamma)y^2})+\gamma y^2\le\log a-(1/2+\gamma)y^2+\gamma y^2=\log a-\tfrac{y^2}{2}\to-\infty,$$ proving $p\to0$ as $|y|\to\infty$ whenever $\gamma<0$.
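If you want a concrete sanity check, here is a small numerical sketch of my own (the constants $\lambda$, $\gamma$, $\eta$ below are illustrative placeholders, not values from your question) that evaluates $p(y)=\exp(-\alpha+W(e^{\alpha}\beta))$ with SciPy's `lambertw` and watches it decay:

```python
# Numerical sanity check (a sketch, not part of the argument above):
# evaluate p(y) = exp(-alpha + W(exp(alpha)*beta)) for growing |y| with
# illustrative constants and watch it decay to 0.
import numpy as np
from scipy.special import lambertw

lambda_, gamma, eta = 0.3, -0.8, 1.0   # illustrative; gamma < -1/2 is the hard case

def p(y):
    alpha = 1.0 - lambda_ - gamma * y**2
    beta = eta * np.exp(-y**2 / 2) / np.sqrt(2 * np.pi)
    # principal branch of W; the argument exp(alpha)*beta is nonnegative here
    return np.exp(-alpha + lambertw(np.exp(alpha) * beta).real)

for y in [1.0, 3.0, 5.0, 10.0]:
    print(y, p(y))   # the values shrink rapidly toward 0 as |y| grows
```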
This only addresses your claim that $p$ doesn't vanish at the extremes. For your broader questions concerning maximum entropy, KL divergence, or the original optimization problem, I really have no idea.
This is only an answer to your first question.
> How can they replace only one entry with $q$ and say that this is the entropy of $q$?
In the paper, $h(q)$ is not computed this way. The inequality of Lemma 4.2 is used to prove that $h(p) \le \log(n)$, and that
$h(p) \lt \log(n)$ if $p$ is not the uniform distribution with $p_1=p_2=\cdots=p_n=\frac{1}{n}$.
Lemma 4.2:
$$-\sum_{i=1}^{n}p_i \log{p_i} \le -\sum_{i=1}^{n}p_i \log{q_i} \tag{1} $$
Equality holds iff $$p_i=q_i, i=1,\ldots , n \tag{2}$$
$\square$
We know that the entropy is defined by
$$h(p)=-\sum_{i=1}^{n}p_i \log{p_i} \tag{3} $$
This can be used to reformulate the inequality of the Lemma as
$$ h(p)\le -\sum_{i=1}^{n}p_i \log{q_i} \tag{4} $$
This is valid for all discrete distributions so also for the uniform distribution with
$$q_i=\frac{1}{n} ,i=1,\ldots,n \tag{4a} $$
Substituting $\frac{1}{n}$ for $q_i$ gives
$$ h(p)\le \sum_{i=1}^{n}p_i \log{n} = (\log{n}) \cdot \sum_{i=1}^{n}p_i = \log{n} \tag{5} $$
But $\log(n)$ is also $h(q)$ if $q$ is the uniform distribution. This can be checked simply by using the definition of entropy:
$$h(q)=-\sum_{i=1}^{n}q_i \log{q_i}=-\sum_{i=1}^{n}\frac{1}{n} \log{\frac{1}{n}} = \log{n} \sum_{i=1}^{n}\frac{1}{n} = \log{n} \tag{6} $$
So it follows that for the uniform distribution $q$
$$h(p) \le \log{n} = h(q) \tag{7} $$
Because of $(6)$ and $(2)$, equality holds exactly when $p$ is the uniform distribution too.
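A small numerical illustration of my own (not from the paper): draw a random $p$, take $q$ uniform, and check $h(p)\le-\sum_i p_i\log q_i=\log n$ as in $(5)$, with $h(q)=\log n$ as in $(6)$.

```python
# Numerical check of (3)-(6): for a random p and uniform q,
# h(p) <= -sum_i p_i log q_i = log n, and h(q) = log n.
import numpy as np

rng = np.random.default_rng(0)
n = 5
p = rng.random(n); p /= p.sum()            # an arbitrary distribution
q = np.full(n, 1.0 / n)                    # the uniform distribution (4a)

h_p = -np.sum(p * np.log(p))               # entropy of p, definition (3)
cross = -np.sum(p * np.log(q))             # right-hand side of (4); equals log n
print(h_p, cross, np.log(n))               # h_p <= cross == log(n)

h_q = -np.sum(q * np.log(q))               # entropy of the uniform q, as in (6)
print(h_q)                                 # equals log(n)
```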
Edit:
Theorem 5.1 states that the continuous probability density on $[a,b]$ with $\mu = \frac{a+b}{2}$ that maximizes entropy is the uniform distribution $q(x)=\frac{1}{b-a}, x \in [a,b]$. This complies with the principle of indifference for a continuous variable found here.
On the whole real line there is no uniform probability density. There is also no continuous probability density on the whole real line with highest entropy, because there are continuous probability densities with arbitrarily high entropy, e.g. the Gaussian distribution has entropy $\frac{1}{2}(1+\log(2 \pi \sigma^2))$: if we increase $\sigma$, the entropy increases.
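As a quick check of that formula (my own sketch), SciPy's built-in differential entropy for the normal distribution matches $\frac{1}{2}(1+\log(2\pi\sigma^2))$ and grows without bound in $\sigma$:

```python
# Check: differential entropy of a Gaussian is (1/2)(1 + log(2*pi*sigma^2))
# and increases with sigma.
import numpy as np
from scipy.stats import norm

for sigma in [1.0, 10.0, 100.0]:
    formula = 0.5 * (1 + np.log(2 * np.pi * sigma**2))
    print(sigma, formula, norm(scale=sigma).entropy())  # the two agree and increase
```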
Because there is no maximal entropy for continuous densities over $\mathbb{R}$, we must impose additional constraints, e.g. that $\sigma$ and $\mu$ are fixed. The fact that there is a given finite $\sigma^2$ and $\mu$ makes it intuitively clear to me that values nearer to $\mu$ must have higher probability. If you don't fix $\mu$, you get no unique solution: the Gaussian distribution with any real $\mu$ is a solution. This is some kind of "uniformness": every $\mu$ can be used for a solution.
Notice that it is crucial to fix $\sigma$ and $\mu$ and to demand $p(x)>0$ for all $x \in \mathbb{R}$. If you fix other values or change the domain of the density function from $\mathbb{R}$ to something else, e.g. $\mathbb{R}^+$, you get other solutions: the exponential distribution, the truncated exponential distribution, the Laplace distribution, the lognormal distribution (Theorems 3.3, 5.1, 5.2, 5.3).
Best Answer
I believe the second paper you cited (by Harremoës) is actually the answer you're looking for. The Poisson distribution describes the number of occurrences of an event in a fixed interval, under the assumption that occurrences are independent. In particular, the constraint that the events should be independent means that not every discrete distribution is a valid candidate for describing this system, and motivates restricting attention to sums of (possibly infinitely many) independent Bernoulli variables. Then, Harremoës shows that if you further constrain the expected value (i.e., $\lambda$), the maximum entropy distribution in this class is the Poisson distribution.
So, the Poisson distribution is the maximum entropy distribution given constraints of counting independent events and having a known expected value.
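To see this concretely, here is a small sketch of my own (an illustration, not code from the paper): $\mathrm{Binomial}(n, \lambda/n)$ is the distribution of a sum of $n$ i.i.d. Bernoulli variables with mean $\lambda$, and, if I read Harremoës's result correctly, its entropy stays below the Poisson entropy and approaches it as $n$ grows.

```python
# Illustration: Binomial(n, lam/n) is a sum of n i.i.d. Bernoullis with mean lam;
# its entropy stays below the Poisson(lam) entropy and approaches it as n grows.
from scipy.stats import binom, poisson

lam = 4.0                                    # illustrative mean
print("Poisson entropy:", poisson(lam).entropy())
for n in [5, 10, 100, 1000]:
    print(n, binom(n, lam / n).entropy())    # increases toward the Poisson value
```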
That said, you can also easily reverse-engineer a (contrived) constraint for which the Poisson distribution would be the maximum entropy distribution.
Let our unknown constraint be $\mathbb{E}[f(k)] = c$. Maximizing the entropy with this constraint, along with the mean being $\lambda$, gives the minimization problem
$\sum_k p(k) \ln p(k) - \alpha \left( \sum_k p(k) - 1\right) - \beta\left(\sum_k k p(k) - \lambda\right) - \gamma \left( \sum_k p(k)f(k) - c \right)$,
where $\alpha$, $\beta$, and $\gamma$ are Lagrange multipliers. Taking the derivative with respect to $p(k)$ yields
$\ln p(k) = -1 + \alpha + \beta k + \gamma f(k).$
We already know the Poisson distribution has the form $p(k) = e^{-\lambda}\lambda^k/k!$, or $\ln(p(k)) = -\lambda + k \ln(\lambda) - \ln(k!)$. Therefore, we can guess that $f(k)$ has the functional form $\ln(k!)$.
So, the Poisson distribution maximizes entropy when $p$ has mean $\lambda$ and $\mathbb{E}(\ln k!) = $[some particular value depending on $\lambda$].
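A quick numerical check of this guess (my own sketch, with an illustrative $\lambda$): the Poisson log-pmf $\ln p(k) = -\lambda + k\ln\lambda - \ln(k!)$ has exactly the max-entropy form $-1+\alpha+\beta k+\gamma f(k)$ with $f(k)=\ln(k!)$, $\gamma=-1$, $\beta=\ln\lambda$, $\alpha=1-\lambda$.

```python
# Check that the Poisson log-pmf matches -1 + alpha + beta*k + gamma*ln(k!)
# with gamma = -1, beta = ln(lam), alpha = 1 - lam.
import numpy as np
from scipy.stats import poisson
from scipy.special import gammaln           # gammaln(k + 1) == ln(k!)

lam = 3.5                                   # illustrative mean
k = np.arange(20)
alpha, beta, gamma = 1.0 - lam, np.log(lam), -1.0
reconstructed = -1.0 + alpha + beta * k + gamma * gammaln(k + 1)
print(np.allclose(poisson.logpmf(k, lam), reconstructed))   # True
```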
This approach may not be very satisfying, since it's not clear why we would want a distribution with a specified expectation value of $\ln k!$. The Johnson paper you cited is (in my opinion) similarly unsatisfying, since it essentially proves that the Poisson distribution is the maximal entropy distribution among distributions which are "more log-convex than the Poisson distribution".