I believe the second paper you cited (by Harremoës) is actually the answer you're looking for. The Poisson distribution describes the number of occurrences of an event in a fixed interval, under the assumption that occurrences are independent. In particular, the requirement that the events be independent means that not every discrete distribution is a valid candidate for describing this system, and it motivates restricting attention to sums of (arbitrarily many) independent Bernoulli variables. Harremoës then shows that if you further constrain the expected value (i.e., $\lambda$), the maximum entropy distribution is the Poisson distribution.
So, the Poisson distribution is the maximum entropy distribution given constraints of counting independent events and having a known expected value.
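As a quick numerical check (not a proof) in the spirit of this result, you can compare the entropy of $\mathrm{Binomial}(n, \lambda/n)$, which is the sum of $n$ identical independent Bernoulli variables with total mean $\lambda$, against the entropy of $\mathrm{Poisson}(\lambda)$; the specific $\lambda$ and values of $n$ below are arbitrary illustrative choices:

```python
# A small numerical check (not a proof): Binomial(n, lambda/n) is a sum of n identical
# independent Bernoulli variables with total mean lambda, and its entropy approaches
# that of Poisson(lambda) from below as n grows.
from scipy.stats import binom, poisson

lam = 3.0
for n in (5, 10, 50, 500):
    H_n = float(binom(n, lam / n).entropy())
    print(f"Binomial(n={n}, p={lam / n:.4f})  entropy = {H_n:.4f} nats")
print(f"Poisson(lambda={lam})  entropy = {float(poisson(lam).entropy()):.4f} nats")
```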
That said, you can also easily reverse-engineer a (contrived) constraint for which the Poisson distribution would be the maximum entropy distribution.
Let our unknown constraint be $\mathbb{E}[f(k)] = c$. Maximizing the entropy with this constraint, along with the mean being $\lambda$, gives the minimization problem
$\sum_k p(k) \ln p(k) - \alpha \left( \sum_k p(k) - 1\right) - \beta\left(\sum_k k p(k) - \lambda\right) - \gamma \left( \sum_k p(k)f(k) - c \right)$,
where $\alpha$, $\beta$, and $\gamma$ are Lagrange multipliers. Taking the derivative with respect to $p(k)$ and setting it to zero yields
$\ln p(k) = -1 + \alpha + \beta k + \gamma f(k).$
We already know the Poisson distribution has the form $p(k) = e^{-\lambda}\lambda^k/k!$, or $\ln(p(k)) = -\lambda + k \ln(\lambda) - \ln(k!)$. Therefore, we can guess that $f(k)$ has the functional form $\ln(k!)$.
So, the Poisson distribution maximizes entropy when $p$ has mean $\lambda$ and $\mathbb{E}[\ln k!]$ equals some particular value depending on $\lambda$.
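If you want to verify this numerically, here is a small sketch; the truncated support $\{0,\dots,K\}$, the choice of $\lambda$, and the SLSQP optimizer are my own illustrative choices. It maximizes the entropy subject to normalization, $\mathbb{E}[k]=\lambda$, and $\mathbb{E}[\ln k!]$ fixed at the value it takes under $\mathrm{Poisson}(\lambda)$, and compares the result with the Poisson pmf:

```python
# Numerical sanity check of the reverse-engineered constraint (a sketch, not a proof).
# Assumptions that are mine, not the derivation's: support truncated to {0, ..., K},
# lambda = 2, and SLSQP as the optimizer.
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln          # gammaln(k + 1) = ln(k!)
from scipy.stats import poisson

lam, K = 2.0, 30
k = np.arange(K + 1)
target = poisson.pmf(k, lam)
c = np.sum(target * gammaln(k + 1))        # E[ln k!] under Poisson(lam)

def neg_entropy(p):
    q = np.clip(p, 1e-300, None)           # avoid log(0); 0 * log(0) is treated as 0
    return np.sum(p * np.log(q))

constraints = [
    {"type": "eq", "fun": lambda p: p.sum() - 1.0},                   # normalization
    {"type": "eq", "fun": lambda p: (k * p).sum() - lam},             # mean = lambda
    {"type": "eq", "fun": lambda p: (gammaln(k + 1) * p).sum() - c},  # E[ln k!] = c
]
res = minimize(neg_entropy, np.full(K + 1, 1.0 / (K + 1)),
               bounds=[(0.0, 1.0)] * (K + 1), constraints=constraints, method="SLSQP")
print("max |p - Poisson pmf| =", np.abs(res.x - target).max())        # should be small
```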
This approach may not be very satisfying, since it's not clear why we would want a distribution with a specified expectation value of $\ln k!$. The Johnson paper you cited is (in my opinion) similarly unsatisfying, since it essentially proves that the Poisson distribution is the maximum entropy distribution among distributions which are "more log-concave than the Poisson distribution" (the ultra log-concave distributions).
Note that the problem of finding the most efficient series of questions with "yes-no" answers in order to determine the outcome of $X$ is equivalent to what is known in information theory as the source coding problem: assign to the $k$-th outcome a binary codeword of length $L_k$ such that the codewords are uniquely decodable and the average length $\bar{L}\triangleq \sum_k p_k L_k$ is as small as possible.
The method you suggest for finding the outcome of a non-equiprobable $X$ is essentially assigning the following codewords to each outcome (ordered in decreasing probability):
$$
\begin{align}
1&\text{ }(L_1=1)\\
01&\text{ }(L_2=2)\\
001&\text{ }(L_3=3)\\
\vdots\\
00\ldots01 & \text{ }(L_n=n)
\end{align}
$$
However, this is *not* the best we can do. This is because the most efficient question at each stage of the "game" is the one whose two outcomes are as close as possible to equiprobable, which, in general, is not accomplished by your approach. The optimal series of questions should be as follows: For the first question, partition the sample space in two sets (events) whose probabilities are as close to $0.5$ as possible and ask whether $X$ belongs to one of these sets. Once you know the set $X$ is in, repeat the procedure for the second question, this time considering only the "reduced" sample space corresponding to the known set, and so on.
One well-known source coding algorithm that follows this principle is Huffman source coding. It can be shown that the average Huffman codeword length satisfies $\bar{L} \in [H(\{p_i\}), H(\{p_i\})+1]$, i.e., the average number of binary questions required to determine the outcome of $X$ is close to the entropy of $X$.
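For concreteness, here is a minimal Huffman sketch (an illustration, not a production coder; the example distribution is made up) comparing the entropy $H(X)$, the average Huffman codeword length, and the average length of the $1, 01, 001, \ldots$ scheme above:

```python
# A minimal Huffman sketch (illustrative only). It compares H(X), the average Huffman
# codeword length, and the average length of the "1, 01, 001, ..." scheme for an
# example distribution.
import heapq
from math import log2

def huffman_lengths(probs):
    """Codeword lengths produced by Huffman's algorithm (lengths suffice to compute L-bar)."""
    heap = [(p, i, [i]) for i, p in enumerate(probs)]   # (probability, tiebreak, symbols)
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)                 # two least likely nodes
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:                               # each merge adds one bit to every symbol inside
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, min(s1 + s2), s1 + s2))
    return lengths

p = [0.3, 0.25, 0.2, 0.15, 0.1]                         # outcomes in decreasing probability
H = -sum(pi * log2(pi) for pi in p)
L_huffman = sum(pi * li for pi, li in zip(p, huffman_lengths(p)))
L_unary = sum(pi * (i + 1) for i, pi in enumerate(p))   # L_k = k for the scheme above
print(f"H(X) = {H:.3f} bits,  Huffman L-bar = {L_huffman:.3f},  '1, 01, 001, ...' L-bar = {L_unary:.3f}")
```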
First, your statement is confusing because your "variance" is not the variance of the random variable for which the "entropy" is computed; rather, it's the mean squared dispersion of the values of the discrete probability function.
With this caveat, the answer is yes, because with the given restrictions, the distribution that maximizes entropy is the uniform distribution, which has zero "variance" ... with that special meaning. But this seems rather trivial and not useful.

Update: I'd missed the condition that the $a_i$ must be integers. Then (allow me to use $N$ instead of your $N-1$, it feels more natural), I conjecture that the entropy is maximized by the "closest to uniform" distribution: let $u= \lfloor A/N\rfloor$ and $n_u = N(u+1)-A$. Then we have $a_i = u$ for $i=1 \cdots n_u$ and $a_i = u+1$ for $i=n_u+1 \cdots N$. This also minimizes your "variance".
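Here is a small sketch of this construction, assuming (my reading of the question) that the $a_i$ are nonnegative integers summing to $A$ and the entropy is computed on the normalized values $p_i = a_i/A$; the particular $A$ and $N$ are arbitrary examples:

```python
# Sketch of the conjectured maximizer: the "closest to uniform" integer vector.
# Assumption (mine): a_i are nonnegative integers summing to A, entropy is taken on a_i / A.
import numpy as np

def closest_to_uniform(A, N):
    u = A // N                      # u = floor(A / N)
    n_u = N * (u + 1) - A           # number of entries equal to u
    return np.array([u] * n_u + [u + 1] * (N - n_u))

A, N = 17, 5
a = closest_to_uniform(A, N)        # -> [3 3 3 4 4], which sums to 17
p = a / A
print("a =", a, " sum =", a.sum())
print("entropy =", float(-(p[p > 0] * np.log(p[p > 0])).sum()))
print('"variance" =', float(((a - A / N) ** 2).mean()))
```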
Added: To show that this maximizes the entropy: consider a distribution $\{a_i\}$ where $a_M - a_m >1$ ($a_M, a_m$ are values that attain the maximum and minimum). Show that changing $a_M := a_M -1$, $a_m := a_m+1$ gives another valid distribution with larger entropy (this can be shown from concavity of the entropy; see also Cover and Thomas, exercise 2.28). Then show that repeating this process eventually ends in the distribution given above.
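And a quick numerical illustration of this smoothing argument (a demo under the same assumptions as above, not a proof; the starting vector is arbitrary):

```python
# Repeatedly move one unit from a maximal entry to a minimal entry; the entropy of a / A
# increases at every step and the process stops at the "closest to uniform" vector above.
# The starting vector is an arbitrary example with A = 17, N = 5.
import numpy as np

def entropy(a):
    p = np.asarray(a, dtype=float) / sum(a)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

a = [10, 4, 2, 1, 0]
print(a, f"entropy = {entropy(a):.4f}")
while max(a) - min(a) > 1:
    a[a.index(max(a))] -= 1         # a_M := a_M - 1
    a[a.index(min(a))] += 1         # a_m := a_m + 1
    print(a, f"entropy = {entropy(a):.4f}")
```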