Heuristically, the probability density function on $\{x_1, x_2, \ldots, x_n\}$ with maximum entropy turns out to be the one that corresponds to the least amount of knowledge about $\{x_1, x_2, \ldots, x_n\}$, in other words the uniform distribution.
Now, for a more formal proof consider the following:
A probability density function on $\{x_1, x_2, \ldots, x_n\}$ is a set of nonnegative real numbers $p_1,\ldots,p_n$ that add up to 1. Entropy, $h(p) = -\sum_{i=1}^n p_i\log p_i$, is a continuous function of the $n$-tuples $(p_1,\ldots,p_n)$, and these points lie in a compact subset of $\mathbb{R}^n$, so there is an $n$-tuple where entropy is maximized. We want to show this occurs at $(1/n,\ldots,1/n)$ and nowhere else.
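Before the proof, here is a quick numerical sanity check of the claim (not part of the argument): sampling many probability vectors from the simplex and comparing their entropy to $\log n$. This is just an illustrative sketch assuming numpy is available; the dimension, sample size, and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

def entropy(p):
    """Shannon entropy in nats, with the convention 0 * log 0 = 0."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Random points on the probability simplex.
samples = rng.dirichlet(np.ones(n), size=100_000)
max_sampled = max(entropy(p) for p in samples)

print(f"log(n)           = {np.log(n):.6f}")
print(f"uniform entropy  = {entropy(np.full(n, 1.0 / n)):.6f}")  # equals log(n)
print(f"max over samples = {max_sampled:.6f}")                   # stays below log(n)
```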
Suppose the $p_j$ are not all equal, say $p_1 < p_2$. (Clearly $n\neq 1$.) We will construct a new probability density with strictly higher entropy. Since entropy is maximized at some $n$-tuple, it then follows that entropy is uniquely maximized at the $n$-tuple with $p_i = 1/n$ for all $i$.
Since $p_1 < p_2$, for small positive $\varepsilon$ we have $p_1 + \varepsilon < p_2 -\varepsilon$. The entropy of $\{p_1 + \varepsilon, p_2 -\varepsilon,p_3,...,p_n\}$ minus the entropy of $\{p_1,p_2,p_3,...,p_n\}$ equals
$$-p_1\log\left(\frac{p_1+\varepsilon}{p_1}\right)-\varepsilon\log(p_1+\varepsilon)-p_2\log\left(\frac{p_2-\varepsilon}{p_2}\right)+\varepsilon\log(p_2-\varepsilon)$$
To complete the proof, we want to show this is positive for small enough $\varepsilon$. Rewrite the above expression as
$$-p_1\log\left(1+\frac{\varepsilon}{p_1}\right)-\varepsilon\left(\log p_1+\log\left(1+\frac{\varepsilon}{p_1}\right)\right)-p_2\log\left(1-\frac{\varepsilon}{p_2}\right)+\varepsilon\left(\log p_2+\log\left(1-\frac{\varepsilon}{p_2}\right)\right)$$
Recalling that $\log(1 + x) = x + O(x^2)$ for small $x$, the above expression equals
$$-\varepsilon-\varepsilon\log p_1 + \varepsilon + \varepsilon \log p_2 + O(\varepsilon^2) = \varepsilon\log(p_2/p_1) + O(\varepsilon^2)$$
which is positive when $\varepsilon$ is small enough since $p_1 < p_2$.
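To see the perturbation step in action numerically, here is a short sketch (assuming numpy; the starting distribution is made up): it shifts mass $\varepsilon$ from the larger probability to the smaller one and compares the entropy gain with the first-order term $\varepsilon\log(p_2/p_1)$.

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

p = np.array([0.1, 0.5, 0.4])            # p1 < p2, not uniform
for eps in (1e-1, 1e-2, 1e-3, 1e-4):
    q = p.copy()
    q[0] += eps                           # p1 + eps
    q[1] -= eps                           # p2 - eps
    gain = entropy(q) - entropy(p)        # always positive here
    first_order = eps * np.log(p[1] / p[0])
    print(f"eps={eps:<8} gain={gain:.6e}  eps*log(p2/p1)={first_order:.6e}")
```

As $\varepsilon$ shrinks, the gain agrees with $\varepsilon\log(p_2/p_1)$ up to the $O(\varepsilon^2)$ error, as in the proof.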
A second, less rigorous argument is the following:
Consider first the following Lemma:
Let $p(x)$ and $q(x)$ be continuous probability density functions on an interval
$I$ in the real numbers, with $p\geq 0$ and $q > 0$ on $I$. We have
$$-\int_I p\log p dx\leq -\int_I p\log q dx$$
if both integrals exist. Moreover, there is equality if and only if $p(x) = q(x)$ for all $x$.
Now, let $p$ be any probability density function on $\{x_1,\ldots,x_n\}$, with $p_i = p(x_i)$, and apply the discrete analogue of the Lemma (with sums in place of integrals). Letting $q_i = 1/n$ for all $i$,
$$-\sum_{i=1}^n p_i\log q_i = \sum_{i=1}^n p_i \log n=\log n$$
which is the entropy of $q$. The Lemma then gives $h(p) = -\sum_i p_i\log p_i \leq -\sum_i p_i\log q_i = h(q)$, with equality if and only if $p$ is uniform.
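Here is a small numerical check of that discrete inequality (Gibbs' inequality) with $q$ uniform, again just an illustrative sketch assuming numpy, with arbitrary dimension and seed:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
q = np.full(n, 1.0 / n)                   # uniform distribution

for _ in range(5):
    p = rng.dirichlet(np.ones(n))         # random probability vector
    h_p = -np.sum(p * np.log(p))          # entropy of p
    cross = -np.sum(p * np.log(q))        # equals log(n) regardless of p
    print(f"h(p)={h_p:.4f} <= -sum p log q = {cross:.4f} (log n = {np.log(n):.4f})")
```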
Wikipedia also has a brief discussion of this: wiki
Best Answer
For the natural log the units are called "nats". I believe it's just a convention to define entropy with the natural log, and it probably stems from thermodynamic entropy, which uses nats for convenience: as Wikipedia puts it, "Physical systems of natural units that normalize Boltzmann's constant to 1 are effectively measuring thermodynamic entropy in nats".
Since the main concern about entropy is its role in the definition of mutual information between random variables, there is no practical effect from using different bases for the log: changing the base only rescales all of these quantities by the same constant factor.
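A minimal illustration of that rescaling (the distribution below is made up, and numpy is assumed): entropy in bits is just entropy in nats divided by $\ln 2$.

```python
import numpy as np

p = np.array([0.5, 0.25, 0.125, 0.125])

h_nats = -np.sum(p * np.log(p))    # entropy using the natural log (nats)
h_bits = -np.sum(p * np.log2(p))   # entropy using base-2 log (bits)

print(h_bits, h_nats / np.log(2))  # identical: H_bits = H_nats / ln 2
```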