I cannot speak to the use of these symbols, but let me show you instead the traditional way of seeing why the MLE is biased.
Recall that the exponential distribution is a special case of the Gamma distribution, which has two parameters, shape $a$ and scale $b$. The pdf of a Gamma random variable is:
$$f_Y (y)= \frac{1}{\Gamma(a) b^a} y^{a-1} e^{-y/b}, \ 0<y<\infty$$
where $\Gamma (.)$ is the gamma function. Alternative parameterisations exist; see, for example, the Wikipedia page.
If you put $a=1$ and $b=1/\lambda$ you arrive at the pdf of the exponential distribution:
$$f_Y(y)=\lambda e^{-\lambda y},0<y<\infty$$
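As a quick numerical sanity check (not part of the original argument; the rate $\lambda = 2$ is an arbitrary choice), the two pdfs can be compared directly:

```python
# Sanity check: Gamma(a=1, scale=1/lam) should reproduce lam * exp(-lam*y).
import numpy as np
from scipy import stats

lam = 2.0                                  # arbitrary example rate
y = np.linspace(0.01, 5.0, 100)            # grid of evaluation points

gamma_pdf = stats.gamma.pdf(y, a=1, scale=1 / lam)
expon_pdf = lam * np.exp(-lam * y)

assert np.allclose(gamma_pdf, expon_pdf)   # the two pdfs coincide
```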
One of the most important properties of a Gamma RV is additivity: if $X_1,\ldots,X_n$ are independent Gamma RVs, $X_i \sim \Gamma(a_i,b)$, with a common scale $b$, then $\sum_{i=1}^n X_i$ is also a Gamma RV, with shape $a^{*}=\sum a_i$ and scale $b^{*}=b$ as before.
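A short simulation sketch of the additivity property (with $\lambda$, $n$, and the number of draws chosen arbitrarily):

```python
# A sum of n iid Exponential(lam) = Gamma(1, 1/lam) draws should follow
# a Gamma(n, 1/lam) distribution; check with a Kolmogorov-Smirnov test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
lam, n = 2.0, 5
sums = rng.exponential(scale=1 / lam, size=(100_000, n)).sum(axis=1)

D, p = stats.kstest(sums, stats.gamma(a=n, scale=1 / lam).cdf)
print(f"KS statistic {D:.4f}, p-value {p:.3f}")   # D near 0: distributions agree
```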
Define $Y=\sum_{i=1}^n X_i$. As noted above, $Y$ is then also a Gamma RV, with shape parameter $\sum_{i=1}^n 1 = n$ and scale parameter $1/\lambda$, the same as each $X_i$ above. Now take the expectation $E\left[Y^{-1}\right]$:
$$ E\left [ Y^{-1} \right]=\int_0^{\infty}\frac{y^{-1}y^{n-1}\lambda^n}{\Gamma(n)}\times e^{-\lambda y}dy=\int_0^{\infty}\frac{y^{n-2}\lambda^n}{\Gamma(n)}\times e^{-\lambda y}dy$$
Comparing the latter integrand with the density of a Gamma distribution with shape parameter $n-1$ and the same rate $\lambda$, and using the fact that $\Gamma(n)=(n-1) \times \Gamma(n-1)$, we see that the integral equals
$$\frac{\lambda^n}{\Gamma(n)} \times \frac{\Gamma(n-1)}{\lambda^{n-1}}=\frac{\lambda}{n-1}$$
Thus
$$E\left[ \hat{\theta} \right]=E\left[ \frac{n}{Y} \right]=n \times E\left[Y^{-1}\right]=\frac{n}{n-1} \lambda$$
which clearly shows that the MLE is biased. Note, however, that the MLE is consistent. We also know that under some regularity conditions the MLE is asymptotically efficient and normally distributed, with mean the true parameter $\theta$ and variance $\{nI(\theta) \}^{-1}$. It is therefore an asymptotically optimal estimator.
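If you want to convince yourself numerically, here is a small simulation sketch (the values of $\lambda$, $n$, and the number of replications are arbitrary):

```python
# Check that E[lambda_hat] is approximately n/(n-1) * lambda, not lambda.
import numpy as np

rng = np.random.default_rng(1)
lam, n, reps = 2.0, 10, 200_000

x = rng.exponential(scale=1 / lam, size=(reps, n))
lam_hat = n / x.sum(axis=1)            # the MLE: n / sum(X_i)

print(lam_hat.mean())                  # close to 2.222..., not 2
print(n / (n - 1) * lam)               # theoretical mean: n/(n-1) * lambda
print((n - 1) / n * lam_hat.mean())    # the bias-corrected estimator is near lam
```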
Does that help?
You actually don't have to show that it is a maximum in this case.
The root of the first derivative of the log-likelihood, that is, the MLE, can be shown to be unique if the iid observations come from a random variable in the exponential family, that is, an r.v. whose density has the form:
$f(x;\theta) = h(x)\,\exp{(s(x)\,\theta - K(\theta) )}$
where $h$ is a function only of the observation $x$, $\theta$ is called the natural parameter, $s(x)$ is called the natural statistic, and $K$ is a function only of the natural parameter.
In this case, given $X_1,\ldots,X_n$ iid with the density you have, we have
$\prod_{i=1}^n \frac{(x_i+1)}{\theta\,(1+\theta)} \exp(-x_i/\theta) = \Big(\prod_i (x_i+1)\Big) \exp\big(-\tfrac{n\overline{X}}{\theta} - n\log(\theta(1+\theta)) \big)$
As this random variable belongs to the exponential family, the MLE is unique.
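To make this concrete with a small numerical sketch (the sample below is hypothetical): setting the score to zero for this density reduces to the quadratic $2\theta^2+(1-\overline{X})\theta-\overline{X}=0$, which has exactly one positive root, so a numerical root-finder and the closed form must agree:

```python
# The score -n*(1/t + 1/(1+t)) + n*xbar/t**2 has a single sign change on (0, inf).
import numpy as np
from scipy.optimize import brentq

x = np.array([0.4, 1.3, 2.1, 0.7, 3.5])   # hypothetical observations
n, xbar = len(x), x.mean()

def score(t):
    return -n * (1 / t + 1 / (1 + t)) + n * xbar / t**2

root = brentq(score, 1e-6, 100.0)          # unique root on a wide bracket

# positive root of 2*t**2 + (1 - xbar)*t - xbar = 0
closed = ((xbar - 1) + np.sqrt((1 - xbar) ** 2 + 8 * xbar)) / 4
print(root, closed)                        # the two agree
```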
In general, for exponential families the MLE always exists, is unique, is consistent and is asymptotically normal.
EDIT: http://www.stat.purdue.edu/~dasgupta/ml.pdf is a good explanation of this, maybe a bit too mathematical, but that depends on your academic background. Otherwise, V. S. Huzurbazar, in his "The likelihood equation, consistency and the maxima of the likelihood function" (1947), explains this theory in an easier way.
Best Answer
Under the standard regularity conditions, the Fisher information can be expressed as the negative of the expected value of the second derivative of the log-likelihood, and NOT of some transformation of the log-likelihood.
Your log-likelihood is $L=n\ln\beta + (-\beta-1)\sum \ln(x_i)$, and that doesn't change. For whatever reason, you used the monotonic transformation $\tilde L=\frac 1n L$ in calculating the MLE. Since it is a monotonic transformation, naturally the MLE is the same. Now calculate the Fisher information. It is, irrespective of whether you used $L$ or $\tilde L$ in the maximization procedure,
$$I(\beta) = -E\left[\frac {\partial^2}{\partial \beta^2} L\right]=E\left[\frac n{\beta^2}\right]=\frac n{\beta^2}$$
You cannot substitute $\tilde L$ for $L$ in the calculation of the Fisher information as if $\tilde L$ were equal to $L$ - it is not (the fact that $\tilde L$, being a monotonic transformation of $L$, leads to the same MLE does not make it equal to $L$).
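A quick numerical illustration of this point (hypothetical data): maximizing $L$ and $\tilde L$ returns the same $\hat\beta = n/\sum\ln x_i$, but the curvature at the maximum differs by a factor of $n$, which is exactly why $\tilde L$ cannot be plugged into the Fisher-information formula:

```python
# Same maximizer for L and L/n, but different second derivatives at the MLE.
import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([1.5, 2.0, 1.2, 3.1, 1.8])    # hypothetical Pareto(min=1) sample
n, s = len(x), np.log(x).sum()

def L(b):
    return n * np.log(b) - (b + 1) * s      # the log-likelihood

beta_L = minimize_scalar(lambda b: -L(b), bounds=(1e-6, 50), method="bounded").x
beta_Lt = minimize_scalar(lambda b: -L(b) / n, bounds=(1e-6, 50), method="bounded").x
print(beta_L, beta_Lt, n / s)               # all three agree: same MLE

print(n / beta_L**2)                        # -d2L/db2 evaluated at the MLE
print(1 / beta_L**2)                        # -d2(L/n)/db2: smaller by a factor n
```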
Another way to look at it is to remember that the likelihood function with respect to $\beta$ is also (and should be) a joint density function with respect to the $x_i$'s. In our case (assuming an i.i.d. sample) it is
$$f(X;\beta) = \prod_{i=1}^n\beta x_i^{-\beta-1}, \qquad f(x_i;\beta) = \frac {\beta} {x_i^{\beta+1}}$$ which is a Pareto distribution with minimum value $1$.
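A simulation sketch (with $\beta$, $n$, and the number of replications chosen arbitrarily) corroborating the Fisher information above through the information equality $I(\beta)=E\big[(\partial L/\partial\beta)^2\big]$, evaluated at the true $\beta$:

```python
# The squared score, averaged over samples, should approach n / beta**2.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
beta, n, reps = 3.0, 20, 100_000

x = stats.pareto.rvs(b=beta, size=(reps, n), random_state=rng)  # minimum value 1
score = n / beta - np.log(x).sum(axis=1)    # dL/dbeta at the true beta

print((score**2).mean())                    # close to n / beta**2
print(n / beta**2)                          # the Fisher information
```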
Now, could the transformed log-likelihood lead to a joint density? It would give
$$\left(\prod_{i=1}^n\beta x_i^{-\beta-1}\right)^{1/n}$$
This is the geometric mean of the $n$ marginal densities. Can it represent the joint density of a collection of i.i.d. random variables?
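For a concrete check (with $\beta$ and $n$ arbitrary, subject to $(\beta+1)/n>1$ so the integral converges), note that the object above factorizes into $n$ identical one-dimensional integrals over $[1,\infty)$, so its total mass is easy to compute, and it is not $1$:

```python
# Integrate one factor of the geometric-mean object; raise to the n-th power.
import numpy as np
from scipy.integrate import quad

beta, n = 10.0, 3      # chosen so that (beta + 1) / n > 1

factor, _ = quad(lambda x: (beta * x ** (-beta - 1)) ** (1 / n), 1, np.inf)
total = factor**n      # total mass over [1, inf)^n

print(total)                                # about 0.53, not 1
print(beta * (n / (beta + 1 - n)) ** n)     # closed form agrees
```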