Solved – Hessian of Laplace distribution

Tags: likelihood, mathematical-statistics, maximum-likelihood

The density of the Laplace distribution is given by:

$$f(x;\mu,\sigma)=\frac{1}{2\sigma}\exp\left(-\frac{\vert x- \mu\vert}{\sigma}\right).$$

It is easy to see that this function is not differentiable at $x=\mu$. However, I am interested in some asymptotic normality results for the MLE $(\hat{\mu},\hat{\sigma})$, which require twice-differentiability of the log-likelihood function:

$$\ell(\mu,\sigma) = \sum_{j=1}^n \log f(x_j;\mu,\sigma),$$

with respect to $(\mu,\sigma)$, for a random sample $(x_1,\dots,x_n)$. What I basically need is the existence of the Hessian matrix of the log-likelihood evaluated at the MLE, that is, the existence of $\frac{\partial^2}{\partial \mu^2}\ell(\mu,\sigma)$, $\frac{\partial^2}{\partial \sigma^2}\ell(\mu,\sigma)$, and $\frac{\partial^2}{\partial \mu\partial\sigma}\ell(\mu,\sigma)$, all evaluated at $(\mu,\sigma)=(\hat{\mu},\hat{\sigma})$.

Is there any reference justifying differentiability of the log-likelihood, or is it not differentiable at the MLE?

Best Answer

In Huber and Ronchetti, *Robust Statistics* (2nd ed., 2009), Section 4.4, it is proven that finiteness of the Fisher information matrix is equivalent to the density being absolutely continuous, without requiring differentiability. To get there, the authors first give a generalized definition of Fisher information in terms of the distribution function, "when the classical expression does not make sense" (the classical expression being $I(f) = \int (f'/f)^2 f\,dx = E(f'/f)^2$, with $f$ the density), and then prove that the generalized expression coincides with the classical one iff $f$ is absolutely continuous. The proof is not trivial.

In practice this means that, provided the density is absolutely continuous, we can obtain the Fisher information by calculating the derivatives of the log-likelihood as usual, without worrying about a possible singularity looking at us.
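To illustrate (a standard calculation, not taken from the book): for $\mu$ not equal to any observation, the score components of $\ell$ are

$$\frac{\partial \ell}{\partial \mu} = \frac{1}{\sigma}\sum_{j=1}^n \operatorname{sgn}(x_j-\mu),
\qquad
\frac{\partial \ell}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^2}\sum_{j=1}^n \vert x_j-\mu\vert,$$

and, using $E\,\operatorname{sgn}^2(X-\mu)=1$, $E\vert X-\mu\vert=\sigma$, $\operatorname{Var}\vert X-\mu\vert=\sigma^2$, and the symmetry-induced zero cross-term, the per-observation Fisher information matrix is

$$I(\mu,\sigma)=\begin{pmatrix} 1/\sigma^2 & 0 \\ 0 & 1/\sigma^2 \end{pmatrix},$$

which is finite, as the absolute continuity of the Laplace density guarantees.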

In Kotz, S., Kozubowski, T., & Podgorski, K. (2001), *The Laplace Distribution and Generalizations: A Revisit with Applications to Communications, Economics, Engineering, and Finance*, Springer, Section 2.6, the maximum likelihood properties are proven (the authors also note the issue of non-differentiability and reference the previous book).

Regarding the actual computation of the Hessian during estimation: to cover the case where the MLE is exactly equal to an observation, we have to supplement the computer algorithm with a condition that moves the specific component by a value "close" to zero, so that the derivatives are never evaluated exactly at the kink.
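A minimal sketch of that idea in Python (the function names, the nudge size, and the central-difference scheme are my own illustrative choices, not from the references): compute the closed-form Laplace MLE, nudge $\hat\mu$ off any coinciding observation, then form the Hessian of the log-likelihood by finite differences.

```python
import numpy as np

def laplace_loglik(mu, sigma, x):
    """Log-likelihood of an i.i.d. Laplace(mu, sigma) sample."""
    return -len(x) * np.log(2 * sigma) - np.sum(np.abs(x - mu)) / sigma

def laplace_mle(x):
    """Closed-form MLE: mu_hat is the sample median,
    sigma_hat the mean absolute deviation about it."""
    mu_hat = np.median(x)
    sigma_hat = np.mean(np.abs(x - mu_hat))
    return mu_hat, sigma_hat

def numerical_hessian(f, theta, eps=1e-5):
    """Central-difference Hessian of a scalar function f at theta."""
    theta = np.asarray(theta, dtype=float)
    k = len(theta)
    H = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            ei = np.zeros(k); ei[i] = eps
            ej = np.zeros(k); ej[j] = eps
            H[i, j] = (f(theta + ei + ej) - f(theta + ei - ej)
                       - f(theta - ei + ej) + f(theta - ei - ej)) / (4 * eps**2)
    return H

rng = np.random.default_rng(0)
x = rng.laplace(loc=1.0, scale=2.0, size=1001)  # odd n: the median IS an observation

mu_hat, sigma_hat = laplace_mle(x)
# Guard against differentiating exactly at the kink: if mu_hat coincides
# with a data point, move it by a value "close" to zero, as suggested above.
if np.any(np.isclose(x, mu_hat)):
    mu_hat += 1e-6

H = numerical_hessian(lambda t: laplace_loglik(t[0], t[1], x),
                      [mu_hat, sigma_hat])
```

Note that the $(\mu,\mu)$ entry is uninformative here, since the log-likelihood is piecewise linear in $\mu$ between observations; the $(\sigma,\sigma)$ entry, by contrast, is smooth and comes out negative at the MLE (analytically it equals $-n/\hat\sigma^2$), consistent with a maximum.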
