Solved – Estimation with MLE and returning the score/gradient (QMLE)

maximum-likelihood, optimization, r

I am estimating a simple AR(1) process by maximum likelihood. I also wish to compute the quasi-MLE (QMLE) standard errors, which are given by the sandwich form of the Hessian and the score (see for example the last slide here).

So I start by specifying the (conditional) log-likelihood of the (Gaussian) AR(1) process. Then I optimise this with R's optim, which returns the Hessian evaluated at the MLE estimates; I use this as my information matrix estimate to get the standard errors of my parameters.
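
In symbols, the objective being maximised is the conditional Gaussian log-likelihood (with $\theta_1$ the intercept, $\theta_2$ the AR coefficient and $\theta_3$ the innovation standard deviation, matching the code below):

$$\ell(\theta \mid y) \;=\; \sum_{t=2}^{T} \log \phi\!\left(y_t;\ \theta_1 + \theta_2\, y_{t-1},\ \theta_3^2\right),$$

where $\phi(\,\cdot\,;\mu,\sigma^2)$ denotes the normal density with mean $\mu$ and variance $\sigma^2$.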

So far so good (I get the same results as with the stats toolbox in Matlab).

But, how do I proceed to estimate the QMLE standard errors? For that I need the estimate of the outer product of the score function (i.e. the outer product of the gradient evaluated at the MLE estimates).
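
Concretely, in the usual notation, the sandwich estimator I am after is

$$\widehat{\operatorname{Var}}(\hat\theta) \;=\; \hat H^{-1}\,\hat S\,\hat H^{-1}, \qquad \hat S \;=\; \sum_{t=2}^{T} \hat g_t\, \hat g_t^{\top}, \qquad \hat g_t \;=\; \nabla_\theta\, \ell_t(\theta)\Big|_{\theta=\hat\theta},$$

where $\hat H$ is the Hessian of the negative log-likelihood at $\hat\theta$ and $\ell_t$ is the log-likelihood contribution of observation $t$.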

I have not found any way to get a (numerical) estimate of the gradient from any of R's optimization/ML commands. Am I missing something? Thank you.

data = read.table("Data/AR.txt", header=FALSE)
y = as.vector(data$V1) # A simple vector of observations: n1, n2, ... , nT

# Conditional (negative) log-likelihood of the Gaussian AR(1)
loglik = function(theta, y) {
  T = length(y)
  L = sum(dnorm(y[2:T],
                mean = theta[1] + theta[2]*y[1:(T-1)],
                sd = theta[3], log = TRUE))
  return(-L)
}

start = c(2.5, 0.6, 3)
b = optim(start, loglik, y = y, hessian = TRUE)
I = solve(b$hessian)  # inverse of the Hessian of the negative log-likelihood
se = sqrt(diag(I))    # all good: the same MLE estimates and SEs as in Matlab

EDIT:
I could perhaps use the numDeriv package to get the gradient of the likelihood function evaluated at every observation, but I am stuck on exactly how to accomplish this, as I don't know how to rewrite my likelihood function for that purpose…

EDIT3: Sorry for my stupidity: the sum of outer products is of course not the same as the outer product of the sums. The results seem consistent now:

library(numDeriv)

S = matrix(0, nrow = 3, ncol = 3)  # accumulates the sum of outer products of the scores
for (t in 2:length(y)) {
  g = grad(LLi, x = b$par, y = y, t = t)
  S = S + outer(g, g)
}

I2 = solve(S)
se2 = sqrt(diag(I2))

Where LLi is the log-likelihood contribution of a single observation:

LLi = function(theta, y, t) {
  # log-density of observation t, conditional on y[t-1]
  L = dnorm(y[t],
            mean = theta[1] + theta[2]*y[t-1],
            sd = theta[3], log = TRUE)
  return(L)
}

Which gives me standard errors:

> se2
[1] 0.41208510 0.04256279 0.10242072

Which are reasonably close(?) to those obtained from the Hessian:

> se
[1] 0.40621637 0.04179929 0.09874189

Any suggestions for improvements? Programming-wise, my approach doesn't seem that elegant. Thanks again.

Best Answer

The numDeriv package can indeed be used to compute the gradient and the Hessian (if needed). In both cases the argument y of the log-likelihood is passed through the dots mechanism, using an argument with the matching name. For a vector-valued function, the jacobian function of the same package can be used in the same way.

You could also consider computing analytical derivatives rather than numerical ones.

library(numDeriv)
H <- hessian(func = loglik, x = b$par, y = y)  # Hessian of the negative log-likelihood at the MLE
g <- grad(func = loglik, x = b$par, y = y)     # its gradient (minus the total score) at the MLE
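
As a quick sanity check (a sketch reusing b, loglik and y from the question), the gradient of the objective should be near zero at the optimum if optim converged, and the two Hessian estimates should agree up to numerical error:

max(abs(g))              # gradient of the negative log-likelihood at the MLE, should be small
max(abs(H - b$hessian))  # numDeriv's Hessian vs. the one returned by optim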

We can also compute the Jacobian of a function that returns a vector of length $T-1$, i.e. one (negative) log-density per observation.

mlogDens <- function(theta, y) {
  T <- length(y)
  -dnorm(y[2:T], mean = theta[1] + theta[2] * y[1:(T-1)], 
                 sd = theta[3], log = TRUE)
}
## a matrix with dim (T-1, 3)
G <- jacobian(func = mlogDens, x = b$par, y = y)
## a matrix with dim (9, T-1)
GG <- apply(G, MARGIN = 1, FUN = tcrossprod)
## a vector with length 9 representing a symmetric mat.
GGsum0 <- apply(GG, MARGIN = 1, FUN = sum)
## a symmetric mat.
GGsum <- matrix(GGsum0, nrow = 3, ncol = 3)
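
Since each row of G is a per-observation score (up to sign), the same sum of outer products can be obtained in one call with crossprod, and the sandwich (QMLE) covariance asked about in the question is then obtained by combining it with the Hessian. A minimal sketch, reusing H and G from above:

## t(G) %*% G gives the same 3 x 3 sum of outer products as GGsum
GGsum2 <- crossprod(G)
all.equal(GGsum, GGsum2, check.attributes = FALSE)

## sandwich (QMLE) covariance: H^{-1} S H^{-1}
Hinv <- solve(H)
Vsand <- Hinv %*% GGsum2 %*% Hinv
seQMLE <- sqrt(diag(Vsand))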