Solved – Estimation with MLE and returning the score/gradient (QMLE)

maximum-likelihood, optimization, r

I am estimating a simple AR(1) process by maximum likelihood. I also wish to compute the quasi-MLE (QMLE) standard errors, which are given by the sandwich form of the Hessian and the score (see for example the last slide here).

So I start by specifying the (conditional) log-likelihood of the (Gaussian) AR(1) process. Then I optimise this with R's optim, which returns the Hessian evaluated at the MLE estimates; I use this as my information matrix estimate to get the standard errors of my parameters.
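
In symbols, the objective being maximised is the conditional Gaussian log-likelihood (with $\theta_1$ the intercept, $\theta_2$ the AR coefficient and $\theta_3$ the innovation standard deviation, matching the code below):

$$\ell(\theta \mid y) \;=\; \sum_{t=2}^{T} \log \phi\!\left(y_t;\ \theta_1 + \theta_2\, y_{t-1},\ \theta_3^2\right),$$

where $\phi(\,\cdot\,;\mu,\sigma^2)$ denotes the normal density with mean $\mu$ and variance $\sigma^2$.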

So far so good (I get the same results as with the stats toolbox in Matlab).

But, how do I proceed to estimate the QMLE standard errors? For that I need the estimate of the outer product of the score function (i.e. the outer product of the gradient evaluated at the MLE estimates).
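
Concretely, in the usual notation, the sandwich estimator I am after is

$$\widehat{\operatorname{Var}}(\hat\theta) \;=\; \hat H^{-1}\,\hat S\,\hat H^{-1}, \qquad \hat S \;=\; \sum_{t=2}^{T} \hat g_t\, \hat g_t^{\top}, \qquad \hat g_t \;=\; \nabla_\theta\, \ell_t(\theta)\Big|_{\theta=\hat\theta},$$

where $\hat H$ is the Hessian of the negative log-likelihood at $\hat\theta$ and $\ell_t$ is the log-likelihood contribution of observation $t$.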

I have not found any way to get a (numerical) estimate of the gradient from any of R's optimization/ML commands. Am I missing something? Thank you.

data = read.table("Data/AR.txt", header=FALSE)
y = as.vector(data$V1) # A simple vector of observations: n1, n2, ... , nT

# Conditional (negative) log-likelihood of the Gaussian AR(1)
loglik = function(theta, y) {
  T = length(y)
  L = sum(dnorm(y[2:T],
                mean = theta[1] + theta[2]*y[1:(T-1)],
                sd = theta[3], log = TRUE))
  return(-L)
}

start = c(2.5, 0.6, 3)
b = optim(start, loglik, y = y, hessian = TRUE)
I = solve(b$hessian)  # inverse of the Hessian of the negative log-likelihood
se = sqrt(diag(I))    # all good: the same MLE estimates and SEs as in Matlab

EDIT:
I could perhaps use the numDeriv package to get the gradient of the likelihood function evaluated at every observation, but I am stuck on exactly how to accomplish this, as I don't know how to rewrite my likelihood function for that purpose…

EDIT3: Sorry for my stupidity: the sum of outer products is of course not the same as the outer product of the sums. The results seem consistent now:

library(numDeriv)

S = matrix(0, nrow = 3, ncol = 3)  # accumulates the sum of outer products of the scores
for (t in 2:length(y)) {
  g = grad(LLi, x = b$par, y = y, t = t)
  S = S + outer(g, g)
}

I2 = solve(S)
se2 = sqrt(diag(I2))

Where LLi is the log-likelihood contribution of a single observation:

LLi = function(theta, y, t) {
  # log-density of observation t, conditional on y[t-1]
  L = dnorm(y[t],
            mean = theta[1] + theta[2]*y[t-1],
            sd = theta[3], log = TRUE)
  return(L)
}

Which gives me standard errors:

> se2
[1] 0.41208510 0.04256279 0.10242072

Which are reasonably close(?) to those obtained from the Hessian:

> se
[1] 0.40621637 0.04179929 0.09874189

Any suggestions for improvements? Programming-wise, my approach doesn't seem that elegant. Thanks again.

Best Answer

The numDeriv package can indeed be used to compute the gradient and the Hessian (if needed). In both cases the argument y of the log-likelihood is passed through the dots mechanism, using an argument with the matching name. For a vector-valued function, the jacobian function of the same package can be used in the same way.

You could also consider computing analytical derivatives rather than numerical ones.

library(numDeriv)
H <- hessian(func = loglik, x = b$par, y = y)  # Hessian of the negative log-likelihood at the MLE
g <- grad(func = loglik, x = b$par, y = y)     # its gradient (minus the total score) at the MLE
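
As a quick sanity check (a sketch reusing b, loglik and y from the question), the gradient of the objective should be near zero at the optimum if optim converged, and the two Hessian estimates should agree up to numerical error:

max(abs(g))              # gradient of the negative log-likelihood at the MLE, should be small
max(abs(H - b$hessian))  # numDeriv's Hessian vs. the one returned by optim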

We can also compute the Jacobian of a function that returns a vector of length $T-1$, i.e. one (negative) log-density per observation.

mlogDens <- function(theta, y) {
  T <- length(y)
  -dnorm(y[2:T], mean = theta[1] + theta[2] * y[1:(T-1)], 
                 sd = theta[3], log = TRUE)
}
## a matrix with dim (T-1, 3)
G <- jacobian(func = mlogDens, x = b$par, y = y)
## a matrix with dim (9, T-1)
GG <- apply(G, MARGIN = 1, FUN = tcrossprod)
## a vector with length 9 representing a symmetric mat.
GGsum0 <- apply(GG, MARGIN = 1, FUN = sum)
## a symmetric mat.
GGsum <- matrix(GGsum0, nrow = 3, ncol = 3)
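
Since each row of G is a per-observation score (up to sign), the same sum of outer products can be obtained in one call with crossprod, and the sandwich (QMLE) covariance asked about in the question is then obtained by combining it with the Hessian. A minimal sketch, reusing H and G from above:

## t(G) %*% G gives the same 3 x 3 sum of outer products as GGsum
GGsum2 <- crossprod(G)
all.equal(GGsum, GGsum2, check.attributes = FALSE)

## sandwich (QMLE) covariance: H^{-1} S H^{-1}
Hinv <- solve(H)
Vsand <- Hinv %*% GGsum2 %*% Hinv
seQMLE <- sqrt(diag(Vsand))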