Solved – Maximum likelihood optimization error in R

maximum-likelihood, optimization, r

Edit: A link to the data and the nlm() tracing information has been added, and the code has been changed in accordance with suggestions from the comments.

Download the data from here (Dropbox link).

Download the nlm() tracing information from here (Dropbox link).

I am working on a problem to find the maximum likelihood estimators $\left( \hat\sigma_n^2,\hat l,\hat\sigma_f^2 \right)$ for some data. The log likelihood is

$$\log \mathcal{L} \left( \sigma_n^2,l,\sigma_f^2 \right)= -\frac{1}{2} y^T \left(\mathbf{K}+\sigma_n^2\mathbf{I} \right)^{-1}y - \frac{1}{2} \log \left| \mathbf{K}+\sigma_n^2\mathbf{I} \right| - \frac{n}{2} \log 2\pi$$

$y$ is an $n \times 1$ vector obtained from my data (response variables). $\mathbf{I}$ is the identity matrix. $\mathbf{K}$ is an $n\times n$ matrix depending on $\sigma_f^2$ and $l$; its elements are:

$$K_{pq} = \sigma_f^2 \exp\left( - \frac{1}{2l^2} \left| x_p -x_q \right|^2 \right)$$

The $x_i$ are $3\times1$ vectors (independent variables).

I tried to find the maximum likelihood estimators in R by minimizing the negative log-likelihood. The functions I tried were nlm and optim. Both gave the same error: NA/Inf were produced. How do I proceed to calculate the estimators? I would prefer more accessible answers (I am only a beginner).

I will write my code below for reference. The matrix $\mathbf{A}$ in the code (and download) is given by:

$$A_{pq} = \exp \left( - \frac{1}{2} \left| x_p -x_q \right|^2 \right) $$
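
For reference, a matrix like A could be built from the raw predictor vectors roughly as follows; this is only a hypothetical sketch, assuming the $x_i$ are stored as the rows of an $n \times 3$ matrix X (the Dropbox file already contains A, so this is not needed to run the code below):

build_A <- function(X){
  D2 <- as.matrix(dist(X))^2   # squared Euclidean distances |x_p - x_q|^2
  exp(-0.5 * D2)               # A_pq = exp(-|x_p - x_q|^2 / 2)
}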

load("A.Rdata") # Contains matrix A to help calculate K (in the formula above).
                # A is the exponential part of the formula. (but without the 'l squared' part) 
load("y.Rdata") # Contains the response variable y

num_unique <- 786

Calculate_K_plus <- function(vect){
  sn2 <- vect[1]*vect[1]
  exponent <- 1/(vect[2]*vect[2])
  sf2 <- vect[3]*vect[3]
  B <- A^exponent
  B <- sf2 * B
  B <- B + sn2*diag(num_unique)
  return(B)
}

minus_log_likelihood <- function(vect){
  K_plus <- Calculate_K_plus(vect)
  K_plus_inv <- solve(K_plus)
  out <- 0.5 * ( t(y) %*% K_plus_inv %*% y ) + 0.5 * log(det(K_plus)) + (num_unique/2)*log(2*pi)
  return(out)
}

nlm(minus_log_likelihood,c(1,1,1))

Best Answer

This got too long for a comment. The problem with your function is that the determinant of K_plus overflows to Inf (or underflows to zero) very quickly, so log(det(K_plus)) returns NA/Inf. I tweaked your function to calculate the log-determinant directly with determinant(). I then used optim with different methods, as well as nlminb and nlm, to search for the maximum likelihood estimates. The algorithms converged without problems. I also included code to calculate the standard errors and confidence intervals based on the Hessian. All algorithms give very similar estimates.
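
To see the numerical issue in isolation (a toy illustration, not your data): det() overflows for large, perfectly well-behaved matrices, while determinant() with logarithm = TRUE stays finite.

M <- 3 * diag(1000)
det(M)                                    # Inf: 3^1000 overflows double precision
determinant(M, logarithm = TRUE)$modulus  # roughly 1098.6 = 1000*log(3), finite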

The estimates are: $\widehat{\sigma}_{n}=5.30,\hat{l}=5.12,\widehat{\sigma}_{f}=45.01$.

The code is:

load("A.Rdata")
load("y.Rdata")

num_unique <- 786

Calculate_K_plus <- function(vect){
  sn2 <- (vect[1]*vect[1])
  exponent <- 1/(vect[2]*vect[2])
  sf2 <- vect[3]*vect[3]
  B <- A^exponent
  B <- sf2 * B
  B <- B + sn2*diag(num_unique)
  B
}

minus_log_likelihood <- function(vect){
  K_plus <- Calculate_K_plus(vect)
  K_plus_inv <- solve(K_plus)
  z <- determinant(K_plus, logarithm = TRUE)
  K_plus_log_det <- as.numeric(z$sign * z$modulus)  # log-determinant of K_plus
  out <- 0.5 * ( t(y) %*% K_plus_inv %*% y ) + 0.5 * K_plus_log_det + (num_unique/2)*log(2*pi)
  out
}
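
#-----------------------------------------------------------------------------
# Optional refinement (a sketch, not used in the runs below): since K_plus is
# symmetric positive definite, a Cholesky factorization gives the
# log-determinant and the quadratic form without forming an explicit inverse.
#-----------------------------------------------------------------------------

minus_log_likelihood_chol <- function(vect){
  K_plus <- Calculate_K_plus(vect)
  R <- chol(K_plus)                              # upper-triangular factor: K_plus = t(R) %*% R
  log_det <- 2 * sum(log(diag(R)))               # log|K_plus|
  alpha <- backsolve(R, forwardsolve(t(R), y))   # solves K_plus %*% alpha = y
  0.5 * sum(y * alpha) + 0.5 * log_det + (num_unique/2)*log(2*pi)
}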

#-----------------------------------------------------------------------------
# "Nelder-Mead" algorithm
#-----------------------------------------------------------------------------

res.optim <- optim(par=c(5.3, 5.1, 44.9), fn=minus_log_likelihood, hessian=TRUE, control=list(trace=TRUE, maxit=1000))

res.optim$par    
[1]  5.302362  5.123045 45.011507

fisher_info <- solve(res.optim$hessian)
prop_sigma <- sqrt(diag(fisher_info))
upper <- res.optim$par + 1.96*prop_sigma
lower <- res.optim$par - 1.96*prop_sigma
interval <- data.frame(value=res.optim$par, lower=lower, upper=upper)
interval

      value     lower     upper
1  5.302362  5.032848  5.571877
2  5.123045  3.442932  6.803157
3 45.011507 17.952756 72.070257

#-----------------------------------------------------------------------------
# "L-BFGS-B" algorithm
#-----------------------------------------------------------------------------

res.optim2 <- optim(par=c(5.3, 5.1, 44.9), fn=minus_log_likelihood, method=c("L-BFGS-B"), hessian=TRUE, control=list(trace=3, maxit=1000))

res.optim2$par
[1]  5.301418  5.114984 44.901863

fisher_info <- solve(res.optim2$hessian)
prop_sigma <- sqrt(diag(fisher_info))
upper <- res.optim2$par + 1.96*prop_sigma
lower <- res.optim2$par - 1.96*prop_sigma
interval2 <- data.frame(value=res.optim2$par, lower=lower, upper=upper)
interval2

      value     lower     upper
1  5.301418  5.031988  5.570848
2  5.114984  3.437925  6.792043
3 44.901863 17.982520 71.821206

#-----------------------------------------------------------------------------
# With "nlminb"
#-----------------------------------------------------------------------------

res.nlm <- nlminb(objective=minus_log_likelihood, start=c(5.3, 5.1, 44.9), control=list(iter.max=200, trace=1))

res.nlm$par
[1]  5.301542  5.123718 45.072189

#-----------------------------------------------------------------------------
# With "nlm"
#-----------------------------------------------------------------------------

res.nlm2 <- nlm(f=minus_log_likelihood, p=c(5.3, 5.1, 44.9), print.level=2)

res.nlm2$estimate
[1]  5.301534  5.123776 45.072711