Solved – Clustered (grouped) standard errors MLE in R

clustered-standard-errorsmaximum likelihoodrsandwich

I'm doing the following maximum likelihood estimation using mle2 function from bbmle package:

library(bbmle)
llik.probit2<- function(aL,beta, Ks, Kw, Bs, Bw, dta){
  Y <- as.matrix(dta$qualified)
  sce1 <- as.matrix(dta$eds)
  wce1 <- as.matrix(dta$edw)
  sce1_obs <- (as.matrix(dta$eds_obs))
  wce1_obs <- (as.matrix(dta$edw_obs))
  obs <- as.matrix(dta$CombinedObservable1)
  c <- as.matrix(dta$const)
  phi <- pnorm(ifelse(Y == 0, -1, 1) * (-aL*c + beta*obs + (Bs+Bs*Ks-aL)*sce1 -beta*Ks*sce1_obs + (Bw+Bw*Kw-aL)*wce1 - beta*Kw*wce1_obs), log.p = TRUE)
  -sum(phi)
}

starting.pointmle2 <- list(aL=1.4, beta=0.3, Ks=0.5, Kw=0.5, Bs=0.5, Bw=0.5)

result1 <- mle2(llik.probit2, start = starting.pointmle2, data=list(dta=Mydata), skip.hessian=FALSE)

I need to estimate clustered standard errors of this model, clustering on variable c_id. I'm trying to implement the sandwich estimator but cannot retrieve 'meat' part from mle2 output. I have also tried with nlm, and optim (mle2 is basically wrapper for these methods)
Any advice how to do it? In general is there any package which provides functions to calculate clustered standard errors for MLE estimation of a general function?

Here is is a small sample of the data file (MyData) in csv format:

const,qualified,eds,edw,eds_obs,edw_obs,CombinedObservable1,c_id
1,0,0,1,0,0,-0.6838316166,1
1,1,0,1,0,0,-0.1433684328,1
1,0,0,1,0,0.0758113685,0.0758113685,1
1,0,1,0,0.2084778637,0,0.2084778637,34
1,0,0,1,0,0,-0.1622519262,34
1,0,0,1,0,0,-0.5061082675,34
1,0,0,1,0,0,-0.6952748613,34
1,0,1,0,0.9883178986,0,0.9883178986,34
1,0,0,1,0,0,-0.5311805315,34
1,0,0,1,0,0,2.7546881325,34
1,1,1,0,-0.2174263974,0,-0.2174263974,34
1,0,0,1,0,0,-0.4397037288,1
1,0,0,1,0,0,-0.3189328097,1
1,0,0,1,0,0,-0.6132276964,12
1,0,0,1,0,0,0.2459941348,12
1,0,0,1,0,0.0589196966,0.0589196966,12

Best Answer

With out sample data it's hard to test what you are looking for given your code, but I would suggest using the rms package. It is a great package and has what you are looking for.

In particular,

lrm : Fit binary and proportional odds ordinal logistic regression models using maximum likelihood estimation or penalized maximum likelihood estimation

robcov: Uses the Huber-White method to adjust the variance-covariance matrix of a fit from maximum likelihood or least squares, to correct for heteroscedasticity and for correlated responses from cluster samples

Here is an example using your data provided with a very simple model:

Sample data

df <- structure(list(const = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L), qualified = c(0L, 1L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L), eds = c(0L, 0L, 
0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L), edw = c(1L, 
1L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L), 
    eds_obs = c(0, 0, 0, 0.2084778637, 0, 0, 0, 0.9883178986, 
    0, 0, -0.2174263974, 0, 0, 0, 0, 0), edw_obs = c(0, 0, 0.0758113685, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.0589196966), CombinedObservable1 = c(-0.6838316166, 
    -0.1433684328, 0.0758113685, 0.2084778637, -0.1622519262, 
    -0.5061082675, -0.6952748613, 0.9883178986, -0.5311805315, 
    2.7546881325, -0.2174263974, -0.4397037288, -0.3189328097, 
    -0.6132276964, 0.2459941348, 0.0589196966), c_id = c(1L, 
    1L, 1L, 34L, 34L, 34L, 34L, 34L, 34L, 34L, 34L, 1L, 1L, 12L, 
    12L, 12L)), .Names = c("const", "qualified", "eds", "edw", 
"eds_obs", "edw_obs", "CombinedObservable1", "c_id"), class = "data.frame", row.names = c(NA, 
-16L))

Regression w/o clustering:

library(rms)
d <- datadist(df)
options("datadist" = "d")
mod <- lrm(qualified ~ eds, data = df, x = TRUE, y = TRUE)

Logistic Regression Model

lrm(formula = qualified ~ eds, data = df, x = TRUE, y = TRUE)
                     Model Likelihood     Discrimination    Rank Discrim.    
                        Ratio Test            Indexes          Indexes       
Obs            16    LR chi2      1.19    R2       0.135    C       0.679    
 0             14    d.f.            1    g        0.582    Dxy     0.357    
 1              2    Pr(> chi2) 0.2760    gr       1.790    gamma   0.714    
max |deriv| 1e-09                         gp       0.083    tau-a   0.083    
                                          Brier    0.099                     

          Coef    S.E.   Wald Z Pr(>|Z|)
Intercept -2.4849 1.0408 -2.39  0.0170  
eds        1.7918 1.6073  1.11  0.2649  

Cluster around c_id:

robcov(mod, df$c_id, method = c('huber'))

Logistic Regression Model

lrm(formula = qualified ~ eds, data = df, x = TRUE, y = TRUE)
                      Model Likelihood     Discrimination    Rank Discrim.    
                         Ratio Test            Indexes          Indexes       
Obs             16    LR chi2      1.19    R2       0.135    C       0.679    
 0              14    d.f.            1    g        0.582    Dxy     0.357    
 1               2    Pr(> chi2) 0.2760    gr       1.790    gamma   0.714    
Cluster on df$c_id                         gp       0.083    tau-a   0.083    
Clusters         3                         Brier    0.099                     
max |deriv|  1e-09                                                            

          Coef    S.E.   Wald Z Pr(>|Z|)
Intercept -2.4849 0.8250 -3.01  0.0026  
eds        1.7918 0.8250  2.17  0.0299