Solved – Bootstrapping in Binary Response Data with Few Clusters and Within-Cluster Correlation

bootstrap, clustered-standard-errors, logit, simulation, stata

Beware: this is (almost) a cross-post of a thread I started on Statalist that has not received much attention so far.

Introduction

I am learning about the problems that arise when conducting hypothesis tests on a cluster sample with very few clusters (<30) but considerable within-cluster correlation. So far I have read Cameron, Gelbach, and Miller, "Bootstrap-Based Improvements for Inference with Clustered Errors" (Review of Economics and Statistics 90, 414–427) [Working Paper here], as well as Cameron and Miller, "A Practitioner's Guide to Cluster-Robust Inference" (Journal of Human Resources 50, 317–370) [Preprint here]. From what I have learned, if the number of clusters is small, one rejects the null hypothesis too often, and bootstrapping procedures (possibly with asymptotic refinement) can help to overcome this problem. In their simulations, a wild cluster bootstrap-t procedure works best, with rejection rates very close to the nominal 5%. My questions are about transferring their ideas to binary response models. To provide a basis for discussion, I ran some simulations in Stata to examine the rejection rates of the null hypothesis.

Data Generating Process

I assumed the following data generating process for the "latent" variable $y^*$:

$y^*_{ig}=\beta(z_g+z_{ig})+u_{ig}$

where $z_g$ is a standard normal draw that is constant within each group $g$ and $z_{ig}$ is an independent draw from the standard normal distribution. To induce group-dependent errors as well as heteroskedasticity, I assume $u_{ig}$ is a random draw from a logistic distribution with scale parameter $b$ defined as follows:

$b=\sqrt{\frac{27z_g^2}{\pi^2}}$

If $y^*_{ig} > 0$, the binary response variable $y_{ig}$ takes the value 1; otherwise it is 0.
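As a side remark of my own (not taken from the papers): with this choice of $b$ the error variance is group-specific, since the logistic distribution with scale $b$ has variance

$\operatorname{Var}(u_{ig})=\frac{b^2\pi^2}{3}=9z_g^2$

and the inverse-CDF transform used to draw $u_{ig}$ in the programme below follows directly from the logistic CDF:

$F(u)=\frac{1}{1+e^{-u/b}} \;\Longrightarrow\; u=-b\,\ln\!\left(\frac{1}{U}-1\right), \qquad U\sim\text{Uniform}(0,1)$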

Simulation

I tried to learn about the problem with simulated data. I simulated 499 datasets according to the data generating process described above, estimated a logit model on each, and counted how often $H_0: \beta = 1$ is rejected. I assumed 15 groups, each comprising 30 observations. I estimated the logit model four ways:

  1. with no adjustment of the standard errors,
  2. with a cluster-robust estimate of the variance-covariance matrix,
  3. with a pairs cluster bootstrap estimate of the variance-covariance matrix,
  4. with a pairs cluster bootstrap of the Wald statistic (see the rejection rule sketched after this list).
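To spell out the rejection rule for approach 4 as it is implemented in the programme below: the original-sample Wald statistic is compared with the 95th percentile of the bootstrap distribution of Wald statistics, each recentred at the original estimate,

$W=\frac{(\hat{\beta}-1)^2}{\widehat{\operatorname{Var}}(\hat{\beta})}, \qquad W^*_r=\frac{(\hat{\beta}^*_r-\hat{\beta})^2}{\widehat{\operatorname{Var}}(\hat{\beta}^*_r)}, \qquad r=1,\dots,R$

and $H_0$ is rejected if $W$ exceeds the empirical 95th percentile of $\{W^*_1,\dots,W^*_R\}$.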

Here is the Stata programme I wrote:

program define mysimu, rclass
    version 13.1
    clear 

    syntax [, groups(integer 15) obspgroup(integer 30) alpha(real 0.05) bootstraprep(integer 499)]

    tempname cmat vmat cmat2 vmat2 cmat3 vmat3 cmat4 vmat4 b se w rej se_clust w_clust rej_clust se_boot w_boot rej_boot 
    tempvar group newgroup helpvar helpvar2 z_g helpvar3 e_g parb u x ystar y
    //Set the number of observations
    set obs `groups'
    gen `group' = _n    
    expand `obspgroup'
    bysort `group': gen `helpvar' = _n

    //Cluster-Level Regressor Correlation
    gen `helpvar2' = rnormal() if `helpvar' == 1
    bysort `group': egen `z_g' = min(`helpvar2')

    //Cluster-Level Error Correlation
    gen `helpvar3' = runiform() if `helpvar' == 1   
    bysort `group': egen `e_g' = min(`helpvar3')

    //Data Generating Process
    generate `parb' = sqrt(27*`z_g'^2/_pi^2)
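    //Draw `u' from a logistic distribution with scale `parb' via the inverse CDF: u = -b*ln(1/U - 1)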
    generate `u' = -`parb'*log(1/runiform() - 1)
    generate `x' = rnormal()+`z_g'
    generate `ystar' = `x'+`u'
    generate `y' = `ystar' > 0

    //Simple Logit
    logit `y' `x'
    matrix `cmat' = e(b)
    matrix `vmat' = e(V)
    return scalar b = `cmat'[1,1]
    return scalar se = sqrt(`vmat'[1,1])
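    //Wald statistic for H0: beta = 1 and rejection indicator at significance level `alpha'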
    return scalar w = (`cmat'[1,1]-1)^2/`vmat'[1,1]
    return scalar rej = (`cmat'[1,1]-1)^2/`vmat'[1,1] >  invchi2(1, 1-`alpha')

    //Clustered SE
    logit `y' `x', cluster(`group') 
    matrix `cmat2' = e(b)
    matrix `vmat2' = e(V)
    return scalar b_clust = `cmat2'[1,1]
    return scalar se_clust = sqrt(`vmat2'[1,1])
    return scalar w_clust = (`cmat2'[1,1]-1)^2/`vmat2'[1,1]
    return scalar rej_clust = (`cmat2'[1,1]-1)^2/`vmat2'[1,1] >  invchi2(1, 1-`alpha')

    //Pairs clustered bootstrap se
    logit `y' `x', vce(boot, reps(`bootstraprep') cluster(`group'))
    matrix `cmat3' = e(b)
    matrix `vmat3' = e(V)
    return scalar b_boot = `cmat3'[1,1] 
    return scalar se_boot = sqrt(`vmat3'[1,1])
    return scalar w_boot = (`cmat3'[1,1]-1)^2/`vmat3'[1,1]
    return scalar rej_boot = (`cmat3'[1,1]-1)^2/`vmat3'[1,1] >  invchi2(1, 1-`alpha')

    //Pairs clustered bootstrap wald test
    tempfile dataset
    logit `y' `x', cluster(`group')
    matrix `cmat4' = e(b)
    local theta = `cmat4'[1,1]
    matrix `vmat4' = e(V)
    local w = (`cmat4'[1,1]-1)^2/`vmat4'[1,1]
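    //Bootstrap the Wald statistic, recentring each replication at the original estimate `theta';
    //idcluster() gives every resampled cluster a distinct identifier (see EDIT below)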
    bootstrap wstar=((_b[`x']-`theta')^2/(_se[`x'])^2), reps(`bootstraprep') cluster(`group') idcluster(`newgroup') saving(`dataset', replace): logit `y' `x', cluster(`newgroup')
    use `dataset', clear
    sum wstar, d
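    //Reject H0 if the original Wald statistic exceeds the 95th percentile of the bootstrap distribution (alpha = 0.05)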
    return scalar rej_boot2 = `w' > r(p95)
end

The programme simulates one dataset at a time, estimates the four different versions mentioned above, and saves the information on whether $H_0$ is rejected at the 5% level in r(rej*). To obtain 499 datasets and hence 499 draws of r(rej*), I run:

simulate rej=r(rej) rej_clust=r(rej_clust) rej_boot=r(rej_boot) rej_boot2=r(rej_boot2) , reps(499) seed(1342): mysimu
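The rejection rates reported below are then simply the means of these 0/1 indicators across the 499 replications; a minimal way (not in the original post) to display them after the simulate call is:

summarize rej rej_clust rej_boot rej_boot2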

Results

Rejection rates are:

  1. no adjustment: 53%
  2. cluster-robust VCE: 24%
  3. pairs cluster bootstrap VCE: 21%
  4. pairs cluster bootstrap of the Wald statistic: 19%

Questions

  1. The rejection rate for approach 4 is smaller than for approaches 2 and 3, in accordance with the results in Cameron et al. [3]. However, the difference is small relative to their findings. Does anyone have an idea why this is?
  2. As I read in Cameron and Trivedi's "Microeconometrics Using Stata", the wild bootstrap procedure (not used here, but the "best" approach in their paper [2]) is for linear models only ("For linear regression, a wild bootstrap accommodates the more realistic assumptions that …"). Is there a non-linear counterpart that might help in getting closer to the 5% rejection rate in this case?
  3. In general, is there a "state-of-the-art" approach to handling the problem of few clusters when modelling a binary response? If so, what would the (bootstrap?) procedure look like?

EDIT

In their practitioner's guide, Cameron and Miller point out that:

"If cluster-specific effects are present then the pairs cluster
bootstrap must be adapted to account for the following complication.
Suppose cluster 3 appears twice in a bootstrap resample. Then if
clusters in the bootstrap resample are identified from the original
cluster-identifier, the two occurrences of cluster 3 will be
incorrectly treated as one large cluster rather than two distinct
clusters. In Stata, the bootstrap option idcluster ensures that
distinct identifiers are used in each bootstrap resample. Examples are
regress y x i.id_clu, vce(boot, cluster(id_clu) idcluster(newid)
reps(400) seed(10101)) and, more simply, xtreg y x, vce(boot,
reps(400) seed(10101)) , as in this latter case Stata automatically
accounts for this complication."

Because of this, I changed the code so that each cluster draw in the bootstrap is identified as an individual cluster using the idcluster option. Also, I decided to drop the initial seed value from the program, as it is already set within the simulate command. I revised the rejection rate above accordingly.

Best Answer

There is an extension of the wild bootstrap called the "score bootstrap", developed by Kline and Santos (2012) (working paper here). Whereas the wild bootstrap works for OLS, the score bootstrap additionally works for ML models such as logit and probit, as well as for 2SLS and GMM. The user-written Stata command boottest can calculate p-values using the score method after an initial estimation.
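For concreteness, here is a minimal sketch of how this might be applied to the model simulated above (my own illustration, not part of the original answer): the names y, x, and group stand in for the simulated outcome, regressor, and cluster identifier, the reps and seed values are arbitrary, and the exact option syntax should be checked against the boottest help file.

ssc install boottest                    // one-time installation of the user-written command from SSC
logit y x, cluster(group)               // initial logit fit with cluster-robust VCE
boottest x = 1, reps(999) seed(1342)    // bootstrap p-value for H0: beta = 1

As I understand the boottest documentation, after maximum-likelihood estimators such as logit it applies the Kline and Santos score bootstrap, so the reported p-value can be compared directly with the rejection rates from the simulation above.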