After reading a lot of different papers and posts on the internet, I still don't have a clue how to test for heteroskedasticity in my (binary) logistic regression. The White test only works for OLS regression, right?
Solved – binary logit regression – which test to apply for detecting heteroskedasticity
binary-data, heteroscedasticity, logistic
Related Solutions
There isn't much resemblance between the stated tests of heteroskedasticity and the cointegration tests, except that in both you are regressing functions of residuals on other functions of the regressors and the residuals -- but that isn't saying a lot, as a vast number of tests fit that description.
Tests of heteroskedasticity
Consider the linear regression model $$ Y_i = \boldsymbol{X}_i'\boldsymbol{\beta} + \varepsilon_i $$
Based on this regression model there are several regression-based tests of heteroskedasticity -- equivalent test statistics that are not regression-based do exist, but those obscure the comparison we are after.
Breusch-Pagan
The Breusch-Pagan test of heteroskedasticity has the following steps:
- Estimate the regression model above using OLS, and get the residuals $\widehat{\varepsilon}_i$ and the (ML) estimate of the error variance, $\widehat{\sigma}^2 =\tfrac{1}{n}\sum_{i=1}^n\widehat{\varepsilon}_i^2$.
- Then, estimate the following auxiliary regression by OLS -- a regression of the standardized squared residuals on the cross-products of the included regressors.
$$ \boxed{\frac{\widehat{\varepsilon}_i^2}{\widehat{\sigma}^2} = \text{vech}(\boldsymbol{X}_i\otimes\boldsymbol{X}_i')'\boldsymbol{\gamma}+\nu_i} $$
- The test statistic here is $\tfrac{1}{2}ESS$, which is distributed $\chi^2_{K+\tfrac{K(K+1)}{2}}$, where there are $K$ regressors in the model.
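The two BP steps can be sketched in plain numpy. This is a simplified illustration on simulated data, not the author's code: the function name `breusch_pagan` and the toy data-generating process are my own, and `X` is assumed to contain a constant column.

```python
import numpy as np

def breusch_pagan(y, X):
    """Breusch-Pagan statistic: ESS/2 from regressing the standardized
    squared OLS residuals on the cross-products of the regressors.
    Assumes X already contains a constant column."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    sigma2 = e @ e / n                       # ML estimate of the error variance
    g = e**2 / sigma2                        # standardized squared residuals
    # vech(X_i otimes X_i'): all products x_j * x_l with j <= l
    Z = np.column_stack([X[:, i] * X[:, j]
                         for i in range(k) for j in range(i, k)])
    gamma, *_ = np.linalg.lstsq(Z, g, rcond=None)
    fitted = Z @ gamma
    return 0.5 * np.sum((fitted - g.mean())**2)   # ESS / 2

# toy data whose error standard deviation grows with x, so the test should reject
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1, 3, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n) * x   # sd proportional to x
bp_stat = breusch_pagan(y, X)                # compare to a chi-squared critical value
```

On this toy sample the statistic comes out far above any conventional chi-squared critical value, as it should given how the errors were generated.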
White
- The White test is based on a regression that looks very similar to the one employed by BP:
$$ \boxed{\widehat{\varepsilon}_i^2 = \text{vech}(\boldsymbol{X}_i\otimes\boldsymbol{X}_i')'\boldsymbol{\gamma}+\nu_i} $$
- The test statistic here is $nR^2$ which is again distributed $\chi^2_{K+\tfrac{K(K+1)}{2}}$.
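The White version differs only in using the raw squared residuals and the $nR^2$ statistic. A minimal self-contained sketch (again my own simulated data and function name, with the constant assumed to be a column of `X`):

```python
import numpy as np

def white_test(y, X):
    """White statistic: n * R^2 from regressing squared OLS residuals
    on all cross-products x_j * x_l of the regressors. Because X is
    assumed to include a constant column, levels and squares appear
    among the cross-products automatically."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e2 = (y - X @ beta) ** 2
    Z = np.column_stack([X[:, i] * X[:, j]
                         for i in range(k) for j in range(i, k)])
    gamma, *_ = np.linalg.lstsq(Z, e2, rcond=None)
    ss_res = np.sum((e2 - Z @ gamma) ** 2)
    ss_tot = np.sum((e2 - e2.mean()) ** 2)
    return n * (1.0 - ss_res / ss_tot)      # n * R^2 of the auxiliary regression

# heteroskedastic toy data: error sd proportional to x
rng = np.random.default_rng(1)
n = 500
x = rng.uniform(1, 3, size=n)
X = np.column_stack([np.ones(n), x])
y = 0.5 - 1.0 * x + rng.normal(size=n) * x
w_stat = white_test(y, X)                   # compare to a chi-squared critical value
```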
Aside: Equivalence of a modified version of BP and White
You would not be mistaken in thinking that there exists a version of the BP test that is exactly equivalent to the White test (which is robust to departures of the residuals from normality). This is discussed in Waldman (1983).
Tests of cointegration
Now consider the Engle-Granger two-step residual-based tests of cointegration.
- Here, the model is
$$ Y_{1t} = \beta_0 + \boldsymbol{Y}_{2t}'\boldsymbol{\beta} +\varepsilon_{1t} $$ Again, we fit the regression model using OLS, and get the estimated residuals, $\widehat{\varepsilon}_{1t}$.
- We now conduct an ADF unit root test on these residuals, that is, we fit the regression $$ \boxed{\Delta \widehat{\varepsilon}_{1t} = \beta_0 + \gamma \widehat{\varepsilon}_{1t-1} + \sum_{j=1}^p\gamma_j \Delta \widehat{\varepsilon}_{1t-j} +\nu_t} $$ and conduct a t-test of the regression coefficient $\gamma=0$ using the Engle-Yoo critical values.
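The two Engle-Granger steps can likewise be sketched with numpy alone. The simulated series and variable names below are my own, the ADF regression uses $p=1$ lagged difference, and the comparison against the Engle-Yoo critical values is left to the reader:

```python
import numpy as np

rng = np.random.default_rng(42)
T = 300
# a shared stochastic trend makes y1 and y2 cointegrated
y2 = np.cumsum(rng.normal(size=T))          # random walk
y1 = 2.0 + 0.5 * y2 + rng.normal(size=T)    # stationary deviation from the relation

# step 1: OLS of y1 on a constant and y2, keep the residuals
Z = np.column_stack([np.ones(T), y2])
beta, *_ = np.linalg.lstsq(Z, y1, rcond=None)
e = y1 - Z @ beta

# step 2: ADF regression on the residuals with one lagged difference:
#   de_t = b0 + gamma * e_{t-1} + g1 * de_{t-1} + nu_t
de = np.diff(e)
yv = de[1:]
Xv = np.column_stack([np.ones(T - 2), e[1:-1], de[:-1]])
g, *_ = np.linalg.lstsq(Xv, yv, rcond=None)
resid = yv - Xv @ g
s2 = resid @ resid / (len(yv) - Xv.shape[1])
cov = s2 * np.linalg.inv(Xv.T @ Xv)
t_gamma = g[1] / np.sqrt(cov[1, 1])   # compare to the Engle-Yoo critical values
```

With genuinely cointegrated data the t-statistic on $\gamma$ is strongly negative, well past the (negative) critical values.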
Bottom line
The heteroskedasticity tests regress squares of fitted residuals on regressors, and cointegration tests regress differences of fitted residuals on lags and lags of differences of those residuals (compare the three boxed regressions).
Every model has certain features, each of which can be exploited to form tests of that model. For unit root models, the ADF tests use the specific feature in a specific model -- $\rho=1$ in an autoregressive model -- to test for unit roots. There are other tests, for example, the variance ratio tests that exploit the increasing variance aspect of unit roots. They are all, as you can imagine, related.
You could approach this problem using probit models; once you've figured out whether there's an issue and how it should be handled, you could then fit the equivalent logistic model for ease of interpretation if you didn't want to stick with probit. They are essentially the same model in many ways, but probit offers some options that relate to your question.
I believe you could fit your model with something like xtgee or oglm to get a first model. Then you can fit a heteroskedastic probit (oglm or a similar command). Once you have both models, since the probit model is nested within the heteroskedastic probit model, you can do a likelihood-ratio (LR) test of nested models to see whether the heteroskedastic model improves the fit.
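Stata's lrtest does this comparison for you, but the arithmetic is simple enough to show directly. In the sketch below the log-likelihood values are made-up illustrations (not output from a real fit), the function name is mine, and one degree of freedom assumes a single variable in the variance equation of the heteroskedastic probit:

```python
import math

def lr_test_1df(ll_restricted, ll_unrestricted):
    """Likelihood-ratio statistic and its chi-squared(1) p-value, for a
    probit (restricted) nested inside a heteroskedastic probit with one
    extra variance parameter (unrestricted)."""
    stat = 2.0 * (ll_unrestricted - ll_restricted)
    # chi-squared(1) survival function via the complementary error function
    p = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p

# hypothetical log-likelihoods, for illustration only
stat, p = lr_test_1df(-412.7, -405.9)   # stat = 13.6, p well below 0.01
```

A small p-value would favor the heteroskedastic specification; a large one suggests the plain probit fits about as well.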
I've read a surprising amount of "ignore it" regarding heteroscedasticity and binary outcomes. That seems like a bad idea, particularly with a lot of corrections available. Various robust options are available in Stata commands that address some related issues and are explained well in the Stata documentation.
I'd say I'm only slightly past beginner status at this level of detail on advanced models -- which translates to "use my advice as a starting point." I might be able to come up with something better given more information about your data.
Here are some places where you could do some digging, based on what you already know and the little bit of direction I've offered:
http://www3.nd.edu/~rwilliam/oglm/oglm_Stata.pdf -- a pretty in-depth discussion that explains things with reference to a specific Stata command (oglm).
Allison, Paul. 1999. Comparing Logit and Probit Coefficients Across Groups. Sociological Methods and Research 28(2): 186-208.
Yatchew, Adonis and Zvi Griliches. 1985. Specification Error in Probit Models. The Review of Economics and Statistics 67(1): 134-139.
Hope this helps.
Best Answer
Heteroscedasticity as such is not the main worry in logistic regression. For binomial counts, the concern is overdispersion. The logistic model assumes that the probability of "success" is given by the model ... but what if the probability of success is, say, a Beta random variable whose mean is your model mean? The counts would then be more variable than the binomial variance the logistic model implies.
You could check for this possibility by fitting a beta-binomial model and seeing whether the fit is substantially better --- see the R package bbmle.
Personally, I would only go that route if I had subject-matter grounds for modeling the success probability that way.
Overdispersion can also happen with Poisson counts.
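An informal way to see the extra-binomial variation the answer describes is a Pearson dispersion statistic: it sits near 1 for genuinely binomial counts and well above 1 under overdispersion. A minimal numpy sketch with simulated data (the group sizes, Beta parameters, and function name are my own illustrative choices, and the true probability stands in for the model's fitted one):

```python
import numpy as np

rng = np.random.default_rng(1)
G, m = 200, 20                     # 200 groups of 20 Bernoulli trials each
p = 0.3                            # stand-in for the model's fitted probability

# binomial data: count variance matches m * p * (1 - p)
y_bin = rng.binomial(m, p, size=G)

# beta-binomial data: same mean, inflated variance (overdispersion)
p_i = rng.beta(3, 7, size=G)       # Beta with mean 0.3
y_bb = rng.binomial(m, p_i)

def pearson_dispersion(y, m, p):
    """Pearson X^2 divided by the number of groups; values well above 1
    suggest overdispersion relative to the binomial."""
    X2 = np.sum((y - m * p) ** 2 / (m * p * (1 - p)))
    return X2 / len(y)

d_bin = pearson_dispersion(y_bin, m, p)   # close to 1
d_bb = pearson_dispersion(y_bb, m, p)     # well above 1
```

A formal version of this check would adjust the degrees of freedom for estimated parameters; the sketch only conveys the idea.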