Solved – Contradictory results between Breusch-Pagan test and Goldfeld-Quandt test in Python

heteroscedasticity, hypothesis-testing, mathematical-statistics, regression, variance

I am reading the regression diagnostics page for statsmodels in Python.

Under the heteroskedasticity tests, they introduce two tests: the Breusch-Pagan test and the Goldfeld-Quandt test.

From my understanding, the null hypothesis of both tests asserts that heteroskedasticity does not exist.
However, on the webpage, the tests give p-values of 0.08794028782673029 and 0.3820295068692507, respectively.
This means that the Breusch-Pagan test suggests that heteroskedasticity exists (at the 10% level), whereas the Goldfeld-Quandt test suggests that it does not.

What is happening here? Why would they give contradictory results?

Best Answer

Because the tests look at different ways in which heteroskedasticity can manifest itself; hence, a given data set may "look" heteroskedastic to one test but not to another.

A bit more specifically, the Breusch-Pagan test (BP) looks at whether the squared residuals can be explained by observed regressors $z_i$, while the Goldfeld-Quandt test (GQ) orders the observations by some variable, splits the sample, and compares the residual variances across the two subsamples. Hence, it is conceivable that BP picks up heteroskedasticity through its relation to a variable that did not serve as the splitting variable in GQ, which GQ then cannot detect.
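In their standard forms (the output below shows the studentized BP variant, which uses the $nR^2$ statistic), the two statistics are:

$$\hat{u}_i^2 = \gamma_0 + z_i'\gamma + v_i, \qquad \mathrm{BP} = nR^2 \;\overset{a}{\sim}\; \chi^2_{\dim(z_i)}$$

$$\mathrm{GQ} = \frac{\hat{\sigma}_2^2}{\hat{\sigma}_1^2} \sim F_{n_2-k,\; n_1-k}$$

where $R^2$ comes from the auxiliary regression of the squared OLS residuals on $z_i$, and $\hat{\sigma}_1^2$, $\hat{\sigma}_2^2$ are the residual variances from separate fits on the two subsamples obtained after ordering the data by the splitting variable.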

Here is a little example (code below - in R though, I do not know Python):

[Figure: OLS residuals plotted against x1 (left panel, green) and x2 (right panel, red); the spread of the residuals increases with x1 but is roughly constant across x2.]

The errors are generated so that heteroskedasticity arises from x1: in the left-hand panel, the variance of the residuals increases with x1, but in the right-hand panel it does not vary with x2. So when you run GQ splitting the sample according to x2, the test has nothing to pick up, while splitting by x1 it does. Hence, not only can BP and GQ contradict each other, so can different versions of GQ.

The same behavior can of course be produced with the BP test, depending on the specification of the auxiliary regression; again, see the example code below.

library(lmtest)

n  <- 10000
x1 <- 3 + rnorm(n)
x2 <- rnorm(n)
u  <- x1 * rnorm(n)  # error standard deviation grows with x1 -> heteroskedasticity via x1
y  <- u              # no signal, so the residuals essentially recover u

reg <- lm(y ~ x1 + x2)
par(mfrow = c(1, 2))
plot(x1, resid(reg), cex = .5, col = "green")  # spread fans out with x1
plot(x2, resid(reg), cex = .5, col = "red")    # spread constant across x2

gqtest(reg, order.by = x1) # split according to variable that reveals heteroskedasticity
gqtest(reg, order.by = x2) # split does not reveal heteroskedasticity, leading to higher p values

bptest(reg) 
bptest(reg, varformula = ~x1) # auxiliary regression that can pick up the heteroskedasticity
bptest(reg, varformula = ~x2) # this one cannot - leading to higher p-value

Output:

> gqtest(reg, order.by = x1)

    Goldfeld-Quandt test

data:  reg
GQ = 2.908, df1 = 4997, df2 = 4997, p-value < 2.2e-16
alternative hypothesis: variance increases from segment 1 to 2


> gqtest(reg, order.by = x2)

    Goldfeld-Quandt test

data:  reg
GQ = 1.0519, df1 = 4997, df2 = 4997, p-value = 0.03685
alternative hypothesis: variance increases from segment 1 to 2


> bptest(reg) 

    studentized Breusch-Pagan test

data:  reg
BP = 1214.4, df = 2, p-value < 2.2e-16


> bptest(reg, varformula = ~x1)

    studentized Breusch-Pagan test

data:  reg
BP = 1213.2, df = 1, p-value < 2.2e-16


> bptest(reg, varformula = ~x2) 

    studentized Breusch-Pagan test

data:  reg
BP = 2.0869, df = 1, p-value = 0.1486

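Since the question is about statsmodels, here is a rough Python translation of the experiment above (a sketch only, as I work in R; statsmodels' het_goldfeldquandt and het_breuschpagan are the counterparts of gqtest and bptest, and the exact numbers will differ since the data are random):

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan, het_goldfeldquandt

rng = np.random.default_rng(42)
n = 10000
x1 = 3 + rng.standard_normal(n)
x2 = rng.standard_normal(n)
y = x1 * rng.standard_normal(n)  # error sd scales with x1 -> heteroskedasticity

X = sm.add_constant(np.column_stack([x1, x2]))  # columns: const, x1, x2
res = sm.OLS(y, X).fit()

# GQ, ordering by x1 (column 1): detects the heteroskedasticity
print(het_goldfeldquandt(y, X, idx=1, alternative="increasing"))
# GQ, ordering by x2 (column 2): nothing to detect -> higher p-value
print(het_goldfeldquandt(y, X, idx=2, alternative="increasing"))

# BP with different auxiliary regressors; returns (LM stat, LM p-value, F stat, F p-value)
print(het_breuschpagan(res.resid, X))                    # all regressors: detects it
print(het_breuschpagan(res.resid, sm.add_constant(x1)))  # x1 only: detects it
print(het_breuschpagan(res.resid, sm.add_constant(x2)))  # x2 only: higher p-value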
In general, I would say it is to be expected that different widely used tests sometimes give different answers. If they never did, I would expect one of the tests to be superseded by the other, based on considerations such as ease of computation, the reputation of the authors who published the tests, discussion in well-known textbooks, availability of convenient software, etc.