Suppose that you have the model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$. You want to test whether $\text{Var}(\epsilon_i \mid x_i) = \sigma^2$, that is, constant across $i$. To test this, you write $\text{Var}(\epsilon_i \mid x_i) = h(x_i)$, where $h(\cdot)$ is some function, by default for the BP test given by $\alpha_0 + \alpha_1 x_i$.
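The auxiliary regression behind this can be sketched directly in R (my own simulated data for illustration; in practice lmtest's bptest() automates all of this):

```r
# A minimal sketch of the studentized (Koenker) BP test:
# regress squared OLS residuals on x and use n * R^2 as the test statistic.
set.seed(1)
x <- rnorm(200)
y <- 1 + 2 * x + rnorm(200, sd = exp(x / 2))  # error variance depends on x
fit <- lm(y ~ x)

aux <- lm(residuals(fit)^2 ~ x)           # h(x_i) = alpha_0 + alpha_1 x_i
bp <- length(x) * summary(aux)$r.squared  # n * R^2, chi-squared(1) under H0
pchisq(bp, df = 1, lower.tail = FALSE)    # small p-value => heteroskedasticity
```

The statistic asks whether $x$ explains the squared residuals at all; if $\alpha_1 = 0$ cannot be rejected, the test finds no heteroskedasticity.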
Some of your $x$ observations have bigger variances than others, but unless there is some other variable that lets you predict which $x$'s those are, you won't pick up on that fact.
@whuber is noting that your example is not set up like this. First, you presumably expect your x to be your outcome---that's the variable that has heteroskedasticity. But it's not heteroskedasticity that you can explain. So create a new variable z that, when represented as a factor in the regression, is a set of dummy variables for membership in the three possible groups in your sample:
set.seed(1109)
x <- c(rnorm(10), rnorm(100,sd=10), rnorm(100,sd=25))
z <- c(rep(1, 10), rep(2, 100), rep(3, 100))
z <- as.factor(z)
The regression of interest here would be lm(x ~ z). The corresponding bptest() call would be:

bptest(x ~ z)
studentized Breusch-Pagan test
data: x ~ z
BP = 37.1871, df = 1, p-value = 1.073e-09
Since x has a heteroskedasticity pattern that is predictable using covariates, we can detect it. But if we didn't have that covariate, we wouldn't detect the heteroskedasticity.
To sum up in a different way, the BP test only has power against (that is, it can only detect) heteroskedasticity that is predictable using the covariates. If you can't do that, then you can't detect the heteroskedasticity.
Actually, I'd say just the opposite. Multicollinearity is often scoffed at as a concern. The only time it is a real issue is when one variable can be written as an exact linear function of others in the model (a male dummy variable would be exactly equal to a constant/intercept term minus a female dummy variable; hence, you can't have all three in your model). A prime example is Goldberger's comparison of multicollinearity to "micronumerosity."
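The dummy-variable trap just described can be seen directly in R (a small illustration with simulated data; the variable names are my own):

```r
# Perfect multicollinearity: male + female always equals 1, which is the
# intercept column, so all three terms cannot be estimated together.
set.seed(1)
male <- rep(c(0, 1), 50)
female <- 1 - male
y <- 2 + 3 * male + rnorm(100)
coef(lm(y ~ male + female))  # R drops the aliased term: female's coefficient is NA
```

R handles the singularity by silently dropping the aliased column rather than refusing to fit; other software may instead report an error.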
Perfect multicollinearity means that your model cannot be estimated; imperfect multicollinearity often leads to large standard errors, but no bias or real problems; heteroskedasticity means that your standard errors are incorrect and your estimates are inefficient.
First, I would create a model that yields the parameter estimates as I want to interpret them (level change, percent change, etc.) by using logs as appropriate. Then, I would test for heteroskedasticity. The most widely accepted option is simply to use robust standard errors, which gives you correct standard errors but inefficient parameter estimates. Alternatively, you can use weighted least squares to get efficient estimates, but this has become less common unless you know the relationship between the variances of your observations (for example, they each depend upon the size of the observation, like the population of a country). Indeed, in cross-section econometrics using a data set of any real size, robust standard errors have become required irrespective of the outcome of a BP test; they are applied almost automatically.
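The robust-standard-error route can be sketched with the sandwich and lmtest packages (assuming both are installed; the data here are simulated):

```r
library(sandwich)
library(lmtest)

set.seed(1)
x <- rnorm(200)
y <- 1 + 2 * x + rnorm(200, sd = 1 + abs(x))  # heteroskedastic errors
fit <- lm(y ~ x)

coeftest(fit)                                    # usual (here incorrect) standard errors
coeftest(fit, vcov = vcovHC(fit, type = "HC1"))  # heteroskedasticity-robust standard errors
```

The point estimates are identical in both tables; only the standard errors (and hence the t-statistics and p-values) change.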
There isn't a good test for endogeneity. Your real problem is that the regressor is correlated with the error term; OLS forces the regressor to be uncorrelated with the residual, so you won't find any correlation there. Endogeneity is what makes econometrics hard and is a whole topic unto itself.
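That mechanical point can be verified directly (simulated data; here x is endogenous by construction because it shares the component u with the error):

```r
# x is correlated with the error u, so OLS is biased -- yet the fitted
# residuals are uncorrelated with x by construction of least squares.
set.seed(1)
u <- rnorm(200)
x <- rnorm(200) + u
y <- 1 + 2 * x + u
fit <- lm(y ~ x)

cor(x, u)               # clearly nonzero: x is endogenous
cor(x, residuals(fit))  # essentially zero, whatever the true correlation with u
```

This is why you cannot "test" for endogeneity by inspecting the residuals of the very regression whose errors are in question.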
Best Answer
Because the tests look at different ways in which heteroskedasticity can manifest itself, and hence, a given data set may "look" heteroskedastic to one test and not so to another.
A bit more specifically, the Breusch-Pagan test (BP) looks at whether squared residuals can be explained by observed regressors $z_i$, while the Goldfeld-Quandt test (GQ) relies on a split-sample comparison of residual variances. Hence, it is conceivable that the former test picks up heteroskedasticity related to a variable that did not serve as the splitting variable in the latter, so that GQ does not detect it.
Here is a little example (code below - in R though, I do not know Python):
Errors are generated in a way that heteroskedasticity arises from x1, which shows in the left-hand side of the plot, where the variance of the residuals increases with x1 but not with x2 (right-hand side). So when using GQ and splitting your sample according to x2, the test will have nothing to pick up in terms of heteroskedasticity, while it does when splitting according to x1. So not only can BP and GQ contradict each other; so can different versions of GQ. The same behavior can of course be produced with the BP test, depending on the specification of the auxiliary regression; again, see the example code below.
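The original code does not survive in this excerpt; a sketch along the lines described (my own simulation, not the answerer's original code) might look like this:

```r
library(lmtest)

set.seed(1)
n <- 200
x1 <- runif(n)
x2 <- runif(n)
y <- 1 + x1 + x2 + rnorm(n, sd = 2 * x1)  # error variance grows with x1 only
fit <- lm(y ~ x1 + x2)

gqtest(fit, order.by = ~ x1)  # split on x1: should detect heteroskedasticity
gqtest(fit, order.by = ~ x2)  # split on x2: typically does not
bptest(fit, ~ x1)             # auxiliary regression on x1: should detect it
bptest(fit, ~ x2)             # auxiliary regression on x2: typically does not
```

The choice of `order.by` for GQ plays the same role as the choice of auxiliary regressors for BP: both direct the test's power toward one particular pattern of heteroskedasticity.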
Output: [plot omitted: residuals plotted against x1 (left-hand panel) and against x2 (right-hand panel); the spread increases with x1 only.]
In general, I would say it is to be expected that different widely used tests tend to sometimes give different answers. If they did not, then I would expect one test to be superseded, based on considerations such as ease of computation, reputation of the authors who published the different tests, discussion in well-known textbooks, availability of convenient software, etc.