Solved – Endogeneity problem in OLS

2sls, endogeneity, heteroscedasticity, instrumental-variables, least squares

I have an OLS model with about 30 predictor variables that shows both serious endogeneity and heteroscedasticity problems. The variables causing these problems also show up in the check for multicollinearity. How can I resolve this in R? Should I remove or combine the troublesome variables?

I'm pretty new to R, so I tried to create an instrumental variable capturing the supposedly underlying effect of these variables, but it just gave me a subset of the separate variables instead of a new variable with their combined effect.

Eventually I want to estimate a two-stage least squares (2SLS) model, but that is going to be pretty difficult if I can't resolve these issues first.

It's for my senior thesis, so any other suggestions on model improvement are more than welcome!!

Best Answer

I would suggest separating your problem into several subproblems.

1) The large number of explanatory variables. Depending on your sample size, $30$ predictors might be too many, especially if you also have multicollinearity issues. Hence it might be useful to start with some theoretical reasoning about which variables to include and which to leave out.

I suppose that some of the variables are transformations of each other, like $x$ and $x^2$? If you have similar variables that measure essentially the same thing, you should exclude those you regard as less relevant. In general, it is also advisable to start with a small, simple model containing the most important explanatory variables and then enlarge it step by step.
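As a rough illustration of the point about near-duplicate predictors, here is a minimal base-R sketch on simulated data. The variable names (`x1`, `x2`, `x3`), the helper `vif_by_hand`, and the data-generating process are all made up for this example and are not from the question. It computes variance inflation factors as $1/(1-R_j^2)$, where $R_j^2$ comes from regressing predictor $j$ on the other predictors, and then compares a small model with a slightly larger one.

```r
# Simulated data; every name below is hypothetical.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)   # nearly a copy of x1
x3 <- rnorm(n)                  # unrelated predictor
y  <- 1 + 2 * x1 + 0.5 * x3 + rnorm(n)
dat <- data.frame(y, x1, x2, x3)

# VIF_j = 1 / (1 - R^2_j), with R^2_j taken from the regression of
# predictor j on all remaining predictors.
vif_by_hand <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    r2 <- summary(lm(reformulate(others, response = p), data = data))$r.squared
    1 / (1 - r2)
  })
}

vifs <- vif_by_hand(dat, c("x1", "x2", "x3"))
print(round(vifs, 1))

# Start with a small model, then enlarge it and compare the two fits.
m_small <- lm(y ~ x1, data = dat)
m_large <- lm(y ~ x1 + x3, data = dat)
print(anova(m_small, m_large))
```

In this setup `x1` and `x2` should show very large VIFs while `x3` stays close to 1, which would suggest keeping only one of the two near-duplicates. Which one to keep is still a substantive decision, not something the VIF settles.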

2) Endogeneity: Again, I would suggest starting by thinking about the relation you want to model. Endogeneity generally has three possible sources (as shown in standard textbooks like Wooldridge's "Econometric Analysis of Cross Section and Panel Data"). The classical one is the omitted-variable problem. The second is measurement error, where we control for all relevant covariates but some of the error in measuring them ends up in the error term. The third is simultaneity. So you have to identify the source of your endogeneity before you try to solve it technically. In general, I do not view endogeneity as a problem of linear regression with a lot of covariates per se.

3) Instrumental variables. A two-stage least squares approach with a valid instrument might indeed solve your endogeneity problem. In practice, however, it is often very hard to find a valid instrumental variable $Z$, because it has to satisfy two conditions:

i) $Z$ must be independent of $e$, the error term of the true data-generating process.

ii) $Z$ and the potentially endogenous variable(s) $X_{en}$ must be related conditional on all exogenous variable(s) $X_{ex}$.

As a small classical example: if you run a wage regression and assume that the true mechanism is $$\log(wage)=\beta_0+\beta_1 education + \beta_2 ability+ e $$ you assume that the wage is a function of education and ability. However, you cannot measure ability, which is most probably correlated with education. Hence you only observe the process $$\log(wage)=\beta_0+\beta_1 education + u $$ with $$u=\beta_2 ability+ e$$ Hence $u$ and $education$ are correlated and you are in an endogeneity trap. A suitable instrument must be independent of $u$ (that is, of both $e$ and the omitted $ability$) and correlated with $education$. These are demanding conditions, and the independence assumption cannot even be tested directly. An alternative would be to find a proxy variable for $ability$ ($IQ$, for example); this is often more convenient, but it is not the same as an instrument. To make a long story short: this is mostly a theoretical question and not a statistical one. You have to argue on theoretical grounds why your instrument is valid (or not).
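The wage example can be played through on simulated data. The sketch below uses base R only and assumes a made-up data-generating process: the coefficients, the sample size, and the instrument `z` are hypothetical, and `z` is valid here only because the simulation constructs it that way. It compares OLS without `ability` to a two-stage estimate computed with two calls to `lm`.

```r
# Simulated wage example; all numbers and names are assumptions.
set.seed(42)
n <- 5000
ability   <- rnorm(n)                  # unobserved in practice
z         <- rnorm(n)                  # hypothetical instrument
education <- 12 + 1.0 * z + 1.0 * ability + rnorm(n)
e         <- rnorm(n, sd = 0.5)
log_wage  <- 1 + 0.10 * education + 0.20 * ability + e  # true beta_1 = 0.10

# OLS without ability: education absorbs part of the ability effect.
ols <- lm(log_wage ~ education)

# Stage 1: regress the endogenous variable on the instrument.
stage1 <- lm(education ~ z)
education_hat <- fitted(stage1)

# Stage 2: replace education by its first-stage fitted values.
stage2 <- lm(log_wage ~ education_hat)

b_ols <- unname(coef(ols)["education"])
b_iv  <- unname(coef(stage2)["education_hat"])
first_stage_F <- unname(summary(stage1)$fstatistic["value"])

cat("OLS estimate :", round(b_ols, 3), "\n")
cat("2SLS estimate:", round(b_iv, 3), "\n")
cat("first-stage F:", round(first_stage_F, 1), "\n")
```

The OLS estimate should come out noticeably above the true value of 0.10, while the two-stage estimate should land close to it. Two caveats: the standard errors that `lm` reports for the second stage are not the correct 2SLS standard errors, so a dedicated IV routine is preferable for inference, and the first-stage F statistic only speaks to condition ii), not to condition i).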

4) Heteroscedasticity: This is not a serious problem as far as I can see, since it is accepted practice in econometrics to use heteroscedasticity-robust standard errors (White standard errors). These are easy to implement by hand using matrix algebra, or with the $\texttt{R}$ package $\texttt{ivpack}$ (https://cran.r-project.org/web/packages/ivpack/ivpack.pdf).
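The "by hand" route can be sketched in a few lines of base R. The example below is an illustration on simulated data whose error variance grows with `x` (variable names and data are assumptions). It computes the basic White (HC0) covariance matrix $(X'X)^{-1} X' \operatorname{diag}(\hat u_i^2) X (X'X)^{-1}$ for a plain OLS fit, not for an IV model.

```r
# Simulated heteroscedastic data; names and numbers are hypothetical.
set.seed(7)
n <- 500
x <- runif(n, 1, 5)
y <- 1 + 2 * x + rnorm(n, sd = x)   # error spread grows with x
fit <- lm(y ~ x)

X <- model.matrix(fit)              # n x k design matrix
u <- residuals(fit)                 # OLS residuals

XtX_inv <- solve(crossprod(X))      # (X'X)^{-1}
meat    <- crossprod(X * u)         # X' diag(u^2) X: row i of X scaled by u_i
V_hc0   <- XtX_inv %*% meat %*% XtX_inv   # White (HC0) covariance matrix

se_classical <- sqrt(diag(vcov(fit)))
se_robust    <- sqrt(diag(V_hc0))
print(cbind(estimate = coef(fit), se_classical, se_robust))
```

The coefficient estimates are unchanged; only the standard errors differ. If installing packages is an option, `sandwich::vcovHC(fit, type = "HC0")` should give the same matrix and also offers small-sample variants such as HC1 to HC3.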
