Solved – Heckman sample selection

econometrics, heckman, probit, regression

On page 9 of http://jenni.uchicago.edu/Oxford2005/four_param_all_2005-08-07_csh.pdf
The ATE (average treatment effect) is the expected gain from participating in a program for a randomly chosen individual. An example is evaluating the impact of going to college on wages.

Because of selection bias, the estimation steps are:

  1. Fit a probit to model the probability of an individual going to college. Two different inverse Mills ratios are calculated from it, one for each group.

  2. For those who did go to college, run an OLS regression of wage on the explanatory variables (gender, etc.), and do the same for those who did not go to college. In each OLS regression, include the appropriate inverse Mills ratio from step 1 as an additional explanatory variable.

  3. The ATE is the average of the difference in predicted values, computed using the parameter estimates for the college and non-college groups.
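The three steps above can be sketched as follows. This is a minimal illustration on simulated data, not the procedure from the linked paper verbatim: the variable names, the data-generating values, and the hand-rolled probit fit (Fisher scoring in NumPy, used only to keep the example dependency-free) are all assumptions made for the example.

```python
import math
import numpy as np

# --- Simulated data (all names and parameter values are illustrative) ---
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)                     # wage covariate
z = rng.normal(size=n)                     # affects the college choice only
v = rng.normal(size=n)                     # selection-equation error
u1 = 0.5 * v + 0.5 * rng.normal(size=n)    # college wage error, correlated with v
u0 = -0.3 * v + 0.5 * rng.normal(size=n)   # non-college wage error
Z = np.column_stack([np.ones(n), x, z])    # probit regressors
X = np.column_stack([np.ones(n), x])       # wage regressors
d = (Z @ np.array([0.2, 0.5, 1.0]) + v > 0).astype(float)
# Observed wage; the population ATE implied by these values is 1.0
y = np.where(d == 1, 2.0 + 1.0 * x + u1, 1.0 + 0.5 * x + u0)


def norm_pdf(t):
    return np.exp(-0.5 * t**2) / np.sqrt(2.0 * np.pi)


_erf = np.vectorize(math.erf)


def norm_cdf(t):
    return 0.5 * (1.0 + _erf(t / np.sqrt(2.0)))


def fit_probit(Z, d, max_iter=50, tol=1e-10):
    """Probit maximum likelihood via Fisher scoring."""
    gamma = np.zeros(Z.shape[1])
    for _ in range(max_iter):
        idx = Z @ gamma
        p = np.clip(norm_cdf(idx), 1e-10, 1 - 1e-10)
        phi = norm_pdf(idx)
        w = p * (1 - p)
        score = Z.T @ (phi * (d - p) / w)
        info = (Z * (phi**2 / w)[:, None]).T @ Z
        step = np.linalg.solve(info, score)
        gamma += step
        if np.max(np.abs(step)) < tol:
            break
    return gamma


def ols(A, b):
    return np.linalg.lstsq(A, b, rcond=None)[0]


# Step 1: probit, then one inverse Mills ratio per group
gamma_hat = fit_probit(Z, d)
idx = Z @ gamma_hat
lam1 = norm_pdf(idx) / norm_cdf(idx)            # college group
lam0 = -norm_pdf(idx) / (1.0 - norm_cdf(idx))   # non-college group

# Step 2: separate OLS per group, each with its own Mills ratio as a regressor
t = d == 1
coef1 = ols(np.column_stack([X[t], lam1[t]]), y[t])
coef0 = ols(np.column_stack([X[~t], lam0[~t]]), y[~t])
beta1, beta0 = coef1[:-1], coef0[:-1]           # drop the Mills coefficients

# Step 3: ATE = average difference in predictions over the whole sample
ate = np.mean(X @ (beta1 - beta0))
naive = y[t].mean() - y[~t].mean()              # uncorrected comparison
print("ATE estimate:", ate)
print("Naive difference in means:", naive)
```

Note that the predictions in step 3 use the same regressor matrix `X` for both groups and are averaged over everyone, not just over one group. The standard errors from the second-stage OLS are not valid as they stand, because the Mills ratios are themselves estimated; the sketch does not attempt to correct them.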

My questions are:

  1. In step 3, is it correct that the parameter estimates on the inverse Mills ratios are not used in the prediction? I simply drop these coefficients when calculating the ATE.

  2. Do I need to keep the same variables in the OLS for the college and non-college groups? If I fit the OLS separately for each group, different variables will be significant in explaining the variation in income, so when I calculate the ATE some parameter estimates will be zero.

  3. I have decided to split the independent variables into two sets, one for the probit and another for the OLS. If I use the inverse Mills ratios in the OLS together with the variables used in the probit, there is high multicollinearity. Even though the estimates remain unbiased in the presence of multicollinearity, I am worried about the predictions and about wide confidence intervals due to inflated standard errors.

Best Answer

  1. Yes, you do not need to use the coefficients on the inverse Mills ratios when computing the ATE. But you must still include the ratios in the regressions, or your other parameter estimates will be biased.

  2. According to the article, yes. If different variables are statistically significant in the different regressions, that is not a problem: just treat the coefficients on the non-significant regressors as zero.

  3. Splitting is perfectly reasonable. Since you are fitting two models, one for the decision whether to go to college and another for log earnings, it is natural that different variables will be important in each. This deserves further investigation, though: high multicollinearity when using the same variables in the probit and OLS regressions is not a standard feature of the Heckman model, as far as I know.
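On point 1, the reason the inverse Mills ratio coefficients drop out of the ATE can be written out as below. The notation is assumed here rather than taken from the question: $Y_1, Y_0$ are the college and non-college outcomes, $D$ is the college indicator, $Z\gamma$ is the probit index with a standard normal error $V$, and $\rho_j \sigma_j = \operatorname{Cov}(U_j, V)$ under joint normality of the errors.

$$
\begin{aligned}
E[Y_1 \mid X, Z, D=1] &= X\beta_1 + \rho_1\sigma_1\,\frac{\phi(Z\gamma)}{\Phi(Z\gamma)} \\
E[Y_0 \mid X, Z, D=0] &= X\beta_0 - \rho_0\sigma_0\,\frac{\phi(Z\gamma)}{1-\Phi(Z\gamma)} \\
\mathrm{ATE}(X) = E[Y_1 - Y_0 \mid X] &= X(\beta_1 - \beta_0)
\end{aligned}
$$

The Mills ratio terms appear only in the conditional means of the selected subsamples, which is why they are needed to estimate $\beta_1$ and $\beta_0$ consistently. The ATE is an unconditional comparison for a random individual, where the errors have mean zero, so only $\beta_1 - \beta_0$ enters.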
