Could you please provide an example and explanation why to use the bivariate probit model with sample selection?
In this context, to what sample selection bias refers to?
biasbivariateeconometrics
Could you please provide an example and explanation why to use the bivariate probit model with sample selection?
In this context, to what sample selection bias refers to?
Best Answer
Unfortunately I lost the tex file for these notes, but they are only two pages, so I added screenshots:
I have a paper where we use this approach to look at what happens to bidders who lose to a sniper in their very first auction on eBay. A sniper is another participant who tries to place a bid in the final seconds of sequential ascending auctions with predetermined ending times. The outcome $y_1$ is binary: leaving the auction platform or not. The sniped dummy $y_2$ is in the outcome equation of $y_1$.
The reason you can't just put $y_2$ as a regressor is that sniping is more likely to occur in markets where there are few bidders. It is these kind of markets for which a marketplace like eBay is most attractive to buyers, implying that bidders in these markets may be more likely to return to eBay. Hence, a positive correlation between sniping and auction thinness, and a positive correlation between auction thinness and the likelihood of returning to eBay, will bias downward any effect that sniping has on bidders ceasing to bid in auctions. That is, there is selection into which auctions gets sniped.
We use a recursive bivariate probit strategy to address this concern. There are two ways that bidding occurs on eBay. First, bidders can manually insert their bid into the proxy bidding system. Second, bidders can use sniping software that does this automatically in the last seconds of the auction without their attentiveness. At nighttime, there are fewer manual bidders active on the site, and consistent with this we observe that more auctions are won by snipers. However, 10pm in New York is only 7pm in San Francisco, while 10pm in San Francisco is 1am in New York. Therefore the 10pm San Francisco bidder is much more likely to be sniped than a 10pm New York bidder. If these bidders are otherwise comparable conditional on observables, then one can use their respective time zones as an instrument for variation in the likelihood of being sniped. This is the basis of our identification strategy.
Reference:
Matt Backus, Tom Blake, Dimitriy V Masterov, and Steven Tadelis, "Is Sniping A Problem For Online Auction Markets?", Proceedings of the 24th International Conference on World Wide Web, 88-96.
Response to Question:
Here's a toy example illustrating the fundamental problem with selection, setting aside the bivariate probit stuff. The 250 circles below correspond to people with different levels of education and their potential wage offers. Suppose anyone who gets a wage offer of \$15 or less decides that he would rather go hiking instead of working, so we don't get to see his wage (orange circles). If you fit a linear model to the remaining data (navy circles), the slope will be 20% smaller than on the full sample, so the benefit of going to school will appear $0.50 lower than it really is. One way to think about this is the people with large negative epsilons are more likely to be missing in the low education groups, so it inflates the observed wages in those groups, which tilts the regression line and makes schooling look less effective.
Why does this matter? Let's focus on the fifth highest orange circle with 11 years of education, who may be right on the margin between dropping out of high school (we don't observe his costs here). His offer with 11 years of education is just under \$13. If he thinks the benefit of another year is \$2, he may leave school because he can just start hiking now and not incur the unnecessary cost. Since economists are interested in policy questions (like what would be the net benefit of college loan forgiveness), ignoring the people who went hiking (or aren't in the labor force) would be a poor choice. Using the wrong estimate could lead to some suboptimal investment in education, both individually and socially.
If your goal is to predict what the wage among workers with X years of education is, using the worker data would be OK. It is when you want to make causal statements about what would have happened to someone had they completed more school that you need to worry about selection.
Data: