Blinder-Oaxaca – Blinder-Oaxaca Decomposition and Gardeazabal and Ugidos (2004) Correction with Multiple Categorical Variables

blinder-oaxaca, categorical data, categorical-encoding, logistic

I am reading the vignette for the oaxaca R package.

Section 2.3 states:

The results of Blinder-Oaxaca decompositions have been found to be sensitive to the researcher’s choice of the omitted baseline category when categorical variables are included as covariates (Oaxaca and Ransom 1999)

To avoid this problem and "ensure that the Blinder-Oaxaca decomposition results are invariant to the user’s choice of the omitted baseline category, oaxaca implements a procedure proposed by Gardeazabal and Ugidos (2004)".

In particular oaxaca applies a correction to the coefficients of the dummies.

Section 3.1 gives the formula syntax for this:

If the regression model contains dummies that represent a categorical variable (d1, d2, d3, etc.), these can be specified by adding another part to the formula:

y ~ x1 + x2 + x3 + … | z | d1 + d2 + d3 + …

The problem is that I have more than one categorical variable in my model.

What is the correct way to write the formula in that case?

I have converted my categorical variables into dummies using the fastDummies::dummy_cols function.

Then I drop one dummy for each categorical variable.

Suppose I have two categorical variables, d1 (3 levels) and d2 (2 levels).

I convert them into dummies, obtaining dummies d1_1, d1_2 and d1_3 for the first categorical variable, and d2_1 and d2_2 for the second categorical variable.

But if I put them in the formula like this:

y ~ x + d1_1 + d1_2 + d2_1 | gender | d1_1 + d1_2 + d2_1

Then oaxaca complains that the dummies must be mutually exclusive.
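One plausible reading of that error, assuming "mutually exclusive" means that no observation may have more than one of the listed dummies equal to 1: dummies from a single categorical variable satisfy this, but dummies pooled from two different variables do not. The sketch below uses made-up toy data and base R's `model.matrix` in place of `fastDummies::dummy_cols`; it does not reproduce the package's internal check.

```r
# Toy data (hypothetical values): d1 has 3 levels, d2 has 2 levels.
df <- data.frame(
  d1 = factor(c(1, 2, 3, 1, 2, 3)),
  d2 = factor(c(1, 1, 1, 2, 2, 2))
)

# One 0/1 column per level of each variable.
dummies <- cbind(
  model.matrix(~ d1 - 1, data = df),
  model.matrix(~ d2 - 1, data = df)
)
colnames(dummies) <- c("d1_1", "d1_2", "d1_3", "d2_1", "d2_2")

# Within one variable, at most one dummy equals 1 in any row.
within_d1 <- rowSums(dummies[, c("d1_1", "d1_2")])

# Pooled across variables, two dummies can equal 1 in the same row.
across <- rowSums(dummies[, c("d1_1", "d1_2", "d2_1")])

max(within_d1)  # never exceeds 1
max(across)     # exceeds 1 whenever a row has d1 in {1, 2} and d2 == 1
```

In the first row, for example, both `d1_1` and `d2_1` are 1, so the set `d1_1 + d1_2 + d2_1` cannot be treated as the dummies of a single categorical variable.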

How can I account for more than one categorical variable in my model?

(Section 4 of the vignette contains a practical example of how to use one categorical variable.)

Best Answer

Unfortunately, this is not supported in R's oaxaca package, as far as I know. It's possible to do this with the Stata routine, though.
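If you want to see what the correction does to the coefficients, the normalization can be applied by hand to each categorical variable separately. The following is only a sketch in base R on simulated data, not the oaxaca package's implementation: `normalise_dummies` is a hypothetical helper, and it assumes the categorical variables enter `lm` as factors with the default treatment contrasts. Within each variable it treats the omitted category as a zero coefficient, subtracts the mean coefficient from every category, and adds that mean to the intercept.

```r
# Hypothetical helper: make dummy coefficients sum to zero within each
# categorical variable, moving the average effect into the intercept.
normalise_dummies <- function(fit, factors) {
  b <- coef(fit)
  intercept <- b[["(Intercept)"]]
  normalised <- list()
  dummy_names <- character(0)
  for (f in factors) {
    lev <- fit$xlevels[[f]]
    # All categories, with the omitted (baseline) category set to 0.
    full <- setNames(numeric(length(lev)), paste0(f, lev))
    est <- intersect(names(full), names(b))
    full[est] <- b[est]
    shift <- mean(full)
    intercept <- intercept + shift
    normalised[[f]] <- full - shift
    dummy_names <- c(dummy_names, names(full))
  }
  rest <- b[setdiff(names(b), c("(Intercept)", dummy_names))]
  c("(Intercept)" = intercept, rest, unlist(unname(normalised)))
}

# Simulated data with two categorical variables (made-up values).
set.seed(1)
n <- 200
df <- data.frame(
  x  = rnorm(n),
  d1 = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
  d2 = factor(sample(c("u", "v"), n, replace = TRUE))
)
df$y <- 1 + 0.5 * df$x +
  unname(c(a = 0, b = 1, c = 2)[as.character(df$d1)]) +
  unname(c(u = 0, v = -1)[as.character(df$d2)]) +
  rnorm(n)

# Same model, two different choices of omitted categories.
fit_a <- lm(y ~ x + d1 + d2, data = df)
df2 <- transform(df, d1 = relevel(d1, ref = "c"), d2 = relevel(d2, ref = "v"))
fit_c <- lm(y ~ x + d1 + d2, data = df2)

norm_a <- normalise_dummies(fit_a, c("d1", "d2"))
norm_c <- normalise_dummies(fit_c, c("d1", "d2"))

# After normalization the coefficients no longer depend on the baseline.
all.equal(norm_a[sort(names(norm_a))], norm_c[sort(names(norm_c))])
```

This only covers the coefficient side. A full decomposition would also need the group means of every dummy, including the omitted ones, and separate fits for each group, so treat it as an illustration of the idea and not a replacement for a tested routine.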

Just FYI: Regarding the reference category issue, the above linked article is also of interest. There is no complete solution to this issue: "Hence, there is still some arbitrariness in the method by Gardeazabal and Ugidos (2004) and Yun (2005b)." (p. 463).