Solved – Why is the variable being omitted by Stata?

fixed-effects-model, interaction, multicollinearity, regression, stata

I am carrying out a fixed-effects regression. The dependent variable docvis is the number of hospital visits. I have a dummy variable called female, and hhkids indicates whether or not a person has kids. I created an interaction term between hhkids and female called fekid, because I wanted to see whether women's hospital visits are more affected by having children than men's.

I have interpreted the coefficient on fekid to mean that women's hospital visits ARE more affected than men's: that women with children are 15.77% less likely to visit the hospital than men with children are. Is this interpretation correct?

I have included the variable female in my regression. I was told by someone that I do not need to include female. Why is this? Why is female omitted? I assume that this is due to multicollinearity between female and fekid; however, when I run an OLS regression this does not happen. Why is that?

  areg  docvis hhkids age agesq married working linc addon female fekid, absorb(id)
note: female omitted because of collinearity

Linear regression, absorbing indicators                Number of obs =    6209
                                                       F(  8,  5314) =    8.25
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.4187
                                                       Adj R-squared =  0.3209
                                                       Root MSE      =  4.5747

------------------------------------------------------------------------------
      docvis |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      hhkids |   .7493504   .2887167     2.60   0.009     .1833471    1.315354
         age |  -.2326124   .1003786    -2.32   0.021    -.4293957   -.0358292
       agesq |   .0038731   .0010802     3.59   0.000     .0017553    .0059908
     married |  -.0923826   .3659839    -0.25   0.801    -.8098612     .625096
     working |  -.5702973   .2491114    -2.29   0.022    -1.058658   -.0819367
        linc |   .0886328     .23889     0.37   0.711    -.3796897    .5569553
       addon |   .3009833   .6369426     0.47   0.637    -.9476857    1.549652
      female |  (omitted)
       fekid |  -.1577091   .4279726    -0.37   0.713    -.9967111    .6812929
       _cons |   5.793355   2.426897     2.39   0.017     1.035641    10.55107
-------------+----------------------------------------------------------------
          id |      F(886, 5314) =      3.929   0.000         (887 categories)

Best Answer

If this is a fixed-effects regression model, then any variables that are constant within every unit are redundant, and will be omitted.

More specifically, the areg command with absorb(id) in effect includes a dummy variable for each individual (here, one for each id). Since each of those dummies is constant within an individual, any variable that is also constant within every individual, such as sex, is perfectly collinear with them. In short, by adjusting the model for each particular individual, we are already adjusting for sex; there is no variation left for sex to explain. In a pooled OLS regression without the individual dummies, female still varies across individuals, which is why it is not omitted there.
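The collinearity can be made concrete with the within (demeaning) transformation, which gives the same slope estimates as including a dummy for every individual. The following is a minimal Python sketch on a made-up four-row panel; the data and the helper name `within_demean` are illustrative assumptions, not part of the original data set or of Stata. Subtracting each individual's own mean from `female` leaves only zeros, so there is nothing left from which to estimate a coefficient.

```python
from collections import defaultdict

def within_demean(ids, values):
    """Subtract each individual's own mean from that individual's observations."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for i, v in zip(ids, values):
        totals[i] += v
        counts[i] += 1
    return [v - totals[i] / counts[i] for i, v in zip(ids, values)]

# Hypothetical panel: two people, each observed in two years.
ids     = [1, 1, 2, 2]
female  = [1, 1, 0, 0]  # constant within each person
working = [1, 0, 1, 1]  # changes over time for person 1

female_within = within_demean(ids, female)
working_within = within_demean(ids, working)

print(female_within)   # every entry is zero: no within-person variation
print(working_within)  # nonzero entries for person 1
```

A variable such as `working` keeps some within-person variation after demeaning, whereas `female` is wiped out entirely.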

In contrast, whether an individual is married or working can change over time, so the effects of these variables can, in principle, be estimated even after controlling for whatever is unique to each individual.
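The same logic suggests why the interaction fekid is estimated even though female is dropped: hhkids can change over time, so the product of female and hhkids can change within a woman. Below is a small Python sketch of that check, again on invented data; the helper `varies_within` and the toy values are assumptions for illustration only.

```python
def varies_within(ids, values):
    """Return True if the variable takes more than one value for at least one individual."""
    seen = {}
    for i, v in zip(ids, values):
        seen.setdefault(i, set()).add(v)
    return any(len(vals) > 1 for vals in seen.values())

# Hypothetical panel: person 1 is a woman who has a child in her second year;
# person 2 is a man with children in both years.
ids    = [1, 1, 2, 2]
female = [1, 1, 0, 0]
hhkids = [0, 1, 1, 1]
fekid  = [f * h for f, h in zip(female, hhkids)]  # interaction term

print(varies_within(ids, female))  # False: absorbed by the individual effects
print(varies_within(ids, hhkids))  # True
print(varies_within(ids, fekid))   # True: identified from women whose hhkids changes
```

In this sketch the interaction is identified only from women whose hhkids status changes during the sample; if no woman's status changed, fekid would be dropped as well.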