Linear probability model with fixed effects

Tags: binary data, fixed-effects-model, linear model, panel data, regression

If I want to estimate a linear probability model with (region) fixed effects, is that the same as just running a fixed effects regression? Maybe I'm getting tripped up by the terminology, and by whether I should be using lm() or the plm package in R.

My goal is to estimate the effect of a baby bonus. My dependent variable is a binary indicator for NEWBORN and my main independent variable of interest is an indicator for receiving the baby bonus. I control for age, age squared, education, marital status, and household income.

## 1) Linear probability model

LPM <- lm(newborn ~ treatment + age + age_sq + highest_education + marital_stat + hh_income_log, data = fertility_15_45) # how do I add FE to a lm model in R? 

## 2) Fixed effects model

library(plm)

FE_model <- plm(newborn ~ treatment + age + age_sq + highest_education + marital_stat + hh_income_log, data = fertility_15_45, index = "region", model = "within")

Best Answer

As noted in the comments, the answer on Stack Overflow shows explicitly that your coefficients are identical. I will offer some further intuition.

If I want to estimate a linear probability model with (region) fixed effects, is that the same as just running a fixed effects regression?

Yes. The plm() function is a panel data estimator. With model = "within", it transforms your data and then fits ordinary least squares to the transformed variables, the same computation lm() performs. Typically, when students learn about "fixed effects" for the first time, they learn it as a regression on deviations from within-group means (in a classic panel, each unit's mean over time; here, each region's mean). Later, they come across an empirical specification in a paper with a unit-subscripted parameter estimated via least squares, such as $\gamma_s$ (a state effect) or $\gamma_r$ (a region effect), and they ask whether this is equivalent to performing a fixed effects regression. It is.
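To make the within transformation concrete, here is a minimal base-R sketch on hypothetical toy data (the vector x and the three regions are made up for illustration). Each value is re-expressed as its deviation from its own region's mean, which is what the within estimator does to every variable before fitting least squares.

```r
# Hypothetical toy data: three regions with two observations each
region <- factor(c("A", "A", "B", "B", "C", "C"))
x <- c(1, 3, 2, 6, 5, 9)

# ave() returns, for every row, the mean of x within that row's region
region_means <- ave(x, region)

# Within transformation: deviation from the region mean
x_within <- x - region_means
print(x_within)
# deviations: -1, 1, -2, 2, -2, 2

# After demeaning, every region's mean is zero, so region-level
# differences in the level of x have been removed
print(tapply(x_within, region, mean))
```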

The plm() function with index = "region" and model = "within" returns the same slope coefficients as your lm() call with as.factor(region) included as a covariate. In R, as.factor() turns region into a factor, and lm() expands a factor into a set of dummy variables, one per region (less a reference region when the model includes an intercept). You can think of this as each region getting its own intercept.
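A small sketch of what happens when a factor enters a formula, again on hypothetical toy data: model.matrix() displays the design matrix that lm() builds, and dropping the intercept makes each region's coefficient its own intercept.

```r
# Hypothetical toy data: a binary outcome in three regions
region <- factor(c("A", "A", "B", "B", "C", "C"))
y <- c(0, 1, 1, 1, 0, 0)

# With an intercept, R creates a dummy for every region except the
# reference level "A", whose effect is absorbed by the intercept
X_int <- model.matrix(~ region)
print(colnames(X_int))
# "(Intercept)" "regionB" "regionC"

# Without an intercept, every region gets its own dummy
X_noint <- model.matrix(~ region - 1)
print(colnames(X_noint))
# "regionA" "regionB" "regionC"

# Regressing y on the region dummies alone (no intercept) returns one
# intercept per region, equal to the share of y == 1 in that region
fit <- lm(y ~ region - 1)
print(coef(fit))
# regionA = 0.5, regionB = 1, regionC = 0
```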

In sum, treating your "region effects" as parameters to be estimated (the least squares dummy variable approach) is algebraically equivalent to estimating the model in deviations from region means; this follows from the Frisch–Waugh–Lovell theorem. The boilerplate code below will result in identical coefficients on your treatment dummy (i.e., the baby bonus).

# --- The Least Squares Dummy Variable Estimator

lm(outcome ~ treatment + ... + as.factor(region), data = ...)

# --- The Fixed Effects (Within-Group) Estimator 

plm(outcome ~ treatment + ... , index = "region", model = "within", data = ...)
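The equivalence can be checked on simulated data. This is a sketch under stated assumptions: the data frame dat, its five regions, and the data-generating process are hypothetical (only the variable names echo the question), and the plm comparison runs only if the plm package is installed. It fits the dummy-variable regression, the within estimator by hand (demeaning every variable by region and dropping the intercept), and, optionally, plm().

```r
# Simulated data; the data-generating process is a hypothetical assumption
set.seed(1)
n <- 600
dat <- data.frame(
  region    = factor(sample(paste0("R", 1:5), n, replace = TRUE)),
  treatment = rbinom(n, 1, 0.4),
  age       = runif(n, 15, 45)
)
# Region-specific baseline probabilities (hypothetical values); all
# resulting probabilities stay inside [0, 1], as an LPM needs here
baseline <- c(R1 = 0.05, R2 = 0.10, R3 = 0.15, R4 = 0.20, R5 = 0.25)
p <- baseline[as.character(dat$region)] + 0.05 * dat$treatment +
  0.002 * (dat$age - 30)
dat$newborn <- rbinom(n, 1, p)

# 1) Least squares dummy variables: one intercept shift per region
fit_lsdv <- lm(newborn ~ treatment + age + as.factor(region), data = dat)

# 2) Within estimator by hand: demean each variable by region, no intercept
demean <- function(v, g) v - ave(v, g)
dat_w <- data.frame(
  newborn   = demean(dat$newborn, dat$region),
  treatment = demean(dat$treatment, dat$region),
  age       = demean(dat$age, dat$region)
)
fit_within <- lm(newborn ~ treatment + age - 1, data = dat_w)

slopes <- c("treatment", "age")
print(cbind(lsdv = coef(fit_lsdv)[slopes], within = coef(fit_within)[slopes]))

# 3) The same model through plm, only if the package is installed
if (requireNamespace("plm", quietly = TRUE)) {
  fit_plm <- plm::plm(newborn ~ treatment + age, data = dat,
                      index = "region", model = "within")
  print(coef(fit_plm)[slopes])
}
```

The point estimates (and the residuals) agree across all three fits. Standard errors from the hand-demeaned lm() fit will be somewhat too small, because lm() does not count the region means as estimated parameters when computing the residual degrees of freedom; plm() and the dummy-variable lm() fit both account for them.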

I hope this helps your intuition.
