If I want to estimate a linear probability model with (region) fixed effects, is that the same as just running a fixed effects regression? Maybe I'm getting tripped up by the terminology and by whether I should be using the lm() function or the plm package in R.
My goal is to estimate the effect of a baby bonus. My dependent variable is a binary indicator for NEWBORN and my main independent variable of interest is an indicator for receiving the baby bonus. I control for age, age squared, education, marital status, and household income.
1) ## Linear Probability
LPM <- lm(newborn ~ treatment + age + age_sq + highest_education + marital_stat + hh_income_log, data = fertility_15_45) # how do I add FE to a lm model in R?
2) ## FE Model
FE_model <- plm(newborn ~ treatment + age + age_sq + highest_education + marital_stat + hh_income_log, data = fertility_15_45, index = "region", model = "within")
Best Answer
As indicated in the comments, the answer on Stack Overflow demonstrates, explicitly, that your coefficients are identical. I will offer some further intuition.
Yes. The plm() function is a panel data estimator. Technically, it runs lm() on your transformed data. Typically, when students learn about "fixed effects" for the first time, they learn it as a deviation from a "within-group" time mean. Later, they come across some empirical specification in a paper and observe a parameter in a model, estimated via least squares, that is unit-subscripted, such as $\gamma_s$ (i.e., state effect) or $\gamma_r$ (i.e., region effect), and they ask if this is equivalent to performing a fixed effects regression. It is.
The plm() function with index = "region" and model = "within" will return the same coefficients as your lm() function with as.factor(region) included as a covariate. In R, as.factor() turns region into a factor, which lm() then expands into a series of dummy variables for your regions. You can think of this as each region getting its own unique intercept.
In sum, treating your "region effects" as parameters to be estimated is algebraically equivalent to estimation in deviations from means. Either approach returns identical coefficients on your treatment dummy (i.e., baby bonus).
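To make the equivalence concrete, here is a minimal sketch comparing the three routes: lm() with region dummies, lm() on manually demeaned data, and plm() with model = "within". The data frame sim_data and the helper demean() are hypothetical stand-ins, since I don't have your fertility_15_45 data, and I only include a subset of your controls to keep it short. It assumes the plm package is installed.

```r
library(plm)

# Simulated stand-in data (hypothetical; not the asker's fertility_15_45)
set.seed(1)
n <- 500
sim_data <- data.frame(
  region    = sample(paste0("r", 1:5), n, replace = TRUE),
  treatment = rbinom(n, 1, 0.5),
  age       = sample(15:45, n, replace = TRUE)
)
sim_data$age_sq  <- sim_data$age^2
sim_data$newborn <- rbinom(n, 1, 0.2 + 0.1 * sim_data$treatment)

# 1) LPM with region fixed effects entered as dummy variables
lpm_fe <- lm(newborn ~ treatment + age + age_sq + as.factor(region),
             data = sim_data)

# 2) The same model estimated in deviations from region means
#    (demean() is a hypothetical helper; ave() returns group means)
demean <- function(x, g) x - ave(x, g)
dm <- data.frame(
  newborn   = demean(sim_data$newborn,   sim_data$region),
  treatment = demean(sim_data$treatment, sim_data$region),
  age       = demean(sim_data$age,       sim_data$region),
  age_sq    = demean(sim_data$age_sq,    sim_data$region)
)
lpm_dm <- lm(newborn ~ 0 + treatment + age + age_sq, data = dm)

# 3) The within estimator from plm
fe_model <- plm(newborn ~ treatment + age + age_sq,
                data = sim_data, index = "region", model = "within")

# Collect the treatment coefficient from each approach
b_dummy  <- unname(coef(lpm_fe)["treatment"])
b_demean <- unname(coef(lpm_dm)["treatment"])
b_within <- unname(coef(fe_model)["treatment"])

print(c(dummy = b_dummy, demeaned = b_demean, within = b_within))
```

The three point estimates should agree up to floating-point error. Note that the standard errors from the manually demeaned lm() will be slightly too small, because it does not account for the degrees of freedom used up by the region means; the dummy-variable lm() and plm() handle this for you.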
I hope this helps your intuition.