Solved – Linear probability model with fixed effects

Tags: binary data, fixed-effects-model, linear model, panel data, regression

If I want to estimate a linear probability model with (region) fixed effects, is that the same as just running a fixed effects regression? Maybe I'm getting tripped up by the terminology, and by whether I should be using the lm() function or the plm package in R.

My goal is to estimate the effect of a baby bonus. My dependent variable is a binary indicator for NEWBORN and my main independent variable of interest is an indicator for receiving the baby bonus. I control for age, age squared, education, marital status, and household income.

1) Linear probability model

LPM <- lm(newborn ~ treatment + age + age_sq + highest_education + marital_stat + hh_income_log, data = fertility_15_45) # how do I add region fixed effects to an lm() model in R?

2) Fixed effects model

FE_model <- plm(newborn ~ treatment + age + age_sq + highest_education + marital_stat + hh_income_log, data = fertility_15_45, index = "region", model = "within")

Best Answer

As indicated in the comments, the answer on Stack Overflow demonstrates, explicitly, that your coefficients are identical. I will offer some further intuition.

If I want to estimate a linear probability model with (region) fixed effects, is that the same as just running a fixed effects regression?

Yes. The plm() function is a panel data estimator. Technically, it runs lm() on your transformed data. When students first learn about "fixed effects", they typically learn that the estimator works with deviations from within-group means (in a classic panel, each unit's mean over time). Later, they come across an empirical specification in a paper with a unit-subscripted parameter estimated by least squares, such as $\gamma_s$ (a state effect) or $\gamma_r$ (a region effect). They then ask whether this is equivalent to performing a fixed effects regression. It is.
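A short derivation makes the equivalence concrete. The notation is introduced here for illustration:

- $y_{ir}$ is the outcome of individual $i$ in region $r$.
- $x_{ir}$ is the vector of covariates.
- $n_r$ is the number of individuals in region $r$.
- Bars denote region means.

Averaging the model within each region and subtracting removes the region effect $\gamma_r$:

$$y_{ir} = x_{ir}'\beta + \gamma_r + \varepsilon_{ir}$$

$$\bar y_r = \bar x_r'\beta + \gamma_r + \bar\varepsilon_r, \qquad \bar y_r = \frac{1}{n_r}\sum_{i \in r} y_{ir}$$

$$y_{ir} - \bar y_r = (x_{ir} - \bar x_r)'\beta + (\varepsilon_{ir} - \bar\varepsilon_r)$$

Because $\gamma_r$ is constant within a region, it drops out of the demeaned equation. OLS on the demeaned data (the within estimator) therefore needs no region parameters. By the Frisch–Waugh–Lovell theorem, the resulting $\hat\beta$ is numerically identical to the coefficient on $x_{ir}$ from OLS with a separate intercept $\gamma_r$ for each region.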

With index = "region" and model = "within", plm() will return the same slope coefficients as your lm() call with as.factor(region) included as a covariate. In R, as.factor() converts region to a factor. lm() then expands that factor into a series of dummy variables: one per region, minus a reference category that is absorbed by the intercept. You can think of this as each region getting its own intercept.
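To see the dummy expansion directly, you can ask R for the design matrix that lm() would build. The sketch below uses a small hypothetical data frame (`toy`, with made-up regions and values) and base R's model.matrix(). Under R's default treatment contrasts, the first factor level becomes the baseline absorbed by the intercept.

```r
# Hypothetical toy data: five observations spread over three regions
toy <- data.frame(
  region = c("north", "south", "south", "west", "north"),
  x      = c(1.2, 0.7, 2.3, 1.9, 0.4)
)

# as.factor() turns region into a factor with levels
# "north", "south", "west" (sorted alphabetically by default)
toy$region <- as.factor(toy$region)

# model.matrix() returns the design matrix lm() would use: an intercept,
# x, and one 0/1 dummy per non-baseline region ("north" is the baseline)
X <- model.matrix(~ x + region, data = toy)
print(X)
```

Each region's intercept is the overall intercept plus that region's dummy coefficient. The baseline region's intercept is just the overall intercept.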

In sum, treating your "region effects" as parameters to be estimated is algebraically equivalent to estimation in deviations from means. The boilerplate code below will result in identical coefficients on your treatment dummy (i.e., baby bonus).

# --- The Least Squares Dummy Variable Estimator

lm(outcome ~ treatment + ... + as.factor(region), data = ...)

# --- The Fixed Effects (Within-Group) Estimator 

plm(outcome ~ treatment + ... , index = "region", model = "within", data = ...)
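If you want to check the equivalence without installing plm, here is a base-R sketch on simulated data. The data-generating process and all variable values are hypothetical, loosely mimicking the baby-bonus setup, and treatment is made to be correlated with the region effect. The code fits the dummy-variable regression with lm(), then fits OLS on region-demeaned variables, and compares the two sets of coefficients.

```r
set.seed(42)

# Hypothetical simulated data: 20 regions with 50 women each
n_regions <- 20
n_per     <- 50
n         <- n_regions * n_per
region    <- rep(seq_len(n_regions), each = n_per)
gamma_r   <- rnorm(n_regions)[region]                  # region effect, constant within region
age       <- runif(n, 15, 45)
treatment <- rbinom(n, 1, plogis(gamma_r))             # treatment correlated with region effect
p         <- 0.2 + 0.05 * treatment + 0.002 * (age - 30) + 0.04 * gamma_r
newborn   <- rbinom(n, 1, pmin(pmax(p, 0), 1))         # clamp probabilities to [0, 1]
df <- data.frame(newborn, treatment, age, region = factor(region))

# (1) Least squares dummy variables: lm() expands the region factor into dummies
lsdv <- lm(newborn ~ treatment + age + region, data = df)

# (2) Within estimator: subtract each variable's region mean, then OLS with no intercept
demean <- function(v, g) {
  v <- as.numeric(v)
  v - ave(v, g)                                        # ave() returns the group mean for each row
}
df_w <- data.frame(
  newborn   = demean(df$newborn, df$region),
  treatment = demean(df$treatment, df$region),
  age       = demean(df$age, df$region)
)
within_fit <- lm(newborn ~ 0 + treatment + age, data = df_w)

b_lsdv   <- coef(lsdv)[c("treatment", "age")]
b_within <- coef(within_fit)[c("treatment", "age")]
print(cbind(LSDV = b_lsdv, within = b_within))
```

The point estimates agree to floating-point precision. The naive standard errors from the demeaned lm() call are slightly too small, however, because that call does not count the degrees of freedom used up by the region means.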

I hope this helps your intuition.

Related Solutions

Solved – group fixed-effects, not individual-fixed effects using plm in R

I have worked on similar projects and am confronting one right now. The way we handle this is to put in a fixed effect for each village and then cluster the standard errors by village. This is not a perfect solution, but it is fairly standard practice.

The plm package in R, the xtreg ..., fe command in Stata, and the traditional fixed effects (within) estimator are designed to follow individuals over time. I believe one name for the method you want is a hierarchical linear model.

The simplest implementation in R would be something like

myLM <- lm(y ~ x + v + v.t*t, data = df)

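where y is the outcome of interest, x is some set of controls, v is a factor variable for the villages, v.t is a binary indicator for whether a village was treated, and t is an indicator for the post-treatment period. The coefficient on the v.t:t interaction is the difference-in-differences estimate. The v.t main effect is absorbed by the village dummies, so lm() reports it as NA.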
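Here is a runnable sketch of that formula on simulated data. Everything about the data is hypothetical: 30 villages are observed before and after treatment, villages 1 to 10 are treated, and the true effect is 2. The variable names follow the answer's formula.

```r
set.seed(1)

# Hypothetical data: 30 villages, each observed before (t = 0) and after (t = 1);
# villages 1-10 are treated and the true treatment effect is 2
n_vil <- 30
n_obs <- 20                                        # observations per village per period
vid   <- rep(rep(seq_len(n_vil), each = n_obs), times = 2)
t     <- rep(c(0, 1), each = n_vil * n_obs)
v.t   <- as.integer(vid <= 10)                     # 1 if the village is treated
x     <- rnorm(length(t))                          # a control variable
y     <- 1 + 0.5 * x + rnorm(n_vil)[vid] + t + 2 * v.t * t + rnorm(length(t))
df    <- data.frame(y, x, v = factor(vid), v.t, t)

myLM <- lm(y ~ x + v + v.t*t, data = df)

# v.t is NA (collinear with the village dummies); v.t:t is the DiD estimate
print(coef(myLM)[c("x", "t", "v.t", "v.t:t")])
```

Keeping v.t in the formula is harmless. It only lets `v.t*t` expand to the interaction without writing `v.t:t` by hand.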

For inference, it is typical and recommended to compute clustered standard errors, using either the multiwayvcov package or the clusterSEs package.
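To show what those packages compute, here is a from-scratch sketch of the basic cluster-robust sandwich (often called CR0). The function name `cluster_vcov_cr0` and the simulated data are hypothetical. Packaged implementations typically multiply this by a small-sample correction factor, which is omitted here.

```r
# CR0 cluster-robust variance:
#   V = (X'X)^{-1} [ sum_g (X_g' u_g)(X_g' u_g)' ] (X'X)^{-1}
# Assumes the fit has no aliased (NA) coefficients.
cluster_vcov_cr0 <- function(fit, cluster) {
  X <- model.matrix(fit)
  u <- residuals(fit)
  bread  <- solve(crossprod(X))              # (X'X)^{-1}
  scores <- rowsum(X * u, group = cluster)   # one row per cluster: X_g' u_g
  meat   <- crossprod(scores)                # sum of outer products of cluster scores
  bread %*% meat %*% bread
}

# Hypothetical data: 40 villages of 25 people; both the regressor and the
# error contain a village-level component, so observations are not independent
set.seed(7)
G <- 40
m <- 25
g <- rep(seq_len(G), each = m)
x <- rnorm(G)[g] + rnorm(G * m)
y <- 1 + 0.5 * x + rnorm(G)[g] + rnorm(G * m)

fit    <- lm(y ~ x)
V_cl   <- cluster_vcov_cr0(fit, g)
se_cl  <- sqrt(diag(V_cl))
se_iid <- sqrt(diag(vcov(fit)))
print(cbind(iid = se_iid, clustered = se_cl))
```

With a village-level component in both x and the error, the clustered standard error on x comes out noticeably larger than the default i.i.d. one. That is exactly the understatement clustering is meant to correct.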

Another method for inference, and the one preferred by Bertrand, Duflo & Mullainathan (2004), is a placebo test. Assign the "treatment" to each village in turn, form an empirical CDF of the resulting placebo estimates, and see where the estimate for the truly treated village falls in that distribution. This is roughly the same method recommended for inference with the synthetic controls of Abadie, Diamond, and Hainmueller, and it has ties back to Fisher's 1935 text.
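A minimal sketch of that placebo procedure follows, under hypothetical assumptions: simulated data in which only village 1 is actually treated, a two-period difference-in-differences with village fixed effects, and a helper named `did_for_village` invented for illustration. This is a randomization-inference style approximation, not a reproduction of any paper's exact procedure.

```r
set.seed(3)

# Hypothetical data: 20 villages, pre (t = 0) and post (t = 1) periods;
# only village 1 is treated, with a true effect of 1.5
n_vil <- 20
n_obs <- 30
vid <- rep(rep(seq_len(n_vil), each = n_obs), times = 2)
t   <- rep(c(0, 1), each = n_vil * n_obs)
y   <- rnorm(n_vil)[vid] + 0.5 * t + 1.5 * (vid == 1) * t + rnorm(length(t))
df  <- data.frame(y, v = factor(vid), vid, t)

# DiD estimate when village j is labelled as the treated village
did_for_village <- function(j, data) {
  data$treated_post <- as.integer(data$vid == j & data$t == 1)
  unname(coef(lm(y ~ v + t + treated_post, data = data))["treated_post"])
}

estimates <- sapply(seq_len(n_vil), did_for_village, data = df)
true_est  <- estimates[1]                    # village 1 is the truly treated one

# Permutation p-value: share of all estimates (including the true one)
# at least as large in absolute value as the true estimate
p_value <- mean(abs(estimates) >= abs(true_est))
print(round(estimates, 2))
print(p_value)
```

With 20 villages, the smallest attainable p-value is 1/20 = 0.05. This granularity is one reason the approach works best with a reasonable number of untreated units.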

Solved – Fixed effect model with household level and state level data

This is a fixed effects model. You should probably cluster your standard errors at the state level. I think it is reasonable to assume the unemployment rate is exogenous. Roughly speaking, no single state resident can significantly influence the unemployment rate, while the unemployment rate can significantly influence any single resident's behavior. Education, however, could be endogenous, since both BMI and education could be linked to an unobserved motivation factor.
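A small simulation shows how an endogenous education variable can contaminate the unemployment-rate coefficient. The data-generating process is purely hypothetical:

- Motivation is unobserved and affects both education and BMI.
- The true effect of the unemployment rate on BMI is 1.
- Education is either correlated with the unemployment rate (case A) or not (case B).

```r
set.seed(11)
n <- 1e5

motivation <- rnorm(n)                        # unobserved confounder
ur         <- rnorm(n)                        # unemployment rate (standardized), exogenous

# Case A: education depends on motivation AND is correlated with ur
edu_a  <- 0.5 * ur + motivation + rnorm(n)
bmi_a  <- 1 * ur - 1 * edu_a - 2 * motivation + rnorm(n)
b_ur_a <- unname(coef(lm(bmi_a ~ ur + edu_a))["ur"])   # approx. 1.5, not the true 1

# Case B: education depends on motivation but is uncorrelated with ur
edu_b  <- motivation + rnorm(n)
bmi_b  <- 1 * ur - 1 * edu_b - 2 * motivation + rnorm(n)
b_ur_b <- unname(coef(lm(bmi_b ~ ur + edu_b))["ur"])   # approx. 1

print(c(case_A = b_ur_a, case_B = b_ur_b))
```

In case A, the bias on the unemployment-rate coefficient is about +0.5 under these particular assumed coefficients. In other setups its sign and size would differ. In case B, the unemployment-rate coefficient is essentially unbiased even though education itself is still endogenous.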

If education is endogenous, then $\hat \beta_{ur}$ will also be a biased estimate of the causal effect, unless education and the unemployment rate are uncorrelated (conditional on the other covariates). From here you could:

1) Find a REALLY good reason why education is exogenous (I don't know if this is possible).
2) Include other covariates to proxy for unobserved confounders: male/female indicators, mother's education, father's education, income, etc.
3) Find a good instrument for education. Though it's outdated, Angrist and Krueger (1991) use quarter of birth to instrument education. Labor economists have both criticized and revisited this instrument, but it's a start.
4) Construct some sort of structural model, such as a simultaneous-equations system, to account for the endogeneity of both BMI and education.
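To illustrate the instrumental-variables route, here is a two-stage least squares sketch done by hand with two lm() calls. The instrument `z` is hypothetical: a binary variable assumed to shift education while having no direct effect on BMI. All coefficients in the simulation are made up.

```r
set.seed(5)
n <- 1e5

motivation <- rnorm(n)                        # unobserved confounder
ur  <- rnorm(n)                               # exogenous control
z   <- rbinom(n, 1, 0.5)                      # hypothetical instrument for education
edu <- 0.5 * ur + 0.8 * z + motivation + rnorm(n)
bmi <- 1 * ur - 1 * edu - 2 * motivation + rnorm(n)   # true education effect = -1

# Naive OLS: biased because edu is correlated with motivation
b_ols <- coef(lm(bmi ~ edu + ur))[["edu"]]

# Stage 1: regress edu on the instrument and the exogenous control
edu_hat <- fitted(lm(edu ~ z + ur))

# Stage 2: regress bmi on the stage-1 fitted values and the control
b_2sls <- coef(lm(bmi ~ edu_hat + ur))

print(c(ols = b_ols, tsls = b_2sls[["edu_hat"]]))
```

The manual second stage gives consistent point estimates, but its reported standard errors are not valid because they ignore that `edu_hat` was estimated. A dedicated IV routine, such as ivreg() in the AER package, handles this correctly.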

Overall, unless you are trying to publish something, I would just go with option (2) above: adding covariates to proxy for the unobserved confounders.
