Regression Analysis – Controlling for Covariates in Fixed Effects Regression

fixed-effects-model, panel data, regression, regression coefficients

I have a panel dataset of students with their test scores and certain characteristics like student gender and parents' education. Let's call the main regressor of interest "x". If I control for student fixed effects, their fixed characteristics should become superfluous, yes? But I notice that the coefficient on x is considerably different if I control for the fixed characteristics along with student fixed effects, as opposed to only including student fixed effects. Shouldn't the coefficient on x be the same in both cases? What could be going on? Will be grateful for any help.

Best Answer

If I control for student fixed effects, their fixed characteristics should become superfluous, yes?

Correct.

Adjusting for additional observed characteristics of your students in the presence of student fixed effects is unnecessary, and in fact meaningless: any observed time-constant factor is completely redundant with the student fixed effects. Your model therefore shouldn't be returning estimates for any of these fixed covariates.
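To see why, here is a sketch under assumed notation (none of these symbols come from the question). Write the model as $y_{it} = \beta x_{it} + \gamma z_i + \alpha_i + \varepsilon_{it}$, where $z_i$ is a time-constant characteristic and $\alpha_i$ is the student effect. Subtracting each student's mean over time gives

$$y_{it} - \bar{y}_i = \beta\,(x_{it} - \bar{x}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i),$$

so both $z_i$ and $\alpha_i$ cancel and $\gamma$ cannot be estimated. The simulation below illustrates this with made-up data (the variable names, sample sizes, and coefficient values are all assumptions): the demeaned gender column is identically zero, and the coefficient on $x$ from a dummy-variable regression that also includes gender matches the within estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
n_students, n_periods = 50, 4
student = np.repeat(np.arange(n_students), n_periods)

# Hypothetical data: a time-constant covariate (gender) and a time-varying x.
gender = rng.integers(0, 2, n_students)[student].astype(float)
alpha = rng.normal(size=n_students)[student]      # unobserved student effect
x = rng.normal(size=student.size) + 0.5 * alpha   # x correlated with the effect
y = 2.0 * x + 1.5 * gender + alpha + rng.normal(size=student.size)


def demean_by_student(v):
    """Subtract each student's own mean (the within transformation)."""
    means = np.bincount(student, weights=v) / np.bincount(student)
    return v - means[student]


x_w, y_w, gender_w = (demean_by_student(v) for v in (x, y, gender))

# Within estimator of the coefficient on x (student effects only).
beta_within = (x_w @ y_w) / (x_w @ x_w)

# Dummy-variable regression that ALSO includes the redundant gender column.
# The design matrix is rank deficient; lstsq returns the minimum-norm
# solution, but the coefficient on x is still uniquely determined.
dummies = (student[:, None] == np.arange(n_students)).astype(float)
X = np.column_stack([x, gender, dummies])
coef, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
beta_lsdv = coef[0]

print("max |demeaned gender|:", np.abs(gender_w).max())
print("columns:", X.shape[1], "rank:", rank)
print("beta (within):", beta_within, "beta (dummies + gender):", beta_lsdv)
```

If the two coefficients on $x$ differ in your own data, that suggests the two specifications are not actually the same model, for instance because a "fixed" covariate varies over time or the fixed effects are not being estimated as intended.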

But I notice that the coefficient on $x$ is considerably different if I control for the fixed characteristics along with student fixed effects, as opposed to only including student fixed effects. Shouldn't the coefficient on $x$ be the same in both cases?

They should.

Any truly fixed attributes of students are perfectly collinear with the student fixed effects. Most software packages will drop or flag these redundant time-invariant regressors without any additional work on your part. As for why the results are "considerably" different, it's hard to say without seeing your data. Here is what I suggest:

  1. Inspect the raw data to see whether the observed student-level characteristics exhibit any time variation. I'm not sure how you're "controlling" for time-constant student characteristics in the presence of student fixed effects. You're not actually controlling for anything; in essence, you're adjusting for something that can't be estimated. If, for example, gender, personality, and/or parental education are truly stable features, then they're redundant regressors, and you gain nothing by adjusting for them in the presence of the student fixed effects.

  2. Check that you're actually estimating the student fixed effects properly. Again, the standard fixed-effects routines in popular software such as R, Python, and Stata will typically exclude or flag gender and/or parental education for you (assuming they are, in fact, time-constant), so you'd be comparing two models that are identical in terms of the number of parameters estimated. Including a full set of dummy variables for all students will suffice, but many canned routines now exist to help you estimate the student effects. Make sure you're estimating the fixed intercepts appropriately.
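The first check can be sketched in a few lines of pandas. The data frame and column names below (`student_id`, `gender`, `parent_educ`) are hypothetical stand-ins for your own; the idea is simply to count distinct values of each supposedly fixed covariate within each student.

```python
import pandas as pd

# Hypothetical panel: one row per student per wave.
df = pd.DataFrame({
    "student_id":  [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "gender":      ["F", "F", "F", "M", "M", "M", "F", "F", "F"],
    "parent_educ": [12, 12, 16, 14, 14, 14, 10, 10, 10],
})

fixed_cols = ["gender", "parent_educ"]

# Number of distinct values per student; missing values count as a value
# so that a covariate recorded in some waves but not others is flagged too.
n_values = df.groupby("student_id")[fixed_cols].nunique(dropna=False)
varies = n_values.gt(1)

# How many students show within-student variation in each covariate.
print(varies.sum())

# The students whose "fixed" characteristics actually change.
print(n_values[varies.any(axis=1)])
```

In this toy example `parent_educ` changes for student 1, so it is not absorbed by the student fixed effects and will receive its own coefficient. Any covariate flagged this way is a candidate explanation for a shift in the coefficient on $x$.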

I suspect the observed student characteristics do exhibit some variation over time, hence the discrepancy you're observing. But even if the variables are somewhat sluggish in terms of their time variation, I wouldn't expect the results to be considerably different. You may have to quantify what "considerable" means in the context of your study.

And don't assume you can reliably measure all "stable" student-level attributes. The student fixed effects will adjust for all fixed characteristics specific to your students, even those you haven't thought of!

I hope this answer is helpful. Try diagnosing the problem yourself. If you're still stuck, post a small subset of your data and code and we can help you further.
