I have a panel dataset of students with their test scores and certain characteristics like student gender and parents' education. Let's call the main regressor of interest "x". If I control for student fixed effects, their fixed characteristics should become superfluous, yes? But I notice that the coefficient on x is considerably different if I control for the fixed characteristics along with student fixed effects, as opposed to only including student fixed effects. Shouldn't the coefficient on x be the same in both cases? What could be going on? Will be grateful for any help.

# Regression Analysis – Controlling for Covariates in Fixed Effects Regression

fixed-effects-model, panel data, regression, regression coefficients

#### Related Solutions

There are a couple of things that need to be addressed before getting into the $\gamma_{s(j)}$ issue, just to clear up some features of FEs. You mention that this is conceptual, so I don't want to assume you thought through every detail, but there is something more fundamental that should be considered before looking at $\gamma_{s(j)}$. A really important part of this model is $\gamma_i$ (the student FE). Including it means all time-invariant factors related to the students are already "controlled for." Whatever time-invariant differences between students do exist (in the average level of grades over the panel) are eliminated; students are now on the same expected overall "level." Variables like race can't be in a model with $\gamma_i$: race might be expected to generate different overall "levels" of grades, but it doesn't change within a person over the panel. The time-invariant "level" that might be captured by race is already captured by $\gamma_i$, and race would drop from the model due to collinearity. Just like race, county is also "controlled for" by $\gamma_i$, unless individuals are moving between counties (another can of worms). This is important, because you say:

> I want the variation that drives the estimation of $\beta$ to be chiefly cross-sectional variation in county income...

However, the cross-sectional effect of income "level" on grades among counties is already absorbed by $\gamma_i$. If we set aside class-subject for the moment, the interpretation of $\beta$ would be the expected change in a student's grade associated with a one-unit *increase* in county-level income (as a deviation from the mean county-level income for each student over the panel). $\gamma_i$ changes all interpretations to *within-student* effects. If between-student is absorbed and students stay in their counties, between-county is also absorbed in the student FE.
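To see the collinearity mechanically, here is a minimal numpy sketch on hypothetical toy data: the within ("demeaning") transformation that a student fixed-effects estimator applies to every regressor leaves nothing of a time-invariant variable.

```python
import numpy as np

# Hypothetical panel: 3 students observed for 4 semesters each.
sid = np.repeat(np.arange(3), 4)

# A time-invariant characteristic, coded once per student (e.g. race).
race = np.array([0.0, 1.0, 1.0])[sid]

# The within (demeaning) transformation used by a student-FE estimator:
student_means = np.bincount(sid, weights=race) / np.bincount(sid)
race_within = race - student_means[sid]

print(race_within)  # all zeros: no within-student variation left to estimate
```

Every entry is exactly zero, which is why software drops such a variable (or fails with a collinearity warning) once $\gamma_i$ is in the model.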

Each student takes many subjects during the panel, though, so subject effects are NOT already absorbed by $\gamma_i$. Since we know that subjects vary systematically in their average "levels" or grading scales across students, we know that a change from one subject to another will generate an expected change in grades that is not related to $income_{c(i),t}$. As we did with students (and thus race, county, etc.), we can make all subjects "equal" in their time-invariant attributes by absorbing the effect of each subject on the "level" of grades into $\gamma_{s(j)}$. Without this, we don't know whether a change in grades came from income or from moving between class-subjects. Thus, $\gamma_{s(j)}$ is important to include, and it doesn't interfere with the cross-sectional effect of income, because that was already absorbed by $\gamma_i$.

After including $\gamma_i$ and $\gamma_{s(j)}$ (and accounting for changes that affect all students in a semester with $\gamma_t$), the interpretation of $\beta$ is the expected increase in grades caused by a one-unit increase in $income_{c(i),t}$, after accounting for time-invariant differences in students (including their counties) and class-subjects (and overall trends). This model won't tell you about cross-sectional effects of county income.
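A quick sketch of this full specification, with simulated data and made-up effect sizes: estimating the model as one big dummy-variable regression (dummies for student, subject, and term) recovers the within effect of income.

```python
import numpy as np

rng = np.random.default_rng(1)
n_students, n_subjects, n_terms = 50, 4, 6
beta_true = 0.5  # assumed within effect of income on grades

# Balanced panel: every student in every subject and term.
sid, subj, t = np.meshgrid(np.arange(n_students), np.arange(n_subjects),
                           np.arange(n_terms), indexing="ij")
sid, subj, t = sid.ravel(), subj.ravel(), t.ravel()

# County income varies by student and term (students stay in their county).
income_panel = rng.normal(size=(n_students, n_terms))
income = income_panel[sid, t]

grade = (beta_true * income
         + rng.normal(size=n_students)[sid]    # student "level" (gamma_i)
         + rng.normal(size=n_subjects)[subj]   # subject grading scale (gamma_s)
         + 0.1 * t                             # common trend (gamma_t)
         + 0.2 * rng.normal(size=sid.size))    # idiosyncratic noise

# Income plus full sets of student, subject, and term dummies.
X = np.column_stack([income,
                     np.eye(n_students)[sid],
                     np.eye(n_subjects)[subj],
                     np.eye(n_terms)[t]])
beta_hat = np.linalg.lstsq(X, grade, rcond=None)[0][0]
print(beta_hat)  # close to 0.5
```

`lstsq` handles the rank deficiency among the dummy blocks (the usual dropped-reference-category issue), and the coefficient on income is still uniquely determined.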

If you are interested in a mix of the within- and between- (cross-sectional) effects of county income, you can use a random effect for student, but you won't be able to say how much of the effect comes from variation within versus between students (and thus between counties). This is a fundamental issue in research with observational data. Another option is the "hybrid model," which isolates the within and between effects. This R package gives a nice explanation of that setup.
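The core of the hybrid setup can be sketched in a few lines of numpy (simulated data; the within and between effect sizes of 1.0 and 3.0 are made up for illustration): decompose the regressor into the student-level mean and the deviation from that mean, and enter both separately.

```python
import numpy as np

rng = np.random.default_rng(2)
n_students, n_terms = 200, 6
sid = np.repeat(np.arange(n_students), n_terms)

# Income with both between-student and within-student variation.
x = rng.normal(size=n_students)[sid] + rng.normal(size=sid.size)
x_mean = (np.bincount(sid, weights=x) / np.bincount(sid))[sid]  # student mean
x_within = x - x_mean                                           # deviation

# Simulate distinct within (1.0) and between (3.0) effects of income.
y = 1.0 * x_within + 3.0 * x_mean + 0.5 * rng.normal(size=sid.size)

# Hybrid model: the mean and the deviation get separate coefficients.
X = np.column_stack([np.ones_like(x), x_within, x_mean])
_, b_within, b_between = np.linalg.lstsq(X, y, rcond=None)[0]
print(b_within, b_between)  # b_within ≈ 1.0, b_between ≈ 3.0
```

The FE model would only ever show you something near `b_within`; the hybrid model reports both pieces.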

## Best Answer

Correct. Adjusting for any additional observed qualities/characteristics of your students in the presence of the student fixed effects is not necessary, not to mention meaningless. In fact, the observed time-constant factors are completely redundant with the student fixed effects. That being said, your model shouldn't be returning estimates for any of these fixed covariates; they should drop out.

Any of the presumably fixed attributes of students are completely collinear with the student fixed effects. Software would invariably drop these redundant regressors; in other words, most software packages will exclude the time-invariant variables without any additional work on your part. As for why the results are "considerably" different, it's hard to say without seeing your data. Here is what I suggest:

1. Inspect the raw data to see if the observed student-level characteristics exhibit any time variation. I'm not sure how you're "controlling" for time-constant student characteristics in the presence of student fixed effects; you're not actually controlling for anything. In essence, you're adjusting for something that can't be estimated. If, for example, gender, personality, and/or parental education are truly stable features, then they're redundant regressors; you gain nothing by adjusting for them in the presence of the student fixed effects.

2. Check that you're actually estimating the student fixed effects properly. Again, the more popular software packages such as R, Python, and Stata will exclude gender and/or parental education for you (assuming they are, in fact, time-constant factors), so you'd be comparing two models that are identical in terms of the number of parameters estimated. Including a full set of dummy variables for all students will suffice, but many canned routines now exist to help you estimate the student effects. Ensure you're estimating the fixed intercepts appropriately.

I suspect the observed student characteristics do exhibit some variation over time, hence the discrepancy you're observing. But even if the variables are somewhat sluggish in their time variation, I wouldn't expect the results to be considerably different. You may have to quantify what "considerable" means in the context of your study.

And don't assume you can reliably measure all "stable" student-level attributes. The student fixed effects will adjust for all fixed characteristics specific to your students, even those you haven't thought of!

I hope this answer is helpful. Try diagnosing the problem yourself. If you're still stuck, post a small subset of your data and code and we can help you further.
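Both suggestions can be checked mechanically. Here is a hypothetical numpy sketch (simulated data; the effect sizes are made up): with a truly time-constant covariate, the coefficient on x is identical whether or not the covariate is included alongside the student dummies, and a simple group-by count reveals whether a supposedly "fixed" characteristic actually varies within student.

```python
import numpy as np

rng = np.random.default_rng(3)
n_students, n_terms = 100, 5
sid = np.repeat(np.arange(n_students), n_terms)

gender = (np.arange(n_students) % 2).astype(float)[sid]  # truly time-constant
x = rng.normal(size=sid.size)
y = (0.7 * x + 0.5 * gender
     + rng.normal(size=n_students)[sid]      # student fixed effects
     + 0.3 * rng.normal(size=sid.size))      # noise

D = np.eye(n_students)[sid]                  # student dummies
b_fe = np.linalg.lstsq(np.column_stack([x, D]), y, rcond=None)[0][0]
b_fe_plus = np.linalg.lstsq(np.column_stack([x, gender, D]), y, rcond=None)[0][0]
print(np.isclose(b_fe, b_fe_plus))  # True: gender is redundant given the dummies

# Diagnostic: does each "fixed" characteristic actually vary within student?
n_unique = [len(set(gender[sid == i])) for i in range(n_students)]
print(max(n_unique))  # 1: no within-student variation anywhere
```

If that `max` comes back greater than 1 in your data, the characteristic isn't time-constant after all, and that alone could explain why the coefficient on x moves when you add it.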