Solved – Fixed effects or Random effects model

dataseteconometricsfixed-effects-modelpanel datarandom-effects-model

I am trying to understand the difference between fixed and random effects modelling. The panel data I have is in the form of basic longitudinal panel time series.

I know that I can use the Hauseman test to determinate which model to use. But my problem is that in fixed effects model I have to use fixed slopes or there won't be a coefficient for the whole model available. In random effects I can use random slopes and intercept and still get a slope coefficient for the whole model.

So how can I determinate whether to use random effects model with varying intercept and coefficient or fixed effects model using only varying coefficient and fixed slope?

Best Answer

A couple of different things your post brings up (hopefully you recognize that).

The first relates to deciding random vs fixed effects. In my experience deciding between fixed and random effects has two pieces:

Statistical fit. Assessed using things like a Hausman test, standard fare in most packages like Stata, SAS, R, etc. This will tell you if a random intercept "works" better with your data than a fixed effect.
Theoretical fit. How are you conceptualizing the effect? Is it truly a fixed, unchanging entity, not coming from a theoretical distribution? For example, I have rarely seen States treated like random effects - there are 50 and only 50 states (or 51 +DC, or more if you add territories), and they never change being the same states. Same thing with years when there are a few years in the panel, those are often treated as fixed, because you want to capture a common shock to all observations in that year and quantify it as a fixed effect. Other things, however, are not so clear. I'm doing an analysis of counties - of course you could treat counties as fixed effects, but there are over 3,000, and I don't think anyone would really want to be so focused on a single county. So I'm treating them as a random effect, coming from a distribution. When doing repeated measures, again, the individual is treated as random (representative hopefully of a larger population that has parameters you estimate).

The second issue you bring up is intercept vs slope. In this regard random effects and fixed effects are not comparable. A fixed effect literally just adjusts the intercept for each fixed effect - it captures the mean relationship between the given effect and the outcome variable. What that results in, is the slopes you have are within group effects, because you've already captured the variance attributable to the fixed effects. If you think think that the slope is different for each effect, the interpretation is that the effects moderate the effect of the given variable whose slope you are interested in:

$y=\alpha+\beta x + \Sigma{Z_i\theta_i}$

Where X is your covariate of interest, and Z is your vector of each fixed effect i and $\theta_i$ is the effect. Now, if you are thinking the slope is going to differ for each, you actually create an interaction term or moderator for each effect (which explodes the number of coefficients and cuts your degrees of freedom):

$y=\alpha+\beta x + \Sigma{Z_i\theta_i}+\Sigma{Z_ix\gamma_i}$

$\gamma$ is your vector of coefficients, so to get the slope for any given effect you need to add up the $\beta$ with the relevant $\gamma$. So what does $\beta$ mean here? It is not the average slope across all the effects, as it is in a random effect model. It is the slope for the omitted effect.

So, to get back to your original question: - Start with the theoretical definition of your effects. Are they truly fixed, or do they reflect a random distribution, that has population parameters you can estimate? Answer that and then I believe you should be on your way to figuring out the next step. - If you believe that the effects should indeed be random, then do your statistical tests. Test if random effects fit better. Then do your appropriate tests to see if a random coefficient is appropriate.

Related Solutions

Solved – Variable slopes in a fixed effects model

There seems to be some confusion in your question. The purpose of random effects (intercepts) in a growth model, is to account for the correlations within individuals. The difference in slope that you expect, between boys and girls would be modelled with an interaction between the slope (ie the fixed effect for the time variable) and sex. Additionally, you can allow each subject to have their own slope, or rather an offset from the fixed slope, by fitting random slopes.

You say:

boy and girl are fixed effects so a random effect model seems inappropriate

This is correct. Sex is a fixed effect.

In the example in your question you would fit a model such as:

height ~ sex * time + (1 | subject)

This will estimate an overall linear trend for time (the fixed effect for time) for both boys and girls (the fixed effect for sex) and also allow trend to be different for boys and girls (the sex:time interaction), while also adjusting the dependence between measurements in each person (the subject random intercept).

As mentioned in the first paragraph, You could also allow each subject to have their own slope by fitting random slopes for time:

height ~ sex * time + (time | subject)

R Random and Fixed Effects Models – How to Implement GLM for Panel Data

It seems to me that the three models are actually three different things. It is normal that there is some confusion about the terms "fixed" and "random" effects, because different applied fields use them differently, see for example here: https://statmodeling.stat.columbia.edu/2005/01/25/why_i_dont_use/

Within a more econometric reading of your question, model 3 would refer to a model where values all variables are de-meaned for country (i.e., you subtract the country-level mean from each observation, across all variables). In this case you will not need to include country explicitly as a predictor variable, so the model would indeed be "without countries", as in your description. This is typically called the fixed effects estimator in econometrics, or the within estimator (because you reduce your data to variation within countries only). You can do this in R for example by using plm::plm() with model = "within". The plm package, which can do all three model types you mentioned, is described in detail here: https://cran.r-project.org/web/packages/plm/vignettes/plmPackage.html

Note that the estimates and standard errors should be exactly identical to the case where you simply add dummies for country to your model matrix via factor(country) (model 1).

Model 1 is actually known as the LSDV (Least Square Dummy Variable) approach in econometrics. Equivalence between fixed effects and LSDV estimator is further shown and discussed in chapter 10.2 of

Verbeek, M. (2008). A guide to modern econometrics. John Wiley & Sons.

A note of cuation: De-meaning by hand and then running a simple lm() on the de-meaned data will lead to a fixed effects model with biased standard errors, because lm() will calculate with too many degrees of freedom, as if there were no country variables used at all. So better simply use plm(), which corrects for this.

Best Answer

Related Solutions

Solved – Variable slopes in a fixed effects model

R Random and Fixed Effects Models – How to Implement GLM for Panel Data

Related Question