Your post brings up a couple of different things (hopefully you recognize that).
The first relates to deciding between random and fixed effects. In my experience, that decision has two pieces:
Statistical fit. Assessed using things like a Hausman test, standard fare in most packages like Stata, SAS, R, etc. This will tell you if a random intercept "works" better with your data than a fixed effect.
Theoretical fit. How are you conceptualizing the effect? Is it truly a fixed, unchanging entity, not drawn from a theoretical distribution? For example, I have rarely seen states treated as random effects: there are 50 and only 50 states (or 51 with DC, or more if you add territories), and they remain the same states. The same goes for years when there are only a few in the panel; those are often treated as fixed, because you want to capture a common shock to all observations in that year and quantify it as a fixed effect. Other things, however, are not so clear. I'm doing an analysis of counties. Of course you could treat counties as fixed effects, but there are over 3,000, and I don't think anyone would really want to focus on a single county, so I'm treating them as a random effect drawn from a distribution. With repeated measures, again, the individual is treated as random (representative, hopefully, of a larger population whose parameters you estimate).
The second issue you bring up is intercept vs. slope. In this regard random effects and fixed effects are not comparable. A fixed effect literally just adjusts the intercept for each group: it captures the mean difference in the outcome associated with that effect. As a result, the slopes you estimate are within-group effects, because you've already captured the variance attributable to the fixed effects. If you think the slope is different for each effect, the interpretation is that the effects moderate the effect of the given variable whose slope you are interested in:
$y=\alpha+\beta x + \sum_i Z_i\theta_i$
where $x$ is your covariate of interest, $Z_i$ is the dummy for fixed effect $i$, and $\theta_i$ is its effect. Now, if you think the slope is going to differ for each, you actually create an interaction term, or moderator, for each effect (which explodes the number of coefficients and cuts your degrees of freedom):
$y=\alpha+\beta x + \sum_i Z_i\theta_i+\sum_i Z_i x\gamma_i$
The $\gamma_i$ are your interaction coefficients, so to get the slope for any given effect you add $\beta$ to the relevant $\gamma_i$. So what does $\beta$ mean here? It is not the average slope across all the effects, as it would be in a random-effects model; it is the slope for the omitted (reference) effect.
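To make the bookkeeping concrete, here is a small simulated sketch (in Python/NumPy rather than R, since the algebra is the same in any language; all data and names are made up): fit the dummy-plus-interaction model above and recover each group's slope as $\beta + \gamma_i$.

```python
import numpy as np

rng = np.random.default_rng(0)
n_groups, n_per = 4, 200
g = np.repeat(np.arange(n_groups), n_per)   # group membership per row
x = rng.normal(size=n_groups * n_per)       # covariate of interest

alphas = np.array([1.0, 2.0, 3.0, 4.0])     # true group intercepts
slopes = np.array([0.5, 1.0, 1.5, 2.0])     # true group-specific slopes
y = alphas[g] + slopes[g] * x + rng.normal(scale=0.1, size=g.size)

# Design matrix: intercept, x, dummies for groups 1..3, and x-by-dummy
# interactions; group 0 is the omitted reference group
D = (g[:, None] == np.arange(1, n_groups)).astype(float)
X = np.column_stack([np.ones(g.size), x, D, D * x[:, None]])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

beta = coef[1]          # slope for the OMITTED group only, not an average
gammas = coef[-3:]      # interaction coefficients for groups 1..3
print(beta)             # close to 0.5, the reference group's slope
print(beta + gammas)    # close to [1.0, 1.5, 2.0]
```

Note that $\beta$ here recovers 0.5 (the omitted group's slope), not the average slope of 1.25 across all four groups.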
So, to get back to your original question:
- Start with the theoretical definition of your effects. Are they truly fixed, or do they reflect draws from a random distribution with population parameters you can estimate? Answer that, and I believe you should be on your way to figuring out the next step.
- If you believe the effects should indeed be random, then do your statistical tests: test whether random effects fit better, and then test whether a random coefficient (slope) is warranted.
I am thinking about the same problem and found your question here. I'll tell you what I know so far. Maybe we can reach a satisfying conclusion.
Random-effects models treat between-subject variation (in your case, between different Gamble.Nums) as normally distributed. But it may not be normal, and if you fit a random-effects model anyway, the result will be biased. To avoid this problem, people use dummy variables to code those subjects and estimate their effects separately, instead of assuming a distribution. This approach looks good, but you have many more parameters to estimate, and you don't get a statistical distribution for the whole population. In the random-effects model, the subjects (Gamble.Nums) are assumed to be randomly sampled from a population, and from the fit you get estimates of the parameters of that population's distribution.
Or, you can adopt a model-selection point of view, and use `anova()` to compare the fitted `glmer` and `glm` models and see which one is better.
In my own experience, the estimates from the `glm` with dummies and those from `ranef(glmer)` are strongly correlated; the real difference is not that large at all.
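As a rough illustration of why the two sets of estimates track each other, here is a hand-rolled sketch (Python/NumPy, simulated data, variance components treated as known for simplicity). It is not `ranef(glmer)` itself, just the classic shrinkage formula behind a random-intercept model, applied to per-group means:

```python
import numpy as np

rng = np.random.default_rng(1)
n_groups = 40
sizes = rng.integers(3, 25, size=n_groups)   # unequal group sizes
sigma2, tau2 = 1.0, 0.5                      # residual / between-group variance (assumed known)

true_effects = rng.normal(scale=np.sqrt(tau2), size=n_groups)
fe_est = np.empty(n_groups)   # per-group "dummy variable" estimates (group means)
re_est = np.empty(n_groups)   # shrunken random-intercept predictions (BLUP-style)
for j in range(n_groups):
    yj = true_effects[j] + rng.normal(scale=np.sqrt(sigma2), size=sizes[j])
    fe_est[j] = yj.mean()
    # classic shrinkage factor for a random intercept: tau2 / (tau2 + sigma2 / n_j)
    re_est[j] = tau2 / (tau2 + sigma2 / sizes[j]) * yj.mean()

corr = np.corrcoef(fe_est, re_est)[0, 1]
print(round(corr, 3))   # strongly correlated, close to 1
```

The shrunken estimates are always pulled toward zero relative to the dummy-variable estimates, with small groups shrunk the most; but since the shrinkage factors all stay well below 1 and above 0, the two sets of estimates remain almost perfectly correlated.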
Best Answer
It seems to me that the three models are actually three different things. It is normal that there is some confusion about the terms "fixed" and "random" effects, because different applied fields use them differently, see for example here: https://statmodeling.stat.columbia.edu/2005/01/25/why_i_dont_use/
Within a more econometric reading of your question, model 3 would refer to a model where the values of all variables are de-meaned by country (i.e., you subtract the country-level mean from each observation, for every variable). In this case you do not need to include country explicitly as a predictor variable, so the model would indeed be "without countries", as in your description. This is typically called the fixed effects estimator in econometrics, or the within estimator (because you reduce your data to variation within countries only). You can do this in R, for example, by using
`plm::plm()` with `model = "within"`. The plm package, which can do all three model types you mentioned, is described in detail here: https://cran.r-project.org/web/packages/plm/vignettes/plmPackage.html. Note that the estimates and standard errors should be exactly identical to the case where you simply add dummies for country to your model matrix via `factor(country)` (model 1). Model 1 is actually known as the LSDV (Least Squares Dummy Variable) approach in econometrics. The equivalence between the fixed effects and LSDV estimators is further shown and discussed in chapter 10.2 of
A note of caution: de-meaning by hand and then running a simple `lm()` on the de-meaned data will lead to a fixed effects model with biased standard errors, because `lm()` will calculate with too many degrees of freedom, as if no country variables had been used at all. So better simply use `plm()`, which corrects for this.
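To see both the equivalence and the degrees-of-freedom problem in one place, here is a simulated sketch in Python/NumPy rather than R (all names and numbers are made up); it plays the role of `plm()` vs. naive `lm()` by fitting LSDV, then the de-meaned regression, and then correcting the naive standard error by hand:

```python
import numpy as np

rng = np.random.default_rng(2)
n_countries, n_years = 20, 8
n = n_countries * n_years
c = np.repeat(np.arange(n_countries), n_years)   # country id per row

ce = rng.normal(size=n_countries)                # country effects
x = rng.normal(size=n) + 0.5 * ce[c]             # covariate, correlated with country effect
y = 2.0 * x + ce[c] + rng.normal(size=n)

def ols(X, y):
    """OLS coefficients and conventional standard errors (df = n - k)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    return coef, se

# Model 1 (LSDV): x plus a full set of country dummies, no global intercept
D = (c[:, None] == np.arange(n_countries)).astype(float)
coef_lsdv, se_lsdv = ols(np.column_stack([x, D]), y)

# Model 3 (within): de-mean x and y by country, then OLS with no intercept
xd = x - np.array([x[c == j].mean() for j in range(n_countries)])[c]
yd = y - np.array([y[c == j].mean() for j in range(n_countries)])[c]
coef_w, se_naive = ols(xd[:, None], yd)

# The slope estimates are numerically identical ...
print(np.isclose(coef_lsdv[0], coef_w[0]))       # True
# ... but the naive SE is too small: it uses n - 1 degrees of freedom,
# ignoring the n_countries means absorbed by de-meaning. Rescaling by
# the correct degrees of freedom recovers the LSDV standard error exactly.
se_fixed = se_naive[0] * np.sqrt((n - 1) / (n - n_countries - 1))
print(np.isclose(se_fixed, se_lsdv[0]))          # True
```

The coefficient equality is the Frisch-Waugh-Lovell result behind the within estimator; the standard-error gap is purely a degrees-of-freedom bookkeeping issue, which is exactly what `plm()` handles for you.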