Having an unbalanced panel is not a problem nowadays. In the past, when econometrics had to be done by hand, inverting matrices for unbalanced panels was more difficult, but for computers this is not an issue. The only remaining worry is why the panel is unbalanced: is it due to attrition? If so, is that attrition random or related to characteristics of the statistical units? For instance, in surveys people with higher education tend to be more responsive and therefore stay in the panel longer.
Regarding the fixed effects model, have you checked whether the variables that are time-invariant in theory actually do not vary over time? Sometimes coding errors sneak in, and all of a sudden a variable varies over time when it shouldn't. One way of checking this is to use the xtsum
command, which displays overall, between, and within summary statistics. The time-invariant variables should have a zero within standard deviation; if they don't, something went wrong in the coding.
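Outside Stata, the same check is easy to do by hand. Here is a minimal Python/pandas sketch (the data frame and the educ variable are made up for illustration) that computes the within standard deviation that xtsum reports:

```python
import pandas as pd

# Toy panel: 'educ' is meant to be time-invariant, but unit 2 has a coding error.
df = pd.DataFrame({
    "id":   [1, 1, 1, 2, 2, 2],
    "year": [2000, 2001, 2002, 2000, 2001, 2002],
    "educ": [12, 12, 12, 16, 16, 12],   # varies within unit 2 -> suspicious
    "wage": [10.0, 10.5, 11.0, 20.0, 21.0, 19.5],
})

# Within SD: spread of each observation around its own unit mean.
# This is what Stata's xtsum reports as the "within" standard deviation.
within_sd = df.groupby("id")["educ"].transform(lambda s: s - s.mean()).std(ddof=1)
print(f"within SD of educ: {within_sd:.3f}")  # nonzero -> educ is not time-invariant
```

A truly time-invariant variable would give a within SD of exactly zero here; any positive value flags a unit whose "constant" changes over time.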
Having a negative Hausman test statistic is a bad sign because the matrices the test is built on are positive semi-definite, so the theoretical value of the statistic is non-negative. Negative values point towards model misspecification or too small a sample (related to this is this question).
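To see where the sign issue comes from, the statistic is $H = (\hat\beta_{FE} - \hat\beta_{RE})' [\hat{V}_{FE} - \hat{V}_{RE}]^{-1} (\hat\beta_{FE} - \hat\beta_{RE})$. A small numpy sketch with made-up coefficient estimates shows the computation; in theory the variance difference is positive semi-definite, but the estimated difference need not be, which is how negative values arise:

```python
import numpy as np

# Hypothetical FE and RE estimates for two coefficients (made-up numbers).
b_fe = np.array([0.50, 1.20])
b_re = np.array([0.45, 1.10])
V_fe = np.array([[0.040, 0.002],
                 [0.002, 0.090]])
V_re = np.array([[0.010, 0.001],
                 [0.001, 0.030]])

# Hausman statistic: quadratic form in the coefficient difference.
d = b_fe - b_re
V_diff = V_fe - V_re
H = d @ np.linalg.inv(V_diff) @ d
print(f"H = {H:.3f}")

# Under the null, RE is efficient, so V_fe - V_re is positive semi-definite
# and H >= 0. With *estimated* matrices this can fail: if V_diff has a
# negative eigenvalue, H can come out negative in finite samples.
eigvals = np.linalg.eigvalsh(V_diff)
print("eigenvalues of V_fe - V_re:", eigvals)
```

Checking the eigenvalues of the estimated variance difference is a quick way to diagnose whether a negative statistic is this finite-sample artifact.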
If you cluster your standard errors you also need a modified version of the Hausman test. This is implemented in the xtoverid
command. You can use it like this:
xtreg ln_r_prisperkg_Frst_102202 Dflere_mottak_tur i.landingsfylkekode i.kvartiler_ny markedsk_torsk gjenv_TAC_NØtorsk_år_prct lalder_fartøy i.fangstr r_minst_Frst_torsk gjenv_kvote_NØtorsk_fartøy_prct i.lengde_gruppering mobilitet, fe vce(cluster fartyid)
xtoverid
Rejecting the null rejects the validity of the assumptions underlying the random effects model.
The xtset
command only takes the unit id into account for fixed-effects estimation; the time variable does not eliminate time fixed effects. So
xtset id time
xtreg y x, fe
will give you the exact same results as
xtset id
xtreg y x, fe
The time variable is only relevant for commands for which the sorting order of the data matters; for instance xtserial
, which tests for serial correlation in panel data, requires it. This has been discussed here. So if you want to include time fixed effects, you need to include the day dummies separately via i.day
, for example. In this context, the season and year dummies make sense, so it's good that you use them.
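A quick simulation makes the point concrete. This is a Python/numpy sketch with made-up parameters: unit demeaning alone (what xtreg, fe does) leaves the time effects in, while period dummies (implemented here, equivalently for a balanced panel, as additional demeaning by period) recover the true coefficient:

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_periods = 200, 6
beta = 1.0

# Simulated panel with unit effects a_i and a strong common time effect d_t;
# x is correlated with d_t, so omitting time effects biases the estimate.
a = rng.normal(size=n_units)                         # unit fixed effects
d = np.array([0., 2., 4., 6., 8., 10.])              # time effects
x = rng.normal(size=(n_units, n_periods)) + 0.5 * d
e = rng.normal(scale=0.5, size=(n_units, n_periods))
y = beta * x + a[:, None] + d[None, :] + e

# Within (unit-demeaning) estimator: removes a_i but NOT d_t.
xw = x - x.mean(axis=1, keepdims=True)
yw = y - y.mean(axis=1, keepdims=True)
b_no_time = (xw * yw).sum() / (xw**2).sum()

# Additionally subtract period means: removes d_t as well, which is what
# including time dummies (i.day etc.) achieves in a balanced panel.
xw2 = xw - xw.mean(axis=0, keepdims=True)
yw2 = yw - yw.mean(axis=0, keepdims=True)
b_time = (xw2 * yw2).sum() / (xw2**2).sum()

print(f"FE without time dummies: {b_no_time:.3f}")  # biased upward here
print(f"FE with time dummies:    {b_time:.3f}")     # close to true beta = 1
```

The gap between the two estimates is entirely due to the time effects that unit demeaning leaves behind.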
Both approaches, using group fixed effects and/or cluster-adjusted standard errors, address different issues related to clustered (or panel) data, and I would clearly view them as distinct approaches. Often you want to use both of them:
First of all, cluster-adjusted standard errors account for within-cluster correlation or heteroscedasticity, which the fixed-effects estimator does not take into account unless you are willing to make further assumptions; see the Imbens and Wooldridge lecture slides for a good discussion of short and long panels and various issues related to this problem. There is also a recent paper on this topic by Cameron and Miller, A Practitioner's Guide to Cluster-Robust Inference, which might be interesting for you. If you do not want to model the variance-covariance matrix and you suspect that within-cluster correlation is present, I advise using cluster-robust standard errors, because the bias in your SE may be severe (much more problematic than for heteroscedasticity; see Angrist & Pischke Chapter III.8 for a discussion of this topic). But you need enough clusters (Angrist and Pischke suggest 40-50 as a rule of thumb). Cluster-adjusted standard errors leave your point estimates unchanged and only adjust the standard errors (which will usually go up)!
Fixed-effects estimation takes into account unobserved time-invariant heterogeneity (as you mentioned). This can be good or bad: on the one hand, you need fewer assumptions to get consistent estimates. On the other hand, you throw away a lot of variation which might be useful. Some people like Andrew Gelman prefer hierarchical modeling to fixed effects, but here opinions differ. Fixed-effects estimation will change both point and interval estimates (here, too, standard errors will usually be higher).
So to sum up: cluster-robust standard errors are an easy way to account for possible issues related to clustered data if you do not want to bother with modeling inter- and intra-cluster correlation (and enough clusters are available). Fixed-effects estimation will use only certain variation, so it depends on your model whether you want to base estimates on less variation or not. But without further assumptions, fixed-effects estimation will not take care of the problems that intra-cluster correlation causes for the variance matrix. Neither will cluster-robust standard errors take into account the problems related to the use of fixed-effects estimation.
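To make the mechanics of cluster adjustment concrete, here is a numpy sketch (simulated data, made-up parameters) of the cluster-robust sandwich variance: the scores are summed within each cluster before forming the "meat", the OLS point estimates are untouched, and with a cluster-level error component the clustered standard errors come out larger than the classical ones:

```python
import numpy as np

rng = np.random.default_rng(1)
n_clusters, m = 50, 20                      # 50 clusters, 20 obs each
G = np.repeat(np.arange(n_clusters), m)

# Regressor and error both contain a cluster-level component, so errors are
# correlated within clusters and classical OLS standard errors are too small.
xc = rng.normal(size=n_clusters)
uc = rng.normal(size=n_clusters)
x = xc[G] + rng.normal(size=n_clusters * m)
u = uc[G] + rng.normal(size=n_clusters * m)
y = 1.0 + 2.0 * x + u

X = np.column_stack([np.ones_like(x), x])
b = np.linalg.solve(X.T @ X, X.T @ y)       # OLS point estimates (unchanged below)
e = y - X @ b
XtX_inv = np.linalg.inv(X.T @ X)

# Classical (homoscedastic) variance estimate.
V_classic = e @ e / (len(y) - 2) * XtX_inv

# Cluster-robust sandwich: sum the score X_g' e_g over each cluster first.
meat = np.zeros((2, 2))
for g in range(n_clusters):
    Xg, eg = X[G == g], e[G == g]
    sg = Xg.T @ eg
    meat += np.outer(sg, sg)
V_cluster = XtX_inv @ meat @ XtX_inv

print("classical SE(x):", np.sqrt(V_classic[1, 1]))
print("clustered SE(x):", np.sqrt(V_cluster[1, 1]))  # larger here
```

Production code would add the usual small-sample degrees-of-freedom correction, but the sketch shows the key point: only the variance estimate changes, never the coefficients.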
Best Answer
If you are interested in this formulation for causal inference about $\beta$ then the unknown quantities represented by $c_i$ need only be stable for the duration of the study / data for fixed effects to identify the relevant causal quantity.
If you are concerned that the quantities represented by $c_i$ aren't stable even over this period then fixed effects won't do what you want. Then you can use random effects instead, although if you expect correlation between random $c_i$ and $X_i$ you'd want to condition $c_i$ on $\bar{X}_i$ in a multilevel setup. Concern about this correlation is often one of the motivations for a fixed effects formulation because under many (but not all) circumstances you don't need to worry about it then.
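This conditioning device (often attributed to Mundlak, also called the correlated-random-effects approach) can be sketched in a few lines of numpy; the data and parameters below are made up. In a balanced panel, a pooled regression of $y$ on $x$ and the unit mean $\bar{x}_i$ reproduces the within (fixed-effects) estimate of the coefficient on $x$ exactly:

```python
import numpy as np

rng = np.random.default_rng(2)
n, t = 100, 5
G = np.repeat(np.arange(n), t)

c = rng.normal(size=n)                     # unit effects, correlated with x
x = 0.8 * c[G] + rng.normal(size=n * t)
y = 2.0 * x + c[G] + rng.normal(size=n * t)

# Within (fixed-effects) estimate of beta.
xbar = np.array([x[G == i].mean() for i in range(n)])[G]
ybar = np.array([y[G == i].mean() for i in range(n)])[G]
b_fe = ((x - xbar) * (y - ybar)).sum() / ((x - xbar) ** 2).sum()

# Mundlak device: pooled regression of y on a constant, x, AND the unit
# mean of x. Conditioning on xbar absorbs the part of c_i correlated with x.
X = np.column_stack([np.ones_like(x), x, xbar])
b_cre = np.linalg.solve(X.T @ X, X.T @ y)

print(f"within estimate:  {b_fe:.6f}")
print(f"Mundlak estimate: {b_cre[1]:.6f}")  # matches the within estimate
```

The exact equality (up to floating point) follows from the Frisch-Waugh logic: partialling the unit means out of $x$ leaves exactly the within-unit deviations.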
In short, your concern about variation in the quantities represented by $c_i$ is very reasonable, but mostly as it affects the data for the period you have rather than periods you might have had or that you may eventually have but don't.