Fixed effects models and random effects models ask different questions of the data. Specifying a set of group-level dummy variables essentially controls for all group-level unobserved heterogeneity in the average response, leaving your estimates to reflect only variability within units. Random effects models start from the assumption that there is a meta-population of effects, and that the groups in your sample reflect draws from that population. So rather than anchoring your results to heterogeneous intercepts, your data are used to estimate the parameters of the (usually normal) distribution from which those effects were supposedly drawn.
It is often said that fixed effects models are good for conducting inference on the data that you have, and that random effects models are good for trying to conduct inference on some larger population from which your data is a random sample.
When I learned about fixed effects models, they were motivated using error components and panel data. Take multiple observations of a given unit $i$, with a treatment $T_{it}$ randomly assigned over time $t$:
$$y_{it} = \alpha_i + \beta T_{it} + \epsilon_{it}$$
You can break the error term into a component that varies over time and one that doesn't:
$$y_{it} = \alpha_i + \beta T_{it} + e_i + u_{it}$$
Now subtract the groupwise mean from both sides:
$$y_{it} - \bar y_i = \alpha_i - \bar \alpha_i + \beta \left(T_{it}- \bar T_i\right) + e_i - \bar e_i + u_{it} - \bar u_i$$
Things that aren't subscripted by $t$ come out of the equation by basic subtraction: the average over time of a quantity that never changes is just the quantity itself, so $\alpha_i - \bar \alpha_i = 0$ and $e_i - \bar e_i = 0$. This includes the non-time-varying component of your error term. Thus your estimates are unconfounded by time-invariant heterogeneity.
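The payoff of the derivation above can be seen in a small simulation. This is a minimal sketch in pure NumPy (all names and the data-generating process are illustrative, not from the original answer): the unit-level error component $e_i$ is deliberately made correlated with treatment, so pooled OLS is biased, while the within (demeaning) estimator recovers the true $\beta$.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_periods = 200, 5
beta = 2.0

# Unit-level heterogeneity e_i, deliberately correlated with treatment uptake
e = rng.normal(size=n_units)
T = (rng.normal(size=(n_units, n_periods)) + e[:, None] > 0).astype(float)
y = 1.0 + beta * T + e[:, None] + rng.normal(scale=0.5, size=(n_units, n_periods))

# Naive pooled OLS is biased upward because e_i is correlated with T
T_flat, y_flat = T.ravel(), y.ravel()
X = np.column_stack([np.ones_like(T_flat), T_flat])
beta_pooled = np.linalg.lstsq(X, y_flat, rcond=None)[0][1]

# Within transformation: subtract each unit's time average from both sides
T_w = T - T.mean(axis=1, keepdims=True)
y_w = y - y.mean(axis=1, keepdims=True)
beta_within = (T_w.ravel() @ y_w.ravel()) / (T_w.ravel() @ T_w.ravel())

print(beta_pooled, beta_within)  # pooled estimate is inflated; within is close to 2
```

The demeaning step is exactly the subtraction in the equation above: $e_i$ cancels within each unit, so the correlation between $e_i$ and $T_{it}$ no longer contaminates the estimate.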
This doesn't quite work for a random effects model -- it doesn't apply the within transformation, so your non-$t$-indexed variables aren't sopped up. As such, you can draw inference on the effects of things that don't vary within group, and in the real world such things have importance. Thus, random effects models are good for "modeling the data", while fixed effects models are good for getting closer to unbiased estimates of particular coefficients. With a random effects model, you can't make the claim to have removed that $e_i$ entirely.
In this example, the grouping is by unit, with time indexing the observations within each group. In your example, the grouping comes from your DID setup. (i.e.: it generalizes)
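The cost of the within transformation mentioned above can be made concrete. In this illustrative sketch (hypothetical variable names, pure NumPy), a unit-level covariate that never changes over time is demeaned to exactly zero, so a fixed effects regression has no variation left with which to estimate its coefficient:

```python
import numpy as np

rng = np.random.default_rng(1)
n_units, n_periods = 50, 4

# A unit-level covariate that never changes over time (e.g. group identity)
z = rng.normal(size=n_units)
Z = np.repeat(z[:, None], n_periods, axis=1)  # shape (units, periods)

# The within transformation subtracts each unit's time average
Z_within = Z - Z.mean(axis=1, keepdims=True)

print(np.abs(Z_within).max())  # 0.0: no within-unit variation remains
```

A random effects model, which does not demean the data, keeps this column intact and can therefore estimate its effect.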
Zuur et al., and Faraway (from @janhove's comment above) are right; using likelihood-based methods (including AIC) to compare two models with different fixed effects that are fitted by REML will generally lead to nonsense.
Faraway (2006) Extending the linear model with R (p. 156):
The reason is that REML estimates the random effects by considering linear combinations of the data that remove the fixed effects. If these fixed effects are changed, the likelihoods of the two models will not be directly comparable
These two questions discuss the issue further: Allowed comparisons of mixed effects models (random effects primarily); REML vs ML stepAIC
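To make the practical upshot concrete, here is a hedged sketch using statsmodels' `MixedLM` (assuming statsmodels is installed; the simulated data and variable names are illustrative). The key step is refitting with `reml=False` (i.e. ML) before comparing models that differ in their fixed effects; the AIC is computed by hand from the log-likelihood to make the parameter count explicit.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n_groups, n_per = 30, 10
g = np.repeat(np.arange(n_groups), n_per)
x = rng.normal(size=n_groups * n_per)
u = rng.normal(scale=1.0, size=n_groups)  # random intercepts
y = 1.0 + 0.8 * x + u[g] + rng.normal(scale=0.5, size=n_groups * n_per)
df = pd.DataFrame({"y": y, "x": x, "g": g})

# Refit with ML (reml=False) before comparing models with different fixed effects;
# REML likelihoods for these two models would not be directly comparable
full = smf.mixedlm("y ~ x", df, groups=df["g"]).fit(reml=False)
null = smf.mixedlm("y ~ 1", df, groups=df["g"]).fit(reml=False)

# AIC by hand: -2*loglik + 2*(number of estimated parameters)
aic_full = -2 * full.llf + 2 * (2 + 2)  # intercept + slope, plus 2 variance params
aic_null = -2 * null.llf + 2 * (1 + 2)  # intercept only, plus 2 variance params
print(aic_full, aic_null)
```

Since both fits use the same (ML) objective, their log-likelihoods and hand-computed AICs are on a common scale and the comparison is meaningful.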
Barr, Levy, Scheepers & Tily (2013) present an argument and simulations for why you should (by default) use the maximal random effects structure justified by your design.
The crux of the argument is that the maximal model will generalize better. The paper also provides an argument for why it is anti-conservative not to use the maximal model (pt. 1). More generally, by allowing random slopes and intercepts you're more likely to get a better fit to the data and thus better detect when variance is attributable to the fixed effects.
(2) When comparing models that share the same random effects structure, only differences in the fixed effects should affect AIC. However, if you're using the maximal model, adding a fixed effect will also necessitate adding the corresponding random slope and intercept terms.
It's not a great idea to use the sample data to determine if random effects are "necessary". Just because the inclusion of random effects doesn't explain variance in your current dataset doesn't mean that it is not important to the population you're making inferences about.
You're right to suspect that the model is more likely to fail to converge with more complicated random effects structures (pt. 4). You'll have to balance the benefits of using the maximal random effects structure against the real-world constraints of your data; obviously if the model doesn't converge the results should be treated with suspicion. One possible solution in this situation is to try permutation-based analysis or bootstrapping.
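The anti-conservativity point above can be illustrated with a simulation. This is a sketch using statsmodels' `MixedLM` (an assumption; Barr et al. use lme4 in R, and all names and simulated quantities here are illustrative): data are generated with genuine by-subject slope variability, and the intercept-only model reports a smaller standard error for the fixed slope than the maximal model, i.e. it overstates the precision of the fixed effect.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n_subj, n_obs = 40, 20
s = np.repeat(np.arange(n_subj), n_obs)
x = rng.normal(size=n_subj * n_obs)
u0 = rng.normal(scale=1.0, size=n_subj)  # by-subject random intercepts
u1 = rng.normal(scale=0.5, size=n_subj)  # by-subject random slopes
y = 2.0 + (1.0 + u1[s]) * x + u0[s] + rng.normal(scale=0.5, size=n_subj * n_obs)
df = pd.DataFrame({"y": y, "x": x, "s": s})

# Random intercepts only vs the maximal structure (random slopes for x as well)
m_int = smf.mixedlm("y ~ x", df, groups=df["s"]).fit()
m_max = smf.mixedlm("y ~ x", df, groups=df["s"], re_formula="~x").fit()

# When slope variability is real, the intercept-only model understates the SE
print(m_int.bse["x"], m_max.bse["x"])
```

The intercept-only model effectively treats every subject as sharing one slope, so its standard error for `x` ignores between-subject slope variance; this is the anti-conservativity the paper warns about.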