I am using the mgcv package in R to model the ratio of damaged culture hectares by wild boar in each french department according to some environmental covariates. I used a using a Beta distribution for the response.
For each department, the damages are estimated in 3 different culture types (« Culture »).
Also, the department are clustered into landscape types (« Cluster »).
Since I wanted to get the effect of the Culture type and the Landscape, I keep those variables as fixed effects in the model.
Also, since we have 5 repetitions in time of the response and of some covariates measurement per department and culture type, I put a random effect on the Department per Culture type and put the year as fixed effect as well.
The model takes the form :
gam_tot <-gam(resp ~ Culture + Cluster:Culture +
s(Year,k=4, by=Culture) +
s(X1, by=Culture) +
s(X2, by=Culture) +
s(Depts, bs="re", by=Culture),
family=betar(link="logit"), method="REML",
data=data, select=FALSE)
Then, I estimated the part of the model explained deviance provided by each covariate. For that, I run the model without the given covariate (keeping smooth parameters constant between models), and compute the difference in deviance between the Full model (with the given covariate) and the penalized model (without the given covariate):
(Full model Deviance – Penalized model Deviance) / Full Model Deviance
From that, I get a huge proportion of Deviance explained by the random effect (Department) of about 30 %, while the others covariates explained less than 1 %.
At this point, I have few questions :
-
Do you think my model formula is correct regarding my data and questions ?
-
Is my estimate of explained deviance correct ?
In that case, how can I explain such a discrepancy between the part of explained deviance by random and fixed effects ?
Best Answer
When you do a by-variable smooth, you are allowing for different smoothness parameters for each level of
Culture
. This is a more complex assumption than just simple nested random effects. Do you expectDepts
to have different variances for eachCulture
?I would start with
s(Depts, Culture, bs = "re")
or, if that isn't quite what you want,s(Dept, bs="re", by=Culture, id = 0)
(theid = 0
is there to give allCulture
s the same smoothness parameter).You don't have a
Cluster
main effect in the model, which is going to make theCluster:Culture
term difficult to interpret.I'm not 100% sure you need
Culture
as a main effect (or interaction); typically this is needed becauseby
variable smoother are centred due to the identifiability constraints. You already haveCulture
accounted for in the random effect spline. I don't see that you need both.I can well believe the random effect explains a huge amount of deviance — it is taking care of all the between
Depts
and betweenCulture
variance. The other terms are accounting for within cluster variation. Whilst it won't consider the fixed effect categorical terms, you could usegam.vcomp()
on your fitted model to turn all the smooths into mixed model variance components, which would show you the amount of between and within variation explained by thes()
terms.