Solved – Random effect in GAM (mgcv package)

I am using the mgcv package in R to model the ratio of damaged culture hectares by wild boar in each french department according to some environmental covariates. I used a using a Beta distribution for the response.

For each department, the damages are estimated in 3 different culture types (« Culture »).
Also, the department are clustered into landscape types (« Cluster »).
Since I wanted to get the effect of the Culture type and the Landscape, I keep those variables as fixed effects in the model.

Also, since we have 5 repetitions in time of the response and of some covariates measurement per department and culture type, I put a random effect on the Department per Culture type and put the year as fixed effect as well.

The model takes the form :

gam_tot <-gam(resp  ~ Culture + Cluster:Culture +
                        s(Year,k=4, by=Culture) +
                        s(X1, by=Culture) +
                        s(X2, by=Culture)  +
                        s(Depts, bs="re", by=Culture),
                        family=betar(link="logit"), method="REML", 
                        data=data, select=FALSE)

Then, I estimated the part of the model explained deviance provided by each covariate. For that, I run the model without the given covariate (keeping smooth parameters constant between models), and compute the difference in deviance between the Full model (with the given covariate) and the penalized model (without the given covariate):
(Full model Deviance – Penalized model Deviance) / Full Model Deviance

From that, I get a huge proportion of Deviance explained by the random effect (Department) of about 30 %, while the others covariates explained less than 1 %.

At this point, I have few questions :

Do you think my model formula is correct regarding my data and questions ?
Is my estimate of explained deviance correct ?
In that case, how can I explain such a discrepancy between the part of explained deviance by random and fixed effects ?

Best Answer

When you do a by-variable smooth, you are allowing for different smoothness parameters for each level of Culture. This is a more complex assumption than just simple nested random effects. Do you expect Depts to have different variances for each Culture?

I would start with s(Depts, Culture, bs = "re") or, if that isn't quite what you want, s(Dept, bs="re", by=Culture, id = 0) (the id = 0 is there to give all Cultures the same smoothness parameter).

You don't have a Cluster main effect in the model, which is going to make the Cluster:Culture term difficult to interpret.

I'm not 100% sure you need Culture as a main effect (or interaction); typically this is needed because by variable smoother are centred due to the identifiability constraints. You already have Culture accounted for in the random effect spline. I don't see that you need both.

I can well believe the random effect explains a huge amount of deviance — it is taking care of all the between Depts and between Culture variance. The other terms are accounting for within cluster variation. Whilst it won't consider the fixed effect categorical terms, you could use gam.vcomp() on your fitted model to turn all the smooths into mixed model variance components, which would show you the amount of between and within variation explained by the s() terms.

Best Answer

Related Solutions

Solved – Panel data: fixed individual and random time effect

Related Question