Regression – Residual Check for Count Response Variable in Additive Mixed Models

generalized linear modelgeneralized-additive-modelheteroscedasticitymgcvregression

enter image description here

Hi,
The response variable is the count of daily number of physician visits of one specific disease.(y usually has the value for example of 4,5,6,10…,26).
A standardized residual plot is produced from:

> plot(fit$lme)

My model :

model <- gam(sum ~ s(Time,k=20) + s(RSK,k=10) + as.factor(DOW) 
         data = mydata,family = quasipoisson, method ="REMl")

The plot shows the standardized residuals versus fitted values for the fit of my additive mixed model. It seems that it is not reasonable accordance with the model assumptions? But I don't know how to avoid this pattern for the count data.

The QQplot see also below.

############## UPDATES######
############################

Follwing the advices from @Gavin Rootograms: a new way to assess count models, I checked the Rootgram.
enter image description here

It seems that the issue here is not to choose beween Negative Binomial or Poisson. While the Rootgram from both distribution looks alike. Both over predicts und under predicts among particular y-value.

The response actually only concretely take several certain values. see the plot of ECDF.

enter image description here

So my question would be should another model here be adopted like multi-categorial.

if only a small number of response values 1,…,4,5,6,…15… is observed, should we consider model for multi-categorial data (see Tutz, 2000)

Any help would be much appreciated!
enter image description here

Best Answer

If you mean the lines of points, you can't avoid these kinds of patterns; they are due to the integer nature of the response. You could use PIT residuals, but I haven't implemented them {gratia} and which aren't available in {mgcv} either.

As another diagnostic plot, consider the rootogram, which is available in the {countreg} package (on R-Forge) with a rudimentary implementation also available in the development version of {gratia} on GitHub (though you'll likely have to refit using gam() as I haven't written methods for gamm() fits yet. I would strongly advise you to refit this using gam() anyway, to avoid having to fit the model via PQL, which removes many options for model comparison and may not be as robust as using gam() and method = "REML").

Related Question