Solved – DHARMa diagnostics show significant deviations in KS tests for a glmm with beta distribution

Tags: beta distribution, glmm, glmmtmb, kolmogorov-smirnov test, r

I'm trying to use glmmTMB to fit a beta-distributed generalized linear mixed model with nested random effects. DHARMa residual diagnostics show a KS test with a significant deviation. Is this a serious cause for concern? Does it mean I am using the wrong distribution and my models are invalid? Does anyone have suggestions for improving the model?

I am attempting to show the relationship between salmon abundance and its fertilization effect on forests by means of satellite NDVI (greenness) measurements. Two of my models are:

glmm1 <- glmmTMB(grow_mean ~ combined_abundance_scaled + dist + slope +
                   summer_mean_temp_scaled + summer_mean_precip_scaled +
                   dist_head_scaled + sat + (1|block/plot.id),
                 family = list(family = "beta", link = "logit"),
                 data = fulldf)

glmm2 <- glmmTMB(grow_mean ~ pulse_year + dist + slope +
                   summer_mean_temp_scaled + summer_mean_precip_scaled +
                   dist_head_scaled + sat + (1|block/plot.id),
                 family = list(family = "beta", link = "logit"),
                 data = fulldf)

Where:

grow_mean = satellite NDVI measurements, ranging from 0.5085 to 0.8948

combined_abundance_scaled = scaled yearly abundance of salmon

pulse_year = a yes/no categorical variable that denotes whether the year had 'extreme' abundance

Other variables are: distance from riverbank (CLOSE/FAR), slope (Erosional/depositional/unknown), scaled mean temperature, scaled mean precipitation, scaled distance from headwaters, satellite (LANDSAT 5/LANDSAT 7/LANDSAT 8), and the random effects are plot.id (169 plots where yearly satellite data was collected) and block (2 associated plots, close and far, in each block).

[Figure: data example]

When running the models I receive the warning messages:

Warning messages:

1: In glmmTMB(grow_mean ~ combined_abundance_scaled + dist + slope +  :
  some components missing from ‘family’: downstream methods may fail

2: In mkTMBStruc(formula, ziformula, dispformula, combForm, mf, fr,  :
  specifying ‘family’ as a plain list is deprecated

However, these same warning messages popped up in the glmmTMB vignette and didn't seem to be problematic, so I continued on.

simulationOutput <- simulateResiduals(fittedModel = glmm1, plot = T)

[Figure: DHARMa residual plots for glmm1]

simulationOutput <- simulateResiduals(fittedModel = glmm3, plot = T)

[Figure: DHARMa residual plots for glmm3]

How much of a problem is this? From my understanding, strong deviations in the Kolmogorov-Smirnov test indicate a poor goodness of fit. Is the problem likely to be the beta distribution, or do I need to transform my data in some way? I know there are a couple of outliers in the data (below), but there are a lot of data points (5915 NDVI observations), and they are quite mean-centered.

Apologies if this question is over or under-explained, I'm fairly new to R coding and statistics. Thank you so much for any advice you could offer.

More DHARMa diagnostics for glmm1 and a Cullen and Frey graph for the NDVI data:

[Figures: additional DHARMa diagnostic plots for glmm1 and a Cullen and Frey graph of the NDVI data]

Best Answer

The warning should go away if you use family=beta_family().
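A minimal sketch of that change, fitted to a small simulated data set because fulldf is not available here. The data frame toy, its columns (y, x, block, plot_id) and all simulation settings are made up for illustration; only the family argument is the point.

```r
library(glmmTMB)

set.seed(1)
# Hypothetical toy data: 30 blocks, 2 plots per block, 10 yearly observations per plot
toy <- expand.grid(year = 1:10, plot_in_block = 1:2, block = 1:30)
toy$block   <- factor(toy$block)
toy$plot_id <- factor(paste(toy$block, toy$plot_in_block, sep = "_"))
toy$x       <- rnorm(nrow(toy))

# Random intercepts for blocks and for plots within blocks
block_eff <- rnorm(nlevels(toy$block), sd = 0.3)[as.integer(toy$block)]
plot_eff  <- rnorm(nlevels(toy$plot_id), sd = 0.2)[as.integer(toy$plot_id)]

# Beta-distributed response strictly inside (0, 1), mean/precision parameterisation
mu  <- plogis(0.8 + 0.2 * toy$x + block_eff + plot_eff)
phi <- 50
toy$y <- rbeta(nrow(toy), mu * phi, (1 - mu) * phi)

# Same random-effects structure as in the question, but with a family object
# instead of a plain list
fit <- glmmTMB(y ~ x + (1 | block/plot_id),
               family = beta_family(link = "logit"),
               data = toy)
summary(fit)
```

In the models from the question, this amounts to replacing family=list(family="beta",link="logit") with family = beta_family(link = "logit") and leaving everything else as it is.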

As for the significant KS test, perhaps the large number of observations you have (5915) makes it very sensitive to even the slightest deviation from uniformity?
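To illustrate the sample-size point, here is a rough simulation sketch in base R. It does not use your data: the "residuals" are drawn from a Beta(1.2, 1.2) distribution, which I picked arbitrarily as something only mildly different from Uniform(0, 1), and the function names are made up.

```r
set.seed(42)

# Hypothetical "residuals" that are close to, but not exactly, uniform on (0, 1):
# Beta(1.2, 1.2) is only slightly humped compared with Uniform(0, 1)
slightly_off <- function(n) rbeta(n, 1.2, 1.2)

# Proportion of KS tests against the uniform distribution with p < 0.05,
# across repeated samples of size n
rejection_rate <- function(n, reps = 200) {
  mean(replicate(reps, ks.test(slightly_off(n), "punif")$p.value < 0.05))
}

rate_small <- rejection_rate(200)    # a modest sample size
rate_large <- rejection_rate(5915)   # the number of observations in the question

c(n200 = rate_small, n5915 = rate_large)
```

The same small departure from uniformity should be flagged far more often at n = 5915 than at n = 200. So with this much data it is probably more informative to look at how far the QQ plot actually strays from the diagonal than at the p-value alone.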

(Also, specifying quantreg = TRUE when plotting the DHARMa residuals, e.g. plot(simulationOutput, quantreg = TRUE), will (eventually) give you a more readable residual vs. predicted plot.)
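As far as I can tell, in recent DHARMa versions quantreg is an argument of the plotting functions (plot() on the simulation object, or plotResiduals()) rather than of simulateResiduals() itself. A small sketch with a made-up linear model, just to have something to plot (the toy data and the lm fit are assumptions, not your model):

```r
library(DHARMa)

set.seed(123)
# Hypothetical toy data and model, only used to produce a DHARMa object
toy <- data.frame(x = runif(500))
toy$y <- 1 + 2 * toy$x + rnorm(500, sd = 0.5)
fit <- lm(y ~ x, data = toy)

# Simulate the scaled residuals without plotting ...
sim <- simulateResiduals(fittedModel = fit, plot = FALSE)

# ... then request quantile regression lines when plotting
plot(sim, quantreg = TRUE)           # QQ plot plus residual vs. predicted
plotResiduals(sim, quantreg = TRUE)  # residual vs. predicted only
```

With your model you would pass the object returned by simulateResiduals(fittedModel = glmm1) in the same way. The quantile regressions can take a while with several thousand observations, hence the "(eventually)".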