Solved – Modelling zero-inflated proportion data in R using GAMLSS


I am new to the gamlss package and would like to check that I am using the correct family for proportion data (tree species cover after treatment), which is bounded between zero and one. According to the documentation, the correct distribution family for data of this type is BEINF (Beta inflated), since it allows both 0 and 1 values. The data are highly zero-inflated, with 43/82 observations having a zero response, and 3/82 values at 1. I run the following model:

library(gamlss)

m1 <- gamlss(y ~ x1 + x2,
             sigma.formula = ~ 1,
             nu.formula = ~ x1 + x3 + x4,
             tau.formula = ~ x5,
             family = BEINF,
             data = df)

The mean response values (and SEs) given by:

pred <- predict(m1, type = "response", se.fit = TRUE)

seem reasonable.

I am also interested in the probability of obtaining a zero response (i.e. the probability of having no individuals of the target species post-treatment). However, when I try to extract the fitted values of the nu parameter (which I believe to be the probability of obtaining a zero value), using:

prednu <- predict(m1, type="response", what="nu")

I am getting predicted values of the response in the range 0.01-44.6, which I find strange. I have tried this for both the model-fitting data and new data, with the same result. However, when I use family=ZAGA (i.e. zero adjusted Gamma distribution which allows for a response with no upper bound), I get predictions between 0 and 1 for the response for the nu parameter, which seem more reasonable.

I therefore have 3 questions about my approach:

  1. Is BEINF the correct choice of distribution family for zero-inflated proportion data bounded between zero and one, and including both zeros and ones?

  2. Does predict(m1, type="response", what="nu") give the probability of obtaining a zero response?

  3. Why would predict(m1, type="response", what="nu") give values far outside the $[0,1]$ bounded range of the response variable?

I am happy to provide data if that would be helpful.

Any assistance you can provide regarding the correct use of these gamlss functionalities with my dataset would be greatly appreciated.
Kind regards,

PS – This question was sent to the GAMLSS team a week ago but has not been answered yet.

Best Answer

The answer below references the inflated beta GAMLSS documentation (Rigby & Stasinopoulos, 2010, section 10.8.2, page 215). It would seem that your data could be fitted with the inflated beta model.

In BEINF, the $\nu$ parameter is not a probability but a ratio of probabilities (an odds), given by

$\nu = p_0 / (1-p_0-p_1)$

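where $p_0$ is the probability of a zero response and $p_1$ the probability of a one response. Hence $\nu$ can take any positive value, so predictions well above 1 are expected rather than an error. In ZAGA, by contrast, $\nu$ is itself the probability of a zero (modelled with a logit link), which is why its predictions lie in $[0,1]$.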

Similarly, the $\tau$ parameter is the corresponding odds for a one response, given by

$\tau = p_1 / (1-p_0-p_1)$
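
These two definitions can be inverted to recover the probabilities. As a short worked step, write $p_2 = 1-p_0-p_1$ for the probability of a response strictly between 0 and 1 (the symbol $p_2$ is introduced here for convenience and is not notation from the question). Then

$p_0 = \nu \, p_2, \qquad p_1 = \tau \, p_2$

and because the three probabilities sum to one,

$p_0 + p_1 + p_2 = p_2 (1+\nu+\tau) = 1 \quad\Rightarrow\quad p_2 = 1 / (1+\nu+\tau)$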

Using the predicted response values for the $\nu$ and $\tau$ components, the probability of zero response can be computed as

$p_0 = \nu / (1+\nu+\tau)$
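
A minimal sketch of this conversion in R is shown below. The helper name `beinf_probs` is hypothetical (it is not part of gamlss), and the numeric inputs are illustrative only: 0.01 and 44.6 are the extremes of the `nu` predictions reported in the question, while the `tau` values are made up.

```r
# Hypothetical helper (not part of gamlss): convert the BEINF parameters
# nu and tau, which are odds relative to the interior (0, 1), into
# probabilities of a zero, a one, and an interior response.
beinf_probs <- function(nu, tau) {
  stopifnot(all(nu > 0), all(tau > 0))
  denom <- 1 + nu + tau
  data.frame(
    p0 = nu / denom,         # P(y = 0)
    p1 = tau / denom,        # P(y = 1)
    p_interior = 1 / denom   # P(0 < y < 1)
  )
}

# Illustrative values only (tau = 0.1 is an assumption, not from the data).
beinf_probs(nu = c(0.01, 1, 44.6), tau = c(0.1, 0.1, 0.1))

# With the fitted model from the question (needs gamlss and the data):
# nu_hat  <- predict(m1, what = "nu",  type = "response")
# tau_hat <- predict(m1, what = "tau", type = "response")
# p0_hat  <- beinf_probs(nu_hat, tau_hat)$p0
```

Each row of the returned data frame sums to one, which is a quick sanity check that the predicted `nu` and `tau` were both taken on the response scale rather than the link scale.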

Hope this helps.