Generalized Linear Model – Comparing Two Groups with Many Zeros

Tags: generalized linear model; proportion; zero inflation

I am comparing the time-activity budgets of two populations of seabirds: those in the presence of ship disturbance and those not in the presence of ship disturbance. Focal animals were chosen haphazardly and observed for up to 10 minutes. Not all animals were observed for the full 10 minutes, but for the purpose of this analysis we have chosen to include all observations that exceed 3 minutes. For this reason I have chosen to model the proportion of time spent rather than the actual time. I do have the time (in seconds) spent in each category and the total observation time, if that makes the analysis easier to accomplish. This would need to be weighted somehow, though. Three activities (flight, loafing, foraging) were defined, and their proportions sum to 1.
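The data preparation described above can be sketched as follows. This is only a minimal illustration in Python, assuming that the three activity times add up to the total observation time; the function name, the record layout, and the example values are hypothetical and not taken from the real data set.

```python
MIN_OBS_SECONDS = 180  # keep only observations that exceed 3 minutes


def activity_proportions(flight, loafing, foraging):
    """Return per-activity proportions, or None if the observation is too short.

    Arguments are seconds spent in each activity (hypothetical layout).
    """
    total = flight + loafing + foraging
    if total <= MIN_OBS_SECONDS:
        return None  # observation does not exceed 3 minutes, so it is dropped
    return {
        "flight": flight / total,
        "loafing": loafing / total,
        "foraging": foraging / total,
    }


# Hypothetical records: (flight, loafing, foraging) in seconds.
records = [(0, 500, 100), (30, 100, 20), (60, 240, 300)]

# The second record lasts only 150 s and is filtered out.
props = [p for p in (activity_proportions(*r) for r in records) if p is not None]

for p in props:
    print(p)
```

Each retained observation yields three proportions that sum to 1, which is the format the question works with.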

I am interested in (1) determining whether the time spent in flight differs between the two groups and (2) if there is a difference, whether both of the other activities decrease/increase or only one.

Flight is not a common activity to observe in this species, and therefore there are a lot of zeros in the data. Here is the distribution of the proportion of time spent flying for the two populations:

[Figure: histogram of proportion of time spent flying per observation]

I think that I need to use generalized linear modeling with a zero-inflated Poisson distribution, but I am unclear on how to make that work correctly. The data also appear to be over-dispersed.

I am working in R. Any help or advice for papers to read is appreciated.

Thanks.

UPDATE: 2/6/14

I ran a zero-inflated negative binomial model, and ship presence and time of day came out as significant predictors. Since my data were in proportional format, I needed to transform them to integers without losing the weighting that proportions offered me. I multiplied all the proportions by 600 (the maximum observation time in seconds), allowing each observation to have equal weight when using seconds flying.
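The rescaling step can be sketched like this. It is a minimal Python illustration only; rounding to the nearest whole second is an assumption (the question does not say how non-integer values were handled), and the function name and example values are hypothetical.

```python
MAX_OBS_SECONDS = 600  # maximum observation time (10 minutes)


def scaled_flight_seconds(flight_secs, total_secs, max_secs=MAX_OBS_SECONDS):
    """Rescale the proportion of time flying to a whole number of seconds out of max_secs."""
    if total_secs <= 0:
        raise ValueError("total observation time must be positive")
    # Equivalent to (flight_secs / total_secs) * max_secs, then rounded
    # to an integer so it can be used as a count response (assumption).
    return round(flight_secs * max_secs / total_secs)


# Hypothetical observations: (seconds flying, total seconds observed).
observations = [(0, 600), (45, 300), (30, 200), (120, 480)]

counts = [scaled_flight_seconds(f, t) for f, t in observations]
print(counts)  # [0, 90, 90, 150]
```

Note that the second and third observations receive the same count even though one was watched for 300 s and the other for 200 s, which is what "equal weight" means here.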

The mean % time flying was best predicted with presence of ship (0,1) and time of day (morning, mid-day, evening). The predicted times matched closely with the observed times.

Interpretation: These birds appear to spend about 3 times more time flying when ships are present than when they are absent, for all three time periods. I then wanted to figure out which activity (or activities) they were doing less of when ships were present (I measured three mutually exclusive events: flying, foraging, or loafing). Foraging time was also heavily dominated by zeros, so I modeled it in the same way. This time, the best-fitting model with predictors (just presence of ship) did not improve on the fit of the null model (p = 0.10). Could this mean that the true distribution is not zero-inflated negative binomial, or can I interpret it as there being no significant difference in the mean time spent foraging when ships are present versus absent? That is how the raw data appear to me. The Mann-Whitney U test also comes up non-significant for foraging (although it is suggestive of a difference, p = 0.069).

Assuming that there is no true difference in foraging and a marked difference in flying, am I safe to assume that birds in this specific region (I am aware of the scope of inference) compensate for extra flying time by loafing less rather than foraging less when ships are present? (The raw data support this claim.)

Thanks for all the input so far.

Best Answer

If your data seem overdispersed, you should use the negative binomial distribution, whose additional parameter accounts for that. However, this normally applies to count data, not proportions.

The use of the zero-inflated negative binomial model in R is exemplified here: Fitting a zero-inflated negative binomial regression with R

The theory is explained here: https://stats.idre.ucla.edu/r/dae/zinb/

Both the dichotomous (zero-inflation) part and the time (count) part of the model should initially consist of only the intercept term. You should then perform a likelihood ratio test to see whether adding the ship presence factor reduces the deviance significantly.
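The arithmetic of that likelihood ratio test can be sketched as below. This is a minimal Python illustration, not output from any fitted model: the log-likelihood values are hypothetical, the function name is made up for the example, and the closed-form p-values cover only 1 or 2 degrees of freedom (for example, a two-level ship factor added to one part or to both parts of the model).

```python
import math


def likelihood_ratio_test(loglik_null, loglik_full, df):
    """Return (statistic, p-value) for nested models differing by df parameters.

    Uses exact closed forms of the chi-square survival function for df = 1 or 2.
    """
    stat = max(0.0, 2.0 * (loglik_full - loglik_null))
    if df == 1:
        p_value = math.erfc(math.sqrt(stat / 2.0))
    elif df == 2:
        p_value = math.exp(-stat / 2.0)
    else:
        raise ValueError("this sketch only handles df = 1 or df = 2")
    return stat, p_value


# Hypothetical log-likelihoods: intercept-only model vs. a model with the
# ship factor in both the zero-inflation and the count part (2 extra parameters).
stat, p_value = likelihood_ratio_test(loglik_null=-412.3, loglik_full=-407.1, df=2)
print(f"LR statistic = {stat:.2f}, p = {p_value:.4f}")
```

A small p-value would indicate that adding ship presence improves the fit over the intercept-only model; a large one, as reported for foraging in the question, means the data give no evidence that it does.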
