Solved – GLM for proportional data

Tags: generalized linear model, proportion, r

I'm hoping someone can help me sort out how proportion comparisons using a GLM work in R.

I'm comparing hatch success among multiple years (and later, among sites). I've used a GLM to compare among years, reading in a txt file in which each row corresponds to a nest: the success column holds the number of chicks that hatched, and the fail column holds the number of eggs in that nest that didn't hatch. (I set this up following the description in "The R Book" by Michael Crawley.)

The code looks like this:

# One row per nest: hatched eggs (success) and unhatched eggs (fail)
y <- cbind(success, fail)
hsmodel1 <- glm(y ~ year, family = binomial)

This tests for differences among years in hatch success measured as number of chicks hatched / number of eggs laid, correct? Not number of chicks hatched / nest?

Secondly, my species lays at most 2 eggs, so each nest can only have 0, 1 or 2 successes (or failures). Is this proportional method still valid in that case? I'm pretty sure it is, but a question from a colleague today has me doubting myself.

Thanks!
Mog

Best Answer

Logistic regression, which is what this is, assumes a binomial distribution for each nest or, as I prefer to think of it, a Bernoulli distribution for each event (here, each egg). I know of no case, nor any reason, where that assumption by itself is unsafe: either the event happens or it doesn't, and in a population you can always assign a probability to it. There is no reason the upper limit on the number of events per nest (two eggs in your case) should affect this.
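To make the "Bernoulli per event" view concrete, here is a minimal sketch on simulated data. The data frame nests, the three years and the hatch probabilities in p_true are all invented for illustration and are not from the question. It fits the per-nest cbind(success, fail) model and then the same data expanded to one row per egg; as far as I can tell the two fits should give the same coefficients.

```r
# Hypothetical data: one row per nest, clutch size 1 or 2 (all values simulated).
set.seed(1)
n_nests <- 60
nests <- data.frame(
  year = factor(rep(c("2010", "2011", "2012"), each = n_nests / 3)),
  eggs = sample(1:2, n_nests, replace = TRUE)
)
p_true <- c("2010" = 0.5, "2011" = 0.7, "2012" = 0.6)  # assumed hatch probabilities
nests$success <- rbinom(n_nests, size = nests$eggs,
                        prob = p_true[as.character(nests$year)])
nests$fail <- nests$eggs - nests$success

# Per-nest binomial fit, set up the same way as in the question.
fit_nest <- glm(cbind(success, fail) ~ year, family = binomial, data = nests)

# Same data expanded to one row per egg (1 = hatched, 0 = did not hatch).
eggs_long <- data.frame(
  year = rep(nests$year, times = nests$eggs),
  hatched = unlist(mapply(function(s, f) c(rep(1, s), rep(0, f)),
                          nests$success, nests$fail, SIMPLIFY = FALSE))
)
fit_egg <- glm(hatched ~ year, family = binomial, data = eggs_long)

# Compare the coefficient estimates of the two fits.
same_coefs <- all.equal(coef(fit_nest), coef(fit_egg), tolerance = 1e-6)
print(same_coefs)
```

The deviances of the two fits will generally differ, because the aggregated and per-egg likelihoods differ by a constant, but the estimates and their standard errors should match.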

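By the linearity assumption, that distribution is conditional on year: the log-odds of hatching are modelled as a linear function of year (if year is coded as a factor, this amounts to a separate log-odds for each year). That assumption could be wrong, but that has nothing to do with the possible number of events per nest; it is simply the fact that any model can be wrong.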
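A small sketch of what the log-odds look like when year is a factor, again on simulated data (the years, clutch sizes and the 0.6 hatch probability are assumptions for illustration only). Back-transforming the per-year log-odds with plogis() should reproduce the pooled proportion of chicks hatched / eggs laid in each year, which is the sense in which the model works per egg rather than per nest.

```r
# Hypothetical data: 15 nests per year, 1 or 2 eggs per nest (simulated).
set.seed(2)
year <- factor(rep(c("2010", "2011", "2012"), each = 15))
eggs <- sample(1:2, length(year), replace = TRUE)
success <- rbinom(length(year), size = eggs, prob = 0.6)  # assumed hatch probability
fail <- eggs - success

fit <- glm(cbind(success, fail) ~ year, family = binomial)

# With treatment contrasts (the R default), the intercept is the log-odds for
# the first year and the other coefficients are differences from it.
b <- coef(fit)
logodds <- c(b[1], b[1] + b[-1])
names(logodds) <- levels(year)
p_model <- plogis(logodds)  # back-transform log-odds to probabilities

# Pooled proportion per year: total chicks hatched / total eggs laid.
p_pooled <- tapply(success, year, sum) / tapply(eggs, year, sum)

print(rbind(model = p_model, pooled = p_pooled))
```

If year were entered as a numeric variable instead, the model would fit a single slope, so the fitted probabilities would no longer have to match the per-year pooled proportions.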

With predict(type="response") you can get, from this type of model, the probability of an egg hatching conditional on year. (Technically that is not exactly the same as a rate, but for most practical purposes it is.)
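For example, a minimal sketch of the predict() call, assuming a data frame dat with columns year, success and fail (the name dat and all the values are simulated here, not taken from the question). The default type = "link" returns log-odds; type = "response" returns probabilities.

```r
# Hypothetical data: 15 nests per year, 1 or 2 eggs per nest (simulated).
set.seed(3)
dat <- data.frame(year = factor(rep(c("2010", "2011", "2012"), each = 15)))
dat$eggs <- sample(1:2, nrow(dat), replace = TRUE)
dat$success <- rbinom(nrow(dat), size = dat$eggs, prob = 0.6)  # assumed probability
dat$fail <- dat$eggs - dat$success

fit <- glm(cbind(success, fail) ~ year, family = binomial, data = dat)

# One row per year to predict for, using the same factor levels as the fit.
new_years <- data.frame(year = factor(levels(dat$year), levels = levels(dat$year)))

# Predicted probability that an egg hatches, conditional on year.
p_hat <- predict(fit, newdata = new_years, type = "response")

# The same predictions on the log-odds scale, back-transformed by hand.
eta_hat <- predict(fit, newdata = new_years, type = "link")

print(data.frame(year = new_years$year, p_hat = p_hat, from_link = plogis(eta_hat)))
```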