Here is a solution. First, I replaced MS1 with M2, since there is no variable with the name MS1 in your example data.
attach(data)
p1<-M2/(M2+M1)
y<-cbind(M2,M1)
The model:
GM1 <- glm(y ~ BPT, family = binomial(logit), data = data)
Note that there is one important difference between your model and Crawley's: his predictor variable was log-transformed, whereas your predictor variable BPT is used untransformed. By the way, since BPT includes zeros, log-transforming it would produce -Inf values.
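To see why, here is a quick check with a toy vector (not your data):

```r
x <- c(0, 0.2, 0.5)  # BPT-like values containing a zero
log(x)               # the first element becomes -Inf
# A common workaround is log1p(), i.e. log(1 + x), which is finite at zero:
log1p(x)
```

If you wanted a log-transformed predictor despite the zeros, log1p (or adding a small constant) is the usual way around this, though it changes the interpretation of the coefficient.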
Now the main plot:
plot(BPT,p1,col="black",pch=1,main="Relationship a",xlab="Browsing pressure",
ylab="Moose Damage Survey")
Now the important differences. Crawley created a sequence xv containing values corresponding to his log-transformed predictor variable density. Note that the range of xv should match the range of log(density).
Compare:
range(log(density))
#[1] 0.000000 6.095825
xv<-seq(0,6,0.1)
range(xv)
#[1] 0 6
Since your predictor variable is untransformed, you don't need this kind of sequence; you can use the original variable BPT directly.
lines(BPT,predict(GM1,type="response"))
This is the plot:
If you would like to use your own sequence of values for the prediction, use this:
xv<-seq(0,0.7,length.out = length(BPT))
lines(xv,predict(GM1, list(BPT = xv), type="response"))
Your question does not specify how p was calculated, but I assume it is the proportion of beetles on each plot you sampled. I can't see the image you've attached either (I'm probably out of luck for today), but judging from the code you've posted, I'd guess you need to sort the predicted values before plotting them with lines(). Something like this should work:
# This is the data, saved in object d
d<-structure(list(altitude = c(304L, 842L, 877L, 1505L, 1118L, 702L,
806L, 960L, 975L, 1112L, 1348L, 502L, 1428L, 1521L, 1040L, 1412L,
1231L, 980L, 1053L), beetles = c(88L, 7L, 21L, 2L, 11L, 25L,
16L, 16L, 34L, 11L, 11L, 58L, 2L, 0L, 9L, 8L, 5L, 14L, 2L), rest = c(35L,
18L, 27L, 25L, 19L, 24L, 17L, 32L, 61L, 17L, 32L, 24L, 13L, 35L,
6L, 19L, 19L, 16L, 10L)), .Names = c("altitude", "beetles", "rest"
), class = "data.frame", row.names = c(NA, -19L))
# This fits the model
model <- glm(cbind(d$beetles, d$rest) ~ altitude, family = binomial(logit), data = d)
# The plot of proportion of beetles against the altitude
plot(prop.table(cbind(d$beetles, d$rest), 1)[,1]~d$altitude, ylab="Beetles (proportional)", xlab="Altitude")
# The prediction of the proportion of the beetles from the fit
pred<-predict(model,type="response")
# Ordering the predicted values (and altitude) according to the altitude
pred2<-pred[order(d$altitude)]
alt<-d$altitude[order(d$altitude)]
# Drawing the line
lines(alt,pred2)
Or, a simpler method, sort the data frame first:
d<-d[order(d$altitude),]
model <- glm(cbind(d$beetles, d$rest) ~ altitude, family = binomial(logit), data = d)
plot(prop.table(cbind(d$beetles, d$rest), 1)[,1]~d$altitude, ylab="Beetles (proportional)", xlab="Altitude")
lines(d$altitude,predict(model,type="response"))
Best Answer
Logistic regression, as used here, assumes a binomial distribution, or, as I prefer to think of it, a Bernoulli distribution per event. I know of no case or reason where this cannot be safely assumed in itself (either the event happens or it doesn't, and in a population you can always assign a probability to this). There is no reason the upper limit on your number of events per nest should influence this.
That distribution is assumed conditionally on the year, with the log-odds linear in year. This assumption could be faulty, but that has nothing to do with the possible number of events; it is just the fact that any model can be wrong.
With predict(type="response") you can get the probability of an egg hatching, conditional on the year, from this type of model (technically that is not exactly the same as a rate, but for most practical purposes it is).
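As a minimal sketch of what that looks like, using made-up hatched/failed counts per year (hypothetical data, not from the question):

```r
# Hypothetical data: eggs hatched vs. not hatched per nest, by year
nests <- data.frame(
  year    = c(2001, 2001, 2002, 2002, 2003, 2003),
  hatched = c(3, 4, 5, 2, 6, 5),
  failed  = c(5, 4, 3, 6, 2, 3)
)
fit <- glm(cbind(hatched, failed) ~ year, family = binomial(logit), data = nests)
# Fitted probability of hatching for each observation:
predict(fit, type = "response")
# Or for specific years of interest:
predict(fit, newdata = data.frame(year = 2001:2003), type = "response")
```

With type = "response" the predictions come back on the probability scale (between 0 and 1), rather than on the log-odds scale that predict() returns by default.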