Here is a solution. First, I replaced MS1 with M2, since there is no variable with the name MS1 in your example data.
attach(data)
p1<-M2/(M2+M1)
y<-cbind(M2,M1)
The model:
GM1 <- glm(y ~ BPT, family = binomial(logit), data = data)
Note that there is one important difference between your model and Crawley's: his predictor variable was log-transformed, whereas your predictor variable BPT is used untransformed. By the way, since BPT includes zeros, log-transforming it would produce -Inf values.
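To see why, here is a quick check with a toy vector (not your data):

```r
x <- c(0, 0.2, 0.5)  # BPT-like values containing a zero
log(x)               # the first element becomes -Inf
# A common workaround is log1p(), i.e. log(1 + x), which is finite at zero:
log1p(x)
```

If you wanted a log-transformed predictor despite the zeros, log1p (or adding a small constant) is the usual way around this, though it changes the interpretation of the coefficient.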
Now the main plot:
plot(BPT,p1,col="black",pch=1,main="Relationship a",xlab="Browsing pressure",
ylab="Moose Damage Survey")
Now the important differences. Crawley created a sequence xv containing values corresponding to his log-transformed predictor variable density. Note that the range of xv should match the range of log(density).
Compare:
range(log(density))
#[1] 0.000000 6.095825
xv<-seq(0,6,0.1)
range(xv)
#[1] 0 6
Since your predictor variable is untransformed, you don't need this kind of sequence; you can use the original variable BPT directly.
lines(BPT,predict(GM1,type="response"))
This is the plot:
If you would like to use your own sequence of values for the prediction, use this:
xv<-seq(0,0.7,length.out = length(BPT))
lines(xv,predict(GM1, list(BPT = xv), type="response"))
Your question does not specify how p was calculated, but I assume it is the proportion of beetles on each plot you sampled. I can't see the image you've attached either (I'm probably out of luck for today), but judging from the code you've posted, I'd guess you need to sort the predicted values before plotting them with lines(). Something like this should work:
# This is the data, saved in object d
d<-structure(list(altitude = c(304L, 842L, 877L, 1505L, 1118L, 702L,
806L, 960L, 975L, 1112L, 1348L, 502L, 1428L, 1521L, 1040L, 1412L,
1231L, 980L, 1053L), beetles = c(88L, 7L, 21L, 2L, 11L, 25L,
16L, 16L, 34L, 11L, 11L, 58L, 2L, 0L, 9L, 8L, 5L, 14L, 2L), rest = c(35L,
18L, 27L, 25L, 19L, 24L, 17L, 32L, 61L, 17L, 32L, 24L, 13L, 35L,
6L, 19L, 19L, 16L, 10L)), .Names = c("altitude", "beetles", "rest"
), class = "data.frame", row.names = c(NA, -19L))
# This fits the model
model <- glm(cbind(d$beetles, d$rest) ~ altitude, family = binomial(logit), data = d)
# The plot of proportion of beetles against the altitude
plot(prop.table(cbind(d$beetles, d$rest), 1)[,1]~d$altitude, ylab="Beetles (proportional)", xlab="Altitude")
# The prediction of the proportion of the beetles from the fit
pred<-predict(model,type="response")
# Ordering the predicted values (and altitude) according to the altitude
pred2<-pred[order(d$altitude)]
alt<-d$altitude[order(d$altitude)]
# Drawing the line
lines(alt,pred2)
Or, a simpler method, sort the data frame first:
d<-d[order(d$altitude),]
model <- glm(cbind(d$beetles, d$rest) ~ altitude, family = binomial(logit), data = d)
plot(prop.table(cbind(d$beetles, d$rest), 1)[,1]~d$altitude, ylab="Beetles (proportional)", xlab="Altitude")
lines(d$altitude,predict(model,type="response"))
Best Answer
Logistic regression, as used here, assumes a binomial distribution, or, as I prefer to think of it, a Bernoulli distribution per event. I know of no case or reason where this cannot be safely assumed in itself (either the event happens or it doesn't, and in a population you can always assign a probability to this). There is no reason the upper limit on your number of events per nest should influence this.
That distribution is assumed conditionally on the year, with the log-odds linear in year. This assumption could be faulty, but that has nothing to do with the possible number of events; it is just the fact that any model can be wrong.
With predict(type="response") you can get the probability of an egg hatching, conditional on the year, from this type of model (technically that is not exactly the same as a rate, but for most practical purposes it is).
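As a minimal sketch of what that looks like, using made-up hatched/failed counts per year (hypothetical data, not from the question):

```r
# Hypothetical data: eggs hatched vs. not hatched per nest, by year
nests <- data.frame(
  year    = c(2001, 2001, 2002, 2002, 2003, 2003),
  hatched = c(3, 4, 5, 2, 6, 5),
  failed  = c(5, 4, 3, 6, 2, 3)
)
fit <- glm(cbind(hatched, failed) ~ year, family = binomial(logit), data = nests)
# Fitted probability of hatching for each observation:
predict(fit, type = "response")
# Or for specific years of interest:
predict(fit, newdata = data.frame(year = 2001:2003), type = "response")
```

With type = "response" the predictions come back on the probability scale (between 0 and 1), rather than on the log-odds scale that predict() returns by default.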