Solved – Using proportional data with a binomial error structure in R… a worked example needing answers!

binomial distribution, logistic, logit, proportion

I am trying to test whether the proportion of herbivores in spiders' diets is related to the proportion of herbivores in their grassland, but am struggling to understand if I should be using a binomial model.
Initially I was going to arcsine square root transform the response variable, but having done some further reading, I discovered that this transformation has these days been superseded by using a binomial error structure in the model instead. I believe this is correct. So,

I have 30 spiders per grassland and 5 grasslands.

My current binomial model looks like this:

glm (obs.herbs.in.diet.proportion ~ prop.herb.in.grassland, family=binomial)

The response variable (obs.herbs.in.diet.proportion) is structured by two columns of data along the lines of "successes,failures", using:

obs.herbs.in.diet.proportion<- cbind(proportion.herb.diet, proportion.NOT.herb.diet)

proportion.NOT.herb.diet is obviously not measured; I have just calculated it as the complement of proportion.herb.diet (that is, 1 minus the value I did measure) so that my response variable will work in this R model.

An example of my data is:

grassland proportion.herb.diet  proportion.NOT.herb.diet  prop.herb.in.grassland
    1             0.23                     0.77                 0.19
    1             0.27                     0.73                 0.19
    2             0.49                     0.51                 0.58
    2             0.49                     0.51                 0.58

As I understand it, I should be using a binomial model because my response variable is bounded by 0 at its lower limit and 1 at its upper limit.

1) Does using a binomial model in this instance sound appropriate, and a better choice than an arcsine square root transformation?

2) Presumably, having proportional data for a second variable that is the explanatory variable (prop.herb.in.grassland) is not a problem, and does not require any transformation?

Additionally, when I run the model, I received the following warning:

Warning message:
In eval(expr, envir, enclos) : non-integer counts in a binomial glm!

3) Does anyone know if this means that my non-integer response variable values are inappropriate in a binomial model?

I ran summary() on the model and got what looks to be a reasonable output and result, except that I have large "under-dispersion", when I had been worried about overdispersion!

Residual deviance:  5.8082  on 147  degrees of freedom

4) Is under-dispersion a concern and should I take action against it?

Best Answer

You can certainly use a binomial model when your response variable is a proportion. However, you then need to weight each observation by the number of trials that each observation represents, if you are to get an equivalent result to the formulation where you supply the positive and negative counts. In your case, you should weight by the number of spiders that are represented by the proportions in each observation.
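
As a rough sketch of that equivalence, the snippet below fits the same binomial model in two ways: once with a two-column matrix of integer counts, and once with the proportion as the response and the number of trials passed through the weights argument. The data are simulated purely for illustration; n_trials, x, successes, failures and prop are hypothetical names and values, not the asker's data, and the number of trials per observation (20) is an assumption.

```r
# Hypothetical data, invented for illustration only
set.seed(1)
n_trials  <- rep(20, 20)                               # assumed trials per observation
x         <- rep(c(0.2, 0.3, 0.4, 0.5, 0.6), each = 4) # hypothetical explanatory proportions
successes <- rbinom(20, size = n_trials, prob = x)     # simulated integer counts
failures  <- n_trials - successes
prop      <- successes / n_trials                      # observed proportion per observation

# Formulation 1: two-column response of integer counts (successes, failures)
fit_counts <- glm(cbind(successes, failures) ~ x, family = binomial)

# Formulation 2: proportion as the response, weighted by the number of trials
fit_weighted <- glm(prop ~ x, family = binomial, weights = n_trials)

# The two formulations should give the same coefficient estimates
all.equal(coef(fit_counts), coef(fit_weighted))
```

Because the weights are chosen so that prop * n_trials is a whole number, the second formulation should also avoid the "non-integer counts in a binomial glm!" warning, which appears when the implied number of successes is not an integer.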