GLM: binomial(logit) with weights=tested

Tags: binomial distribution, generalized linear model, mathematical-statistics, overdispersion, r

I am trying to decipher what this GLM means for a test:

glm(proportion.correct ~ dose*drug, family=binomial(logit), weights=tested)

The experiment looks at the proportion of spiders killed by different doses of different drugs.

I will have to talk through a similar example with a member of staff. Here is my logic so far:

GLM: a generalized linear model tests how a variable is affected by other variables. It consists of a linear predictor, a link function and a variance function.

Binomial: used because it is proportion data. I am not sure what it is, though.

logit: again, used because it is proportion data. Really not sure what it means.

dose*drug: this tests the interaction. A further note tells me there was no interaction. Does this mean that the dose does not matter, i.e. that the effect of the drug is the same at every dose?

weights=tested: again, I am not too sure. Is this because there were different sample sizes for different drugs?

Finally, it states "to correct for over-dispersion, F tests were used" and asks "what other way could you perform this analysis?" I am not sure why F tests were used or how else this could be analysed.

Best Answer

I suspect that, for each dose by drug combination, you have a number of spiders being tested and then you keep track of how many have died (or, alternatively, what proportion of them died).

A variable that counts the number of successes out of N independent trials, each with the same probability of success, follows a binomial distribution. In your case, for a given dose by drug combination, the number of trials N is the number of spiders exposed to that combination, and a "success" is a spider being killed by it. The parameters of this distribution are:

  1. N, the number of trials;
  2. p, the probability of success on each trial.

Again, in your case N refers to the number of spiders exposed to a particular dose by drug combination and p refers to the probability of a spider dying after exposure to this combination. (It is not clear to me whether you repeat the exposure of spiders to a specific drug by dose combination, using different sets of spiders each time.)
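To make the two parameters concrete, here is a minimal R sketch using base R's dbinom() and rbinom(). The values N = 20 and p = 0.3 are hypothetical, chosen only for illustration, not taken from your experiment.

```r
# Hypothetical values for one dose by drug combination (not real data):
# N spiders exposed, each killed independently with probability p.
N <- 20
p <- 0.3

# P(exactly k of the N spiders are killed), for k = 0, 1, ..., N
k <- 0:N
probs <- dbinom(k, size = N, prob = p)

sum(probs)      # probabilities over all possible k sum to 1
sum(k * probs)  # expected number killed, equal to N * p

# Simulate the number killed in 5 hypothetical batches of N spiders
set.seed(1)
rbinom(5, size = N, prob = p)
```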

As explained at https://www.theanalysisfactor.com/when-to-use-logistic-regression-for-percentages-and-counts/, logistic regression can handle a binomial variable with N trials per dose by drug combination. Specified the way you have it, the model estimates the logit of the probability p that a spider is killed as a function of dose and drug. Recall that logit(p) = log(p/(1-p)), i.e. the log-odds of p. The model can be specified in R exactly as you indicated, though other equivalent formulations are possible, as seen here: https://stackoverflow.com/questions/9111628/logistic-regression-cbind-command-in-glm.
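A quick numerical check of the logit, log(p/(1-p)), and its inverse in R. The helper names logit() and inv_logit() are hypothetical, defined here only for illustration; qlogis() and plogis() are the base R equivalents.

```r
# logit(p) = log(p / (1 - p)): the log-odds of p
logit <- function(p) log(p / (1 - p))
# Inverse logit: maps any real number back into (0, 1)
inv_logit <- function(eta) 1 / (1 + exp(-eta))

p <- c(0.1, 0.5, 0.9)
logit(p)                        # negative below 0.5, zero at 0.5, positive above
all.equal(logit(p), qlogis(p))  # base R's qlogis() computes the same thing
inv_logit(logit(p))             # recovers 0.1, 0.5, 0.9 (plogis() does the same)
```

For a fitted glm object (called fit here, a hypothetical name), predict(fit, type = "link") returns values on this logit scale and predict(fit, type = "response") returns the corresponding probabilities.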

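When we aim to model a proportion p, the logit is a natural way to transform it prior to modeling: it maps the interval (0, 1) onto the whole real line, and exponentiated coefficients can be read as odds ratios. Of course, other transformations are possible (for example the probit or complementary log-log links), but their interpretation may not be as natural as the one provided by the logit transformation.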
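The alternative transformations correspond to other link functions of R's binomial family. This small sketch compares them at a single hypothetical probability, p = 0.8, and shows why the logit scale reads as odds ratios; it uses only base R.

```r
p <- 0.8

# The link function of each binomial family, evaluated at p
binomial(link = "logit")$linkfun(p)    # log(p / (1 - p)), same as qlogis(p)
binomial(link = "probit")$linkfun(p)   # standard normal quantile, same as qnorm(p)
binomial(link = "cloglog")$linkfun(p)  # log(-log(1 - p))

# A difference on the logit scale is a log odds ratio.
# Odds at p = 0.8 are 0.8 / 0.2 = 4; odds at p = 0.5 are 1; the odds ratio is 4.
odds_ratio <- exp(qlogis(0.8) - qlogis(0.5))
odds_ratio
```

To fit a probit model instead, you could pass family = binomial(link = "probit") to glm(); the coefficients are then on the probit scale and no longer read as log odds ratios.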

In your model, the logit-transformed probability that a spider will be killed is assumed to be a function of both dose and drug. If the interaction between dose and drug is not statistically significant, then the data provide no evidence that the effect of the drug is different across doses (or the effect of dose is different across drugs).
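One way to test the interaction is to compare the models with and without the dose:drug term using an analysis of deviance (likelihood-ratio) test. The sketch below assumes a small, made-up data frame called spiders with columns drug, dose, tested and killed; these names and numbers are hypothetical, not from your experiment. The response column is named proportion.correct only to mirror the formula in the question.

```r
# Made-up data, for illustration only: 2 drugs x 4 doses
spiders <- data.frame(
  drug   = factor(rep(c("A", "B"), each = 4)),
  dose   = rep(c(1, 2, 4, 8), times = 2),
  tested = c(20, 22, 19, 21, 20, 18, 23, 20),
  killed = c(3, 7, 12, 18, 2, 4, 9, 15)
)
# Proportion killed, named as in the question's formula
spiders$proportion.correct <- spiders$killed / spiders$tested

# Model without the interaction (dose + drug) and with it (dose * drug)
fit_main <- glm(proportion.correct ~ dose + drug,
                family = binomial(link = "logit"), weights = tested, data = spiders)
fit_int  <- glm(proportion.correct ~ dose * drug,
                family = binomial(link = "logit"), weights = tested, data = spiders)

# Analysis of deviance: does adding dose:drug reduce the deviance
# by more than expected by chance?
cmp <- anova(fit_main, fit_int, test = "Chisq")
cmp
```

A large p-value here means the data give no evidence of an interaction; it does not prove that the effects of dose and drug are exactly additive on the logit scale.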

The weights argument provides the number of trials (i.e., the number of spiders tested for each dose by drug combination). Because the response in your model is the proportion of spiders killed, the model also needs the denominator behind each proportion: a proportion based on 20 spiders carries more information than one based on 2.
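To see that weights = tested supplies the denominators, this minimal sketch fits the same made-up data (all names and values hypothetical) both ways: as a proportion with weights, and with the cbind(successes, failures) response from the Stack Overflow link. The two fits give the same coefficients and deviance.

```r
# Made-up data, for illustration only
spiders <- data.frame(
  drug   = factor(rep(c("A", "B"), each = 4)),
  dose   = rep(c(1, 2, 4, 8), times = 2),
  tested = c(20, 22, 19, 21, 20, 18, 23, 20),
  killed = c(3, 7, 12, 18, 2, 4, 9, 15)
)
spiders$proportion.correct <- spiders$killed / spiders$tested

# Proportion response; weights give the number of spiders behind each proportion
fit_w <- glm(proportion.correct ~ dose * drug,
             family = binomial(link = "logit"), weights = tested, data = spiders)

# Equivalent two-column response: (number killed, number surviving)
fit_cb <- glm(cbind(killed, tested - killed) ~ dose * drug,
              family = binomial(link = "logit"), data = spiders)

all.equal(coef(fit_w), coef(fit_cb))          # TRUE: same coefficients
all.equal(deviance(fit_w), deviance(fit_cb))  # TRUE: same deviance
```

If weights = tested were dropped from the first call, R would treat each proportion as a single trial (and warn about non-integer successes), so a row based on 5 spiders and a row based on 50 would count equally.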

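For some insights on over-dispersion in binomial logistic regression models, see the related question "Overdispersion in logistic regression". Briefly, when the data are overdispersed, the dispersion parameter is estimated from the data rather than fixed at 1, which is why F tests rather than chi-square tests are used to compare models.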
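In R, one common way to obtain such F tests (an assumption about how your analysis was done, since the question does not say) is to refit with family = quasibinomial, which keeps the same mean model but estimates the dispersion instead of fixing it at 1, and then compare nested models with anova(..., test = "F"). A minimal sketch on made-up data, with all names and values hypothetical:

```r
# Made-up data, for illustration only
spiders <- data.frame(
  drug   = factor(rep(c("A", "B"), each = 4)),
  dose   = rep(c(1, 2, 4, 8), times = 2),
  tested = c(20, 22, 19, 21, 20, 18, 23, 20),
  killed = c(3, 7, 12, 18, 2, 4, 9, 15)
)
spiders$proportion.correct <- spiders$killed / spiders$tested

# Ordinary binomial fit, used here for a rough dispersion check:
# Pearson chi-square / residual df; values well above 1 suggest overdispersion
fit_bin <- glm(proportion.correct ~ dose * drug,
               family = binomial(link = "logit"), weights = tested, data = spiders)
disp <- sum(residuals(fit_bin, type = "pearson")^2) / df.residual(fit_bin)
disp

# Quasibinomial fits: same coefficients as the binomial fit,
# but standard errors scaled by the estimated dispersion
fit_q_main <- glm(proportion.correct ~ dose + drug,
                  family = quasibinomial(link = "logit"), weights = tested, data = spiders)
fit_q_int  <- glm(proportion.correct ~ dose * drug,
                  family = quasibinomial(link = "logit"), weights = tested, data = spiders)
summary(fit_q_int)$dispersion  # equals disp computed above

# F test for the interaction, used because the dispersion is estimated
anova(fit_q_main, fit_q_int, test = "F")
```

Other options sometimes used for overdispersed proportions include beta-binomial models or a binomial GLMM with an observation-level random effect; these need add-on packages and are not shown here.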
