The binomial distribution models Bernoulli variables (i.e., binary outcomes) or binomial variables (i.e., the number of successes in a known number of independent trials). It should therefore not be applied directly to computed rates (successes divided by trials); instead, glm() wants you to supply the successes and failures, e.g., as a two-column matrix. Consequently, your glm() call above yields the warning:
Warning message:
In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
The beta regression model, on the other hand, is intended for situations where you only have a direct rate that does not correspond to success rates from a known number of independent trials. It uses a different likelihood and hence can lead to different results. Specifically, it has an additional precision parameter which is related to the variance of the observations.
Thus, if your proportions above come from a known number of independent trials, then supply this information and use a binomial GLM. Otherwise you can consider beta regression.
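For the binomial case, glm() accepts the successes and failures as a two-column response. A minimal sketch with simulated data (all object names are illustrative, not from your data):

```r
## 20 groups, each with a known number of independent trials.
set.seed(1)
n <- rep(25, 20)                       # trials per group
x <- rnorm(20)                         # a predictor
p <- plogis(0.3 + 0.5 * x)             # true success probabilities
successes <- rbinom(20, size = n, prob = p)
failures  <- n - successes

## Supplying the (successes, failures) counts instead of the computed
## rate avoids the "non-integer #successes" warning:
fit <- glm(cbind(successes, failures) ~ x, family = binomial)
coef(fit)
```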
Additional remark: As your Y above supplies proportions directly, the binomial likelihood does not fit; in particular, the variance of the observations will be overestimated. If you use a quasi-binomial model with an additional dispersion parameter, the model still won't be entirely appropriate, but it will come much closer to the beta regression results.
R> summary(betareg(Y ~ X))

Call:
betareg(formula = Y ~ X)

Standardized weighted residuals 2:
    Min      1Q  Median      3Q     Max
-1.7480 -0.8042 -0.1105  0.8864  1.8896

Coefficients (mean model with logit link):
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.29444    0.08715   3.378 0.000729 ***
X            0.27270    0.09068   3.007 0.002637 **

Phi coefficients (precision model with identity link):
      Estimate Std. Error z value Pr(>|z|)
(phi)    41.06      15.92   2.579   0.0099 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Type of estimator: ML (maximum likelihood)
Log-likelihood: 15.15 on 3 Df
Pseudo R-squared: 0.4149
Number of iterations: 34 (BFGS) + 2 (Fisher scoring)
R> summary(glm(Y ~ X, family = quasibinomial))

Call:
glm(formula = Y ~ X, family = quasibinomial)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-0.25696  -0.11263  -0.01107   0.13491   0.25792

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.29284    0.09523   3.075   0.0106 *
X            0.27078    0.09910   2.732   0.0195 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for quasibinomial family taken to be 0.02836306)

    Null deviance: 0.52867  on 12  degrees of freedom
Residual deviance: 0.31489  on 11  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 3
I suspect that, for each dose-by-drug combination, you have a number of spiders being tested and you keep track of how many died (or, alternatively, what proportion of them died).
A variable which measures the number of successes out of N trials follows a binomial distribution. In your case, given a dose by drug combination, the number of trials N is the number of spiders exposed to that dose by drug combination and "success" is whether a spider was killed by that dose by drug combination. The parameters of this distribution are:
- N, the number of trials;
- p, the probability of success on each trial.
Again, in your case N refers to the number of spiders exposed to a particular dose by drug combination and p refers to the probability of a spider dying after exposure to this combination. (It is not clear to me whether you repeat the exposure of spiders to a specific drug by dose combination, using different sets of spiders each time.)
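As a small illustration of these parameters (the numbers below are made up, not from your data):

```r
## Suppose N = 30 spiders are exposed to one dose-by-drug combination and
## each dies independently with probability p = 0.4. The number killed
## then follows a Binomial(N, p) distribution.
N <- 30
p <- 0.4
dbinom(12, size = N, prob = p)   # P(exactly 12 spiders die)
pbinom(12, size = N, prob = p)   # P(at most 12 spiders die)
N * p                            # expected number of deaths: 12
```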
As explained at https://www.theanalysisfactor.com/when-to-use-logistic-regression-for-percentages-and-counts/, logistic regression can handle a binomial variable with N trials per dose-by-drug combination. The model will be specified the way you have it and will aim to estimate the logit of the probability p that a spider will be killed as a function of dose and drug. Recall that logit(p) = log(p/(1-p)). The model can be specified in R exactly as you indicated, though other equivalent formulations are possible, as seen here: https://stackoverflow.com/questions/9111628/logistic-regression-cbind-command-in-glm.
When we aim to model a proportion p, using a logit model is a natural way to transform the proportion prior to modeling it. Of course, other transformations are possible, but their interpretation may not be as natural as the one provided by the logit transformation.
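In R, the logit transformation and its inverse are available as qlogis() and plogis(); a quick check:

```r
p <- 0.75
qlogis(p)          # logit(p) = log(p / (1 - p)) = log(3)
plogis(qlogis(p))  # the inverse logit recovers p = 0.75
```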
In your model, the logit-transformed probability that a spider will be killed is assumed to be a function of both dose and drug. If the interaction between dose and drug is not statistically significant, then the data provide no evidence that the effect of the drug is different across doses (or the effect of dose is different across drugs).
The weights argument is used to supply the number of trials (i.e., the number of spiders tested for each dose-by-drug combination). Because your model formula specifies the proportion of spiders killed as the response, keeping track of the denominator used to compute that proportion is important.
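A sketch of the two equivalent specifications, using a made-up data layout (the column names dose, drug, n_tested, and n_killed are assumptions, not from your data):

```r
## Hypothetical spider-mortality data: two drugs at four doses each.
spiders <- data.frame(
  dose     = rep(c(1, 2, 4, 8), times = 2),
  drug     = rep(c("A", "B"), each = 4),
  n_tested = rep(20, 8),
  n_killed = c(2, 5, 11, 17, 1, 3, 8, 14)
)

## Proportion response with `weights` giving the number of trials:
fit1 <- glm(n_killed / n_tested ~ dose * drug, weights = n_tested,
            family = binomial, data = spiders)

## Equivalent two-column (successes, failures) specification:
fit2 <- glm(cbind(n_killed, n_tested - n_killed) ~ dose * drug,
            family = binomial, data = spiders)

all.equal(coef(fit1), coef(fit2))   # both forms give the same fit

## Likelihood-ratio test of each term, including the dose:drug interaction:
drop1(fit2, test = "LRT")
```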
For some insights on over-dispersion in binomial logistic regression models, see: Overdispersion in logistic regression.
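One rough way to check for overdispersion in a fitted binomial GLM is to compare the Pearson statistic to its residual degrees of freedom; a sketch with simulated data (object names are illustrative):

```r
## Simulate grouped binomial data and fit a binomial GLM.
set.seed(2)
n <- rep(30, 15)
x <- rnorm(15)
y <- rbinom(15, size = n, prob = plogis(0.2 + 0.4 * x))
fit <- glm(cbind(y, n - y) ~ x, family = binomial)

## Pearson-based dispersion estimate; values well above 1 hint at
## overdispersion (the residual deviance vs. its df tells a similar story).
phi <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
phi
```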
Best Answer
If the predictors used in the different regression models are measured in the same way, there is no need to standardize the coefficients: since the predictors are already on the same scale, the coefficients are as well.
If not, rescale the predictor variables first.
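One common way to put predictors on a comparable scale in R is scale(); a sketch with simulated data (all names are illustrative):

```r
set.seed(3)
d <- data.frame(y  = rnorm(50),
                x1 = rnorm(50, sd = 10),    # large-scale predictor
                x2 = rnorm(50, sd = 0.1))   # small-scale predictor

## Standardize the predictors to mean 0, standard deviation 1:
d$x1 <- as.numeric(scale(d$x1))
d$x2 <- as.numeric(scale(d$x2))

## Each coefficient is now the change in y per one-SD change in that
## predictor, so their magnitudes can be compared directly:
coef(lm(y ~ x1 + x2, data = d))
```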