The binomial distribution models Bernoulli variables (i.e., binary outcomes) or binomial variables (i.e., the number of successes in a known number of independent trials). It should therefore not be applied directly to computed rates (successes divided by trials); instead, glm() wants you to supply the successes and failures, e.g., as a two-column matrix. Consequently, your glm() call above yields the warning:
Warning message:
In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
The beta regression model, on the other hand, is intended for situations where you only have a direct rate that does not correspond to success rates from a known number of independent trials. It uses a different likelihood and hence can lead to different results. Specifically, it has an additional precision parameter which is related to the variance of the observations.
Thus, if your proportions above come from a known number of independent trials, then supply this information and use a binomial GLM. Otherwise you can consider beta regression.
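For the binomial case, glm() accepts the successes and failures as a two-column response. A minimal sketch with simulated data (all object names are illustrative, not from your data):

```r
## 20 groups, each with a known number of independent trials.
set.seed(1)
n <- rep(25, 20)                       # trials per group
x <- rnorm(20)                         # a predictor
p <- plogis(0.3 + 0.5 * x)             # true success probabilities
successes <- rbinom(20, size = n, prob = p)
failures  <- n - successes

## Supplying the (successes, failures) counts instead of the computed
## rate avoids the "non-integer #successes" warning:
fit <- glm(cbind(successes, failures) ~ x, family = binomial)
coef(fit)
```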
Additional remark: As your Y above supplies proportions directly, the binomial likelihood does not fit; in particular, the variance of the observations will be overestimated. If you use a quasi-binomial model with an additional dispersion parameter, the model still won't be entirely appropriate, but it will come much closer to the beta regression results.
R> summary(betareg(Y ~ X))

Call:
betareg(formula = Y ~ X)

Standardized weighted residuals 2:
    Min      1Q  Median      3Q     Max
-1.7480 -0.8042 -0.1105  0.8864  1.8896

Coefficients (mean model with logit link):
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.29444    0.08715   3.378 0.000729 ***
X            0.27270    0.09068   3.007 0.002637 **

Phi coefficients (precision model with identity link):
      Estimate Std. Error z value Pr(>|z|)
(phi)    41.06      15.92   2.579   0.0099 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Type of estimator: ML (maximum likelihood)
Log-likelihood: 15.15 on 3 Df
Pseudo R-squared: 0.4149
Number of iterations: 34 (BFGS) + 2 (Fisher scoring)
R> summary(glm(Y ~ X, family = quasibinomial))

Call:
glm(formula = Y ~ X, family = quasibinomial)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-0.25696  -0.11263  -0.01107   0.13491   0.25792

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.29284    0.09523   3.075   0.0106 *
X            0.27078    0.09910   2.732   0.0195 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for quasibinomial family taken to be 0.02836306)

    Null deviance: 0.52867  on 12  degrees of freedom
Residual deviance: 0.31489  on 11  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 3
I suspect that, for each dose-by-drug combination, you have a number of spiders being tested and you keep track of how many died (or, alternatively, what proportion of them died).
A variable which measures the number of successes out of N trials follows a binomial distribution. In your case, given a dose by drug combination, the number of trials N is the number of spiders exposed to that dose by drug combination and "success" is whether a spider was killed by that dose by drug combination. The parameters of this distribution are:
- N, the number of trials;
- p, the probability of success on each trial.
Again, in your case N refers to the number of spiders exposed to a particular dose by drug combination and p refers to the probability of a spider dying after exposure to this combination. (It is not clear to me whether you repeat the exposure of spiders to a specific drug by dose combination, using different sets of spiders each time.)
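As a small illustration of these parameters (the numbers below are made up, not from your data):

```r
## Suppose N = 30 spiders are exposed to one dose-by-drug combination and
## each dies independently with probability p = 0.4. The number killed
## then follows a Binomial(N, p) distribution.
N <- 30
p <- 0.4
dbinom(12, size = N, prob = p)   # P(exactly 12 spiders die)
pbinom(12, size = N, prob = p)   # P(at most 12 spiders die)
N * p                            # expected number of deaths: 12
```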
As explained at https://www.theanalysisfactor.com/when-to-use-logistic-regression-for-percentages-and-counts/, logistic regression can handle a binomial variable with N trials per dose-by-drug combination. The model will be specified the way you have it and will aim to estimate the logit of the probability p that a spider will be killed as a function of dose and drug. Recall that logit(p) = log(p/(1-p)). The model can be specified in R exactly as you indicated, though other equivalent formulations are possible, as seen here: https://stackoverflow.com/questions/9111628/logistic-regression-cbind-command-in-glm.
When we aim to model a proportion p, using a logit model is a natural way to transform the proportion prior to modeling it. Of course, other transformations are possible, but their interpretation may not be as natural as the one provided by the logit transformation.
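In R, the logit transformation and its inverse are available as qlogis() and plogis(); a quick check:

```r
p <- 0.75
qlogis(p)          # logit(p) = log(p / (1 - p)) = log(3)
plogis(qlogis(p))  # the inverse logit recovers p = 0.75
```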
In your model, the logit-transformed probability that a spider will be killed is assumed to be a function of both dose and drug. If the interaction between dose and drug is not statistically significant, then the data provide no evidence that the effect of the drug is different across doses (or the effect of dose is different across drugs).
The weights argument is used to supply the number of trials (i.e., the number of spiders tested for each dose-by-drug combination). Because your model formula specifies the proportion of spiders killed as the response, keeping track of the denominator used to compute that proportion is important.
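A sketch of the two equivalent specifications, using a made-up data layout (the column names dose, drug, n_tested, and n_killed are assumptions, not from your data):

```r
## Hypothetical spider-mortality data: two drugs at four doses each.
spiders <- data.frame(
  dose     = rep(c(1, 2, 4, 8), times = 2),
  drug     = rep(c("A", "B"), each = 4),
  n_tested = rep(20, 8),
  n_killed = c(2, 5, 11, 17, 1, 3, 8, 14)
)

## Proportion response with `weights` giving the number of trials:
fit1 <- glm(n_killed / n_tested ~ dose * drug, weights = n_tested,
            family = binomial, data = spiders)

## Equivalent two-column (successes, failures) specification:
fit2 <- glm(cbind(n_killed, n_tested - n_killed) ~ dose * drug,
            family = binomial, data = spiders)

all.equal(coef(fit1), coef(fit2))   # both forms give the same fit

## Likelihood-ratio test of each term, including the dose:drug interaction:
drop1(fit2, test = "LRT")
```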
For some insights on over-dispersion in binomial logistic regression models, see: Overdispersion in logistic regression.
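One rough way to check for overdispersion in a fitted binomial GLM is to compare the Pearson statistic to its residual degrees of freedom; a sketch with simulated data (object names are illustrative):

```r
## Simulate grouped binomial data and fit a binomial GLM.
set.seed(2)
n <- rep(30, 15)
x <- rnorm(15)
y <- rbinom(15, size = n, prob = plogis(0.2 + 0.4 * x))
fit <- glm(cbind(y, n - y) ~ x, family = binomial)

## Pearson-based dispersion estimate; values well above 1 hint at
## overdispersion (the residual deviance vs. its df tells a similar story).
phi <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
phi
```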
Best Answer
If the predictors used in the different regression models are measured in the same way, there is no need to standardize the coefficients: since the predictors are already on the same scale, the coefficients are as well.
If not, rescale the predictor variables first.
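One common way to put predictors on a comparable scale in R is scale(); a sketch with simulated data (all names are illustrative):

```r
set.seed(3)
d <- data.frame(y  = rnorm(50),
                x1 = rnorm(50, sd = 10),    # large-scale predictor
                x2 = rnorm(50, sd = 0.1))   # small-scale predictor

## Standardize the predictors to mean 0, standard deviation 1:
d$x1 <- as.numeric(scale(d$x1))
d$x2 <- as.numeric(scale(d$x2))

## Each coefficient is now the change in y per one-SD change in that
## predictor, so their magnitudes can be compared directly:
coef(lm(y ~ x1 + x2, data = d))
```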