Logistic Regression Coefficients – Understanding Their Significance

interpretationlogisticregression

I am currently reading a paper concerning voting location and voting preference in the 2000 and 2004 election. In it, there is a chart which displays logistic regression coefficients. From courses years back and a little reading up, I understand logistic regression to be a way of describing the relationship between multiple independent variables and a binary response variable. What I'm confused about is, given the table below, because the South has a logistic regression coefficient of .903, does that mean that 90.3% of Southerners vote republican? Because of the logistical nature of the metric, that this direct correlation does not exist. Instead, I assume that you can only say that the south, with .903, votes Republican more than the Mountains/plains, with the regression of .506. Given the latter to be the case, how do I know what is significant and what is not and is it possible to extrapolate a percentage of republican votes given this logistic regression coefficient.
Table showing logistic regression coefficients

As a side note, please edit my post if anything is stated incorrectly

Best Answer

That the author has forced someone as thoughtful as you to have ask a question like this is compelling illustration of why the practice -- still way too common -- of confining reporting of regression model results to a table like this is so unacceptable.

  1. You can, as pointed out, try to transform the logit coefficient into some meaningful indication of the effect being estimated for the predictor in question but that's cumbersome and doesn't convey information about the precision of the prediction, which is usually pretty important in a logistic regression model (on voting in particular).

  2. Also, the use of multiple asterisks to report "levels" of significance reinforces the misconception that p-values are some meaningful index of effect size ("wow--that one has 3 asterisks!!"); for crying out loud, w/ N's of 10,000 to 20,000, completely trivial differences will be "significant" at p < .001 blah blah.

  3. There is absolutely no need to mystify in this way. The logistic regression model is an equation that can be used (through determinate calculation or better still simulation) to predict probability of an outcome conditional on specified values for predictors, subject to measurement error. So the researcher should report what the impact of predictors of interest are on the probability of the outcome variable of interest, & associated CI, as measured in units the practical importance of which can readily be grasped. To assure ready grasping, the results should be graphically displayed. Here, for example, the researcher could report that being a rural as opposed to an urban voter increases the likelihood of voting Republican, all else equal, by X pct points (I'm guessing around 17 in 2000; "divide by 4" is a reasonable heuristic) +/- x% at 0.95 level of confidence-- if that's something that is useful to know.

  4. The reporting of pseudo R^2 is also a sign that the modeler is engaged in statistical ritual rather than any attempt to illuminate. There are scores of ways to compute "pseudo R^2"; one might complain that the one used here is not specified, but why bother? All are next to meaningless. The only reason anyone uses pseudo R^2 is that they or the reviewer who is torturing them learned (likely 25 or more yrs ago) that OLS linear regression is the holy grail of statistics & thinks the only thing one is ever trying to figure out is "variance explained." There are plenty of defensible ways to assess the adequacy of overall model fit for logistic analysis, and likelihood ratio conveys meaningful information for comparing models that reflect alternative hypotheses. King, G. How Not to Lie with Statistics. Am. J. Pol. Sci. 30, 666-687 (1986).

  5. If you read a paper in which reporting is more or less confined to a table like this don't be confused, don't be intimidated, & definitely don't be impressed; instead be angry & tell the researcher he or she is doing a lousy job (particularly if he or she is polluting your local intellectual environment w/ mysticism & awe--amazing how many completely mediocre thinkers trick smart people into thinking they know something just b/c they can produce a table that the latter can't understand). For smart, & temperate, expositions of these ideas, see King, G., Tomz, M. & Wittenberg., J. Making the Most of Statistical Analyses: Improving Interpretation and Presentation. Am. J. Pol. Sci. 44, 347-361 (2000); and Gelman, A., Pasarica, C. & Dodhia, R. Let's Practice What We Preach: Turning Tables into Graphs. Am. Stat. 56, 121-130 (2002).

Related Question