Solved – How to interpret normalized coefficients in logistic regression

logistic · normalization · regression · regression coefficients

I trained a logistic regression model with 5 features per sample. Before training, I normalized the range of my features into [0,1] (MinMax scaler). After training, I received the following coefficients for a logistic regression model:

coef_1 = [[-2.26286643 4.05722387 0.74869811 0.20538172 -0.49969841]]

In logistic regression the coefficients indicate the effect of a one-unit change in a predictor variable on the log odds of 'success'. But since my features are normalized, I wanted to know the effect of a one-unit change in the original units. For example, the range of the first feature is [0, 94.5], that of the second is [0.5, 180], the third is [12, 95], and the last two are categorical variables. So I divided the model's coefficients by the feature ranges to get the 'real' one-unit-change values and got:

coef_2 = [[-0.02393 0.0225 0.009 0.205 -0.499]]

I thought that these coefficients would be the same as those from a logit model trained on the unscaled features, which gives me:

coef_3 = [[-0.04743728 0.04394143 -0.00247654 0.23769469 -0.55051824]]

But these are clearly not the same, although there is some relationship: for the first two features, coef_2 is about half of coef_3, and for the last two features, coef_2 and coef_3 are approximately equal.

My questions now are:

  • Which of the coefficients (1, 2 or 3) gives me the true change of a one-unit change in a predictor variable and how can I rank the importance of my features?
  • Can I even compare real-valued features with categorical variables?

Best Answer

My question now is: which of the coefficients (1, 2 or 3) gives me the true change for a one-unit change in a predictor variable?

I've got several parts to my answer. The first is the most straightforward. You said you're using MinMaxScaler, so I'm guessing that you're using scikit-learn and doing this in Python. That package can make a scaler object that already has a method for undoing the scaling, called inverse_transform(). There's more information over on Stack Exchange, or you can read the scikit-learn documentation. I believe that your manual attempt at an inverse transform simply didn't perform the right operations, and it is almost always safer to use pre-built, well-tested functions from the library anyway. That has the added benefit that if you'd like to see exactly how it works, it's open source: just read the code under the source link on the documentation site.
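As a minimal sketch (with made-up data whose columns roughly match the ranges given in the question), the scaler's own inverse_transform() recovers the original units exactly:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up data whose columns roughly match the ranges in the question
rng = np.random.default_rng(0)
X = rng.uniform([0.0, 0.5, 12.0], [94.5, 180.0, 95.0], size=(100, 3))

scaler = MinMaxScaler()            # scales each column into [0, 1]
X_scaled = scaler.fit_transform(X)

# inverse_transform() undoes the scaling exactly
X_back = scaler.inverse_transform(X_scaled)
print(np.allclose(X, X_back))      # True
```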

Part II may contain some things you already know, but here it is anyway. A logistic regression is non-linear on the probability scale, which means that the effect of a one-unit change in a predictor differs depending on the value of that predictor. The reason we're allowed to make blanket statements when interpreting linear regression models, such as "for each 1 unit increase in $x$, $y$ tends to increase by such-and-such on average", is that a linear regression model has fixed slope coefficients. In other words, the first derivative with respect to any predictor is a constant, so the impact of a one-unit change is constant.

That is not the case in a logistic regression. As such, logistic regressions are typically used to predict the chance that a certain observation will fall into a certain category. The coefficients in a logistic regression are not ordinary slope coefficients that can be interpreted as simple unit changes as in linear regression; they are on the logged-odds scale. This makes logistic regressions much less intuitive to interpret. That much you already alluded to in the question.
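To make that concrete, here is a small sketch with a hypothetical intercept and slope: the same one-unit increase in $x$ moves the predicted probability by different amounts depending on where you start.

```python
import math

def prob(x, b0=-3.0, b1=1.0):
    """Predicted probability from hypothetical logistic coefficients."""
    return 1 / (1 + math.exp(-(b0 + b1 * x)))

# The same +1 change in x shifts the probability by different amounts:
print(round(prob(1) - prob(0), 4))  # going from x=0 to x=1
print(round(prob(3) - prob(2), 4))  # going from x=2 to x=3 (a larger shift)
```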

However, the reason for performing the transform in the first place is unclear. If you had serious multicollinearity, sure. If the spread of the values was huge, maybe it makes sense. If several different units were involved, like kilos, pounds, and inches all being used to measure different features, again, it might make sense. But I'm not convinced you really need to normalize your variables in this situation. I'd recommend trying the following instead:

  1. Do not normalize or standardize your variables before you fit the model
  2. Exponentiate the coefficients you get after fitting the model. This will convert them to odds instead of logged-odds. If you want, you could further convert them to probabilities to make interpretation even easier. The formula is $$probability=odds/(1+odds)$$ This will help in interpreting the impact of categorical variables.
  3. For continuous variables, don't get caught up in trying to find a simple explanation. The actual coefficients can be interpreted as "for an increase of one unit in $x$, there is an increase of [coefficient] logged-odds in $y$." Most people won't have an intuitive grasp of what this means. If you really want to get a grasp on how much of an impact a continuous variable is making, compare a model with that variable to a model without it. These are called nested models and are regularly used in these sorts of assessments.
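Step 2 can be sketched directly with the coefficients from the question (coef_1):

```python
import math

# Fitted log-odds coefficients from the question (coef_1)
log_odds = [-2.26286643, 4.05722387, 0.74869811, 0.20538172, -0.49969841]

odds = [math.exp(b) for b in log_odds]   # exponentiate: logged-odds -> odds
probs = [o / (1 + o) for o in odds]      # probability = odds / (1 + odds)

for b, o, p in zip(log_odds, odds, probs):
    print(f"logged-odds {b: .3f} -> odds {o: .3f} -> probability {p: .3f}")
```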

how can I rank the importance of my features?

I'm not sure if you need a formal test of this. If you've got the odds or probabilities, you can use your best judgement to see which one is most impactful based on the magnitude of the coefficients and how many levels they can realistically take.
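If you do want something more formal, the nested-model comparison from step 3 might look like the following sketch on simulated data (the features and effect sizes are made up, and C is set very large to approximate an unpenalized maximum-likelihood fit, since scikit-learn regularizes by default):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

# Simulated data: only x1 truly drives the outcome
rng = np.random.default_rng(1)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 1.5 * x1))))

X_full = np.column_stack([x1, x2])
X_reduced = x1.reshape(-1, 1)

# Large C approximates a plain (unpenalized) maximum-likelihood fit
full = LogisticRegression(C=1e6).fit(X_full, y)
reduced = LogisticRegression(C=1e6).fit(X_reduced, y)

# Total log-likelihoods (log_loss with normalize=False is the summed
# negative log-likelihood)
ll_full = -log_loss(y, full.predict_proba(X_full)[:, 1], normalize=False)
ll_reduced = -log_loss(y, reduced.predict_proba(X_reduced)[:, 1], normalize=False)

# Likelihood-ratio statistic: near zero => x2 adds little
print(round(2 * (ll_full - ll_reduced), 3))
```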

Can I even compare real-valued features with categorical variables?

Sure, but I wouldn't normalize them first. You can compare different types of variables to each other; just bear in mind the meaning of the different types. If increasing the distance from the goal by 1 meter decreases the probability of making the shot by 1%, and having good weather instead of bad increases the probability of making the shot by 2%, that doesn't mean that weather is more impactful: you either have good or bad weather, so 2% is the maximum increase, whereas distance can keep increasing substantially and its effect adds up. That's just an example, but you can use common sense like this and explain your reasoning when comparing them.
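The arithmetic behind that example, using the made-up effect sizes above:

```python
# Made-up effect sizes from the example above
distance_effect_per_meter = -0.01  # probability change per extra meter
weather_effect = 0.02              # good vs. bad weather: at most one switch

# The categorical effect is capped, but the per-meter effect accumulates:
for meters in (1, 2, 5):
    total = abs(meters * distance_effect_per_meter)
    print(f"{meters} m: {total:.2f} vs weather cap: {weather_effect:.2f}")
```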
