Solved – How to interpret normalized coefficients in logistic regression

logistic · normalization · regression · regression coefficients

I trained a logistic regression model with 5 features per sample. Before training, I normalized the range of my features into [0,1] (MinMax scaler). After training, I received the following coefficients for a logistic regression model:

coef_1 = [[-2.26286643 4.05722387 0.74869811 0.20538172 -0.49969841]]

In logistic regression the coefficients indicate the effect of a one-unit change in a predictor variable on the log odds of 'success'. But since my features are normalized, I wanted to know the effect of a one-unit change in the original units. For example, the range of the first feature is [0, 94.5], that of the second is [0.5, 180], the third is [12, 95], and the last two are categorical variables. So I divided the model's coefficients by the feature ranges to get the 'real' one-unit-change values and got:

coef_2 = [[-0.02393 0.0225 0.009 0.205 -0.499]]

I thought that these coefficients would be the same as those from a logit model trained on the unscaled features, which gives me:

coef_3 = [[-0.04743728 0.04394143 -0.00247654 0.23769469 -0.55051824]]

But these are clearly not the same, although there is some relationship: for the first two features, coef_2 is about half of coef_3, and for the last two features, coef_2 and coef_3 are approximately equal.

My questions now are:

  • Which of the coefficients (1, 2 or 3) gives me the true change of a one-unit change in a predictor variable and how can I rank the importance of my features?
  • Can I even compare real-valued features with categorical variables?

Best Answer

My question now is: which of the coefficients (1, 2 or 3) gives me the true change for a one-unit change in a predictor variable?

I've got several parts to my answer. The first is the most straightforward. You said you're using MinMaxScaler, so I'm guessing that you're using scikit-learn and doing this in Python. That package can make a scaler object that already has a method for undoing the scaling, called inverse_transform(). There's more information over on Stack Exchange, or you can read the scikit-learn documentation. I believe that your manual attempt at an inverse transform simply didn't perform the right operations, and it is almost always safer to use pre-built, well-tested functions from the library anyway. That has the added benefit that if you'd like to see exactly how it works, it's open source: just read the code under the source link on the documentation site.
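As a minimal sketch (with made-up data whose columns roughly match the ranges given in the question), the scaler's own inverse_transform() recovers the original units exactly:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up data whose columns roughly match the ranges in the question
rng = np.random.default_rng(0)
X = rng.uniform([0.0, 0.5, 12.0], [94.5, 180.0, 95.0], size=(100, 3))

scaler = MinMaxScaler()            # scales each column into [0, 1]
X_scaled = scaler.fit_transform(X)

# inverse_transform() undoes the scaling exactly
X_back = scaler.inverse_transform(X_scaled)
print(np.allclose(X, X_back))      # True
```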

Part II may contain some things you already know, but here it is anyway. A logistic regression is non-linear on the probability scale, which means that the effect of a one-unit change in a predictor differs depending on the value of that predictor. The reason we're allowed to make blanket statements when interpreting linear regression models, such as "for each 1 unit increase in $x$, $y$ tends to increase by such-and-such on average", is that a linear regression model has fixed slope coefficients. In other words, the first derivative with respect to any predictor is a constant, so the impact of a one-unit change is constant.

That is not the case in a logistic regression. As such, logistic regressions are typically used to predict the chance that a certain observation will fall into a certain category. The coefficients in a logistic regression are not ordinary slope coefficients that can be interpreted as simple unit changes as in linear regression; they are on the logged-odds scale. This makes logistic regressions much less intuitive to interpret. That much you already alluded to in the question.
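To make that concrete, here is a small sketch with a hypothetical intercept and slope: the same one-unit increase in $x$ moves the predicted probability by different amounts depending on where you start.

```python
import math

def prob(x, b0=-3.0, b1=1.0):
    """Predicted probability from hypothetical logistic coefficients."""
    return 1 / (1 + math.exp(-(b0 + b1 * x)))

# The same +1 change in x shifts the probability by different amounts:
print(round(prob(1) - prob(0), 4))  # going from x=0 to x=1
print(round(prob(3) - prob(2), 4))  # going from x=2 to x=3 (a larger shift)
```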

However, the reason for performing the transform in the first place is unclear. If you had serious multicollinearity, sure. If the spread of the values was huge, maybe it makes sense. If several different units were involved, like kilos, pounds, and inches all being used to measure different features, again, it might make sense. But I'm not convinced you really need to normalize your variables in this situation. I'd recommend trying the following instead:

  1. Do not normalize or standardize your variables before you fit the model
  2. Exponentiate the coefficients you get after fitting the model. This will convert them to odds instead of logged-odds. If you want, you could further convert them to probabilities to make interpretation even easier. The formula is $$probability=odds/(1+odds)$$ This will help in interpreting the impact of categorical variables.
  3. For continuous variables, don't get caught up in trying to find a simple explanation. The actual coefficients can be interpreted as "for an increase of one unit in $x$, there is an increase of [coefficient] logged-odds in $y$." Most people won't have an intuitive grasp of what this means. If you really want to get a grasp on how much of an impact a continuous variable is making, compare a model with that variable to a model without it. These are called nested models and are regularly used in these sorts of assessments.
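Step 2 can be sketched directly with the coefficients from the question (coef_1):

```python
import math

# Fitted log-odds coefficients from the question (coef_1)
log_odds = [-2.26286643, 4.05722387, 0.74869811, 0.20538172, -0.49969841]

odds = [math.exp(b) for b in log_odds]   # exponentiate: logged-odds -> odds
probs = [o / (1 + o) for o in odds]      # probability = odds / (1 + odds)

for b, o, p in zip(log_odds, odds, probs):
    print(f"logged-odds {b: .3f} -> odds {o: .3f} -> probability {p: .3f}")
```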

how can I rank the importance of my features?

I'm not sure if you need a formal test of this. If you've got the odds or probabilities, you can use your best judgement to see which one is most impactful based on the magnitude of the coefficients and how many levels they can realistically take.
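If you do want something more formal, the nested-model comparison from step 3 might look like the following sketch on simulated data (the features and effect sizes are made up, and C is set very large to approximate an unpenalized maximum-likelihood fit, since scikit-learn regularizes by default):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

# Simulated data: only x1 truly drives the outcome
rng = np.random.default_rng(1)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 1.5 * x1))))

X_full = np.column_stack([x1, x2])
X_reduced = x1.reshape(-1, 1)

# Large C approximates a plain (unpenalized) maximum-likelihood fit
full = LogisticRegression(C=1e6).fit(X_full, y)
reduced = LogisticRegression(C=1e6).fit(X_reduced, y)

# Total log-likelihoods (log_loss with normalize=False is the summed
# negative log-likelihood)
ll_full = -log_loss(y, full.predict_proba(X_full)[:, 1], normalize=False)
ll_reduced = -log_loss(y, reduced.predict_proba(X_reduced)[:, 1], normalize=False)

# Likelihood-ratio statistic: near zero => x2 adds little
print(round(2 * (ll_full - ll_reduced), 3))
```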

Can I even compare real-valued features with categorical variables?

Sure, but I wouldn't normalize them first. You can compare different types of variables to each other; just bear in mind the meaning of the different types. If increasing the distance from the goal by 1 meter decreases the probability of making the shot by 1%, and having good weather instead of bad increases the probability of making the shot by 2%, that doesn't mean that weather is more impactful: you either have good or bad weather, so 2% is the maximum increase, whereas distance can keep increasing substantially and its effect adds up. That's just an example, but you can use common sense like this and explain your reasoning when comparing them.
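The arithmetic behind that example, using the made-up effect sizes above:

```python
# Made-up effect sizes from the example above
distance_effect_per_meter = -0.01  # probability change per extra meter
weather_effect = 0.02              # good vs. bad weather: at most one switch

# The categorical effect is capped, but the per-meter effect accumulates:
for meters in (1, 2, 5):
    total = abs(meters * distance_effect_per_meter)
    print(f"{meters} m: {total:.2f} vs weather cap: {weather_effect:.2f}")
```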
