Solved – Interpretation of logistic regression with normalized features

cross-correlation, interpretation, logistic, train

With logistic regression, a one-unit change in $X_1$ is associated with a $\beta_1$ change in the log odds of 'success' (equivalently, an $\exp(\beta_1)$-fold change in the odds), all else being equal. But if one first normalizes cross-correlated features (e.g. subtracts the mean and divides by the standard deviation), is it valid to simply apply the inverse transformation to $\beta_1$ in order to interpret a one-unit change in the un-normalized value when considering the raw data? The normalization described above has no effect on the cross-correlation of the features themselves, but I am curious whether it would affect the outputs (and specifically the signs) of the $\beta_i$ being trained.

Best Answer

The interpretation of logistic regression coefficients is similar in the case where you've standardized the data (subtract mean, divide by standard deviation of each feature). By standardizing, you effectively change the units to standard deviations above/below the mean. So, a one standard deviation increase in $X_1$ corresponds to a $\beta_1$ increase in the log odds. If you fit to standardized data, you can transform the coefficients back to the original units (or vice versa): the raw-scale coefficient for feature $i$ is $\beta_i^{\text{std}} / \sigma_i$, where $\mu_i$ and $\sigma_i$ are the feature's mean and standard deviation, and the raw-scale intercept is $\beta_0^{\text{std}} - \sum_i \beta_i^{\text{std}} \mu_i / \sigma_i$.
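To make this concrete, here is a minimal sketch (plain Python, Newton-Raphson, one synthetic feature; all names, the data-generating parameters, and the fitting routine are illustrative, not a specific library's API) showing that fitting to standardized data simply rescales the coefficients: the standardized-scale slope equals the raw-scale slope times the feature's standard deviation.

```python
import math
import random

def sigmoid(t):
    # Numerically stable logistic function
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def fit_logistic(xs, ys, iters=50):
    """Fit intercept b0 and slope b1 by Newton-Raphson (no penalty)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += p - y            # gradient w.r.t. intercept
            g1 += (p - y) * x      # gradient w.r.t. slope
            h00 += w               # 2x2 Hessian entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 -= (h11 * g0 - h01 * g1) / det
        b1 -= (h00 * g1 - h01 * g0) / det
    return b0, b1

# Synthetic data with true log-odds -1 + 0.8 * x (illustrative values)
random.seed(0)
xs = [random.gauss(5.0, 2.0) for _ in range(200)]
ys = [1 if random.random() < sigmoid(-1.0 + 0.8 * x) else 0 for x in xs]

# Fit on the raw data
b0_raw, b1_raw = fit_logistic(xs, ys)

# Standardize and refit
mu = sum(xs) / len(xs)
sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))
zs = [(x - mu) / sd for x in xs]
b0_std, b1_std = fit_logistic(zs, ys)

# Same fitted model, rescaled coefficients:
#   b1_std = b1_raw * sd   and   b0_std = b0_raw + b1_raw * mu
print(b1_std, b1_raw * sd)
```

The two fits produce identical predicted probabilities; only the units of the coefficients change, which is why either set can be recovered from the other.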

If you fit a vanilla logistic regression model to standardized vs. non-standardized data, the coefficients will take different values in each case, but both models will fit equally well (or poorly). However, this is not necessarily true if you're fitting a regularized model (e.g. with $\ell_1$ or $\ell_2$ penalties on the coefficients): because the penalty acts on the magnitudes of the coefficients, rescaling a feature changes how strongly that feature is effectively penalized. In this case, it's common practice to standardize first, so that all features are penalized equally.
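A self-contained sketch of that last point (plain Python, Newton-Raphson with an $\ell_2$ penalty on the slope and an unpenalized intercept; the penalty strength and data-generating values are illustrative): the same penalty applied on the raw and standardized scales yields genuinely different fits, unlike the unpenalized case.

```python
import math
import random

def sigmoid(t):
    # Numerically stable logistic function
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def fit_ridge_logistic(xs, ys, lam, iters=50):
    """Newton-Raphson fit of intercept b0 and slope b1, adding an
    l2 penalty (lam / 2) * b1**2 on the slope only."""
    b0 = b1 = 0.0
    for _ in range(iters):
        # Penalty contributes lam * b1 to the gradient, lam to the Hessian
        g0, g1, h00, h01, h11 = 0.0, lam * b1, 0.0, 0.0, lam
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += p - y
            g1 += (p - y) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 -= (h11 * g0 - h01 * g1) / det
        b1 -= (h00 * g1 - h01 * g0) / det
    return b0, b1

# Synthetic data with true log-odds -1 + 0.8 * x (illustrative values)
random.seed(0)
xs = [random.gauss(5.0, 2.0) for _ in range(200)]
ys = [1 if random.random() < sigmoid(-1.0 + 0.8 * x) else 0 for x in xs]

mu = sum(xs) / len(xs)
sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))
zs = [(x - mu) / sd for x in xs]

lam = 10.0  # illustrative penalty strength
b0_raw, b1_raw = fit_ridge_logistic(xs, ys, lam)
b0_std, b1_std = fit_ridge_logistic(zs, ys, lam)

# Without a penalty these would agree exactly; with one they do not,
# because the same lam bears on coefficients living on different scales.
print(b1_std, b1_raw * sd)
```

Here the raw feature has a standard deviation near 2, so the raw-scale slope is smaller in magnitude and the fixed penalty shrinks it relatively little, while the standardized-scale slope is shrunk much more: the two penalized models make different predictions. This is the asymmetry that standardizing before regularizing removes.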