Normalizing a Continuous Variable for Appropriate Use Alongside Binary Variables

Tags: binary data, continuous data, normalization, r

I am fitting a model where I estimate my Dependent Variable based on about 20 Binary Variables (0/1), and one continuous variable. I've read about the importance of normalizing that continuous variable, but have a question about the details.

I frequently see mean=0 and st dev=1 recommended as the distribution to normalize continuous variables to. And that makes sense to me if I'm normalizing a bunch of continuous variables to compare to each other. However…

Question 1: Given that this continuous variable will be a covariate alongside several binary covariates of value 0 or 1, should I normalize my continuous variable to fall entirely in the range (0,1)? Dividing it by its maximum value achieves that, which makes sense to me as a "normalization" method. Which leads to…

Question 2: When I divide each value by the maximum value to get a range of (0,1), I have a skewed distribution with most "normalized" values falling in the range (0.5,0.8). Is that a problem? Should I transform/normalize the data further to achieve mean=0 and mean +/- three standard deviations to get to 1 / 0?
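For concreteness, the two rescalings mentioned in the question can be sketched as below. This is only an illustration: the values in `x` are hypothetical, and the sketch assumes the variable is strictly positive, so that dividing by the maximum lands in (0, 1].

```python
import statistics

# Hypothetical positive-valued measurements (illustrative only, not real data).
x = [52.0, 55.0, 58.0, 61.0, 63.0, 66.0, 68.0, 71.0, 74.0, 77.0, 80.0, 100.0]

# Max scaling: divide every value by the largest one.
# For strictly positive data the result lies in (0, 1].
x_max = max(x)
max_scaled = [v / x_max for v in x]

# Standardization (z-scores): subtract the mean, divide by the sample SD.
mu = statistics.mean(x)
sd = statistics.stdev(x)
z = [(v - mu) / sd for v in x]

print("max-scaled range:", min(max_scaled), max(max_scaled))
print("z-score mean and SD:", round(statistics.mean(z), 10), round(statistics.stdev(z), 10))
```

Both operations are linear rescalings, so neither changes the shape of the distribution: any skew in the raw values is still there afterwards.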

Best Answer

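With plain vanilla (unpenalized) logistic regression, you don't really need to normalize. Multiplying a covariate by a constant just divides its coefficient by that constant; the fitted probabilities do not change. The model also makes no assumptions about the distribution of the covariates, only about the response given the covariates, so you don't need to worry about covariate skew either.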
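The point about rescaling can be checked numerically. The following is a minimal sketch, not the answerer's own code: it fits an unpenalized logistic regression by Newton's method in plain Python, once with a raw continuous covariate and once with that covariate divided by its maximum. The data and the helper names (`fit_logistic`, `solve`, `predict`) are made up for illustration.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def solve(A, b):
    # Solve A x = b by Gaussian elimination with partial pivoting.
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def predict(rows, beta):
    # Fitted probabilities for each design-matrix row.
    return [sigmoid(sum(b * v for b, v in zip(beta, r))) for r in rows]

def fit_logistic(rows, y, iterations=25):
    # Unpenalized logistic regression via Newton's method.
    # Each row must already contain a leading 1.0 for the intercept.
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iterations):
        p = predict(rows, beta)
        grad = [sum((yi - pi) * r[j] for r, yi, pi in zip(rows, y, p))
                for j in range(k)]
        hess = [[sum(pi * (1.0 - pi) * r[a] * r[c] for r, pi in zip(rows, p))
                 for c in range(k)] for a in range(k)]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

# Hypothetical data: one continuous covariate, one binary covariate, binary outcome.
x_raw = [52.0, 55.0, 58.0, 61.0, 63.0, 66.0, 68.0, 71.0, 74.0, 77.0, 80.0, 100.0]
flag = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
y = [0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1]

x_max = max(x_raw)
raw_rows = [[1.0, f, x] for f, x in zip(flag, x_raw)]
scaled_rows = [[1.0, f, x / x_max] for f, x in zip(flag, x_raw)]

beta_raw = fit_logistic(raw_rows, y)
beta_scaled = fit_logistic(scaled_rows, y)
p_raw = predict(raw_rows, beta_raw)
p_scaled = predict(scaled_rows, beta_scaled)

print("raw coefficients:   ", beta_raw)
print("scaled coefficients:", beta_scaled)
print("largest difference in fitted probabilities:",
      max(abs(a - b) for a, b in zip(p_raw, p_scaled)))
```

Under these assumptions the intercept and the binary coefficient should agree between the two fits, the continuous coefficient should differ by exactly the factor `x_max`, and the fitted probabilities should match up to floating-point error. This invariance does not carry over to penalized fits (ridge, lasso), where the penalty depends on coefficient size and scaling does matter.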

Sometimes, when the covariates differ enormously in scale, changing the units to something sensible (for example, recording an amount in thousands rather than in single units) will help the fitting algorithm converge.
