Solved – Is log transformation a proper way to reduce the weight of high vs. low values in logistic regression, and how to diagnose when the DV is binary

data transformationlogisticlogitpredictorregression

Consider the following case: I am analyzing a the effect of (among other variables) the age of a firm on a specific binary event. Theoretically my perception is that age matters, but not linearly. That is, I don't believe that the age of firms e.g. 300 years matter 15 times more than a firm 20 years old. Can a transformation be used in such case?

Mathematically a log transformation has the properties that i look for as it evens out the weight of very high vs. low values. Below is a visual example. The transformed value chart demonstrates roughly the effect I want to model on the DV in the logistic regression.

To me, intuitively it makes perfect sense, however it is difficult for me to find any examples of this particular case as all sources I have been through covers transformation for normality and linearity in OLS regression, assumptions that do not apply to logistic regression.

I hope anyone can help with this most likely simple question.

y axis: value, x: axis (hypothetical observation, not relevant)

x axis: observation number y axis: variable value

Edit: how can the best transformation be identified when the DV is binary?

When attempting to find the best transformation for a continuous IV in relation to a binary DV, how is this best done? For a continuous DV in e.g. OLS regression, this is possible visually, but when attempting this for a binary variable, it obviously becomes difficult (see below).

enter image description here

Best Answer

Use a smoother on the binary reponse to visually investigate the relationship: see Creating univariable smoothed scatterplot on logit scale using R. Taking the logit (or another link function where applicable) of the smooth allows you to assess its linearity on the appropriate scale. Note in plots like these you're looking at the marginal effect of the predictor along the abscissa & not taking into account confounding effects of other predictors.

Note that investigating transformations in this way & picking the "best" introduces an optimistic bias into model performance measures (when the model's fitted to the same data that suggested the transformation) which can't be incorporated into resampling validation procedures. An alternative approach would be to consider that the relationship between any simple transformation of "age of firm" and the logit of the response isn't the kind of thing that's going to be exactly linear, & to incorporate your uncertainty about its nature into the model-fitting process by representing "age of firm" with a polynomial or spline basis. That doesn't preclude transforming the predictor beforehand when this seems sensible given its distribution—often done to reduce the influence of observations in the right tail of a very skewed distribution.

Related Question