One of the predictors in my logistic model has been log transformed. How do you interpret the estimated coefficient of the log transformed predictor and how do you calculate the impact of that predictor on the odds ratio?

# Solved – Interpretation of log transformed predictors in logistic regression

data transformation, logistic, regression

#### Related Solutions

The odds ratio will be the ratio of the odds per 1 point increase in the natural log of whatever the variable is. Say it's log income. Then the OR is for a change of income from \$10,000 to \$27,183; or \$20,000 to \$54,366 etc.
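As a sketch of that arithmetic (the coefficient value here is hypothetical): with a coefficient $b$ on the natural log of income, the odds ratio for any $k$-fold change in income is $\exp(b \ln k) = k^b$, and the 1-unit case is simply $k = e$.

```python
import math

# Hypothetical coefficient on ln(income) from a fitted logistic model
b = 0.35

def odds_ratio_for_fold_change(b, k):
    """Odds ratio for multiplying the raw predictor by k,
    given coefficient b on the natural log of that predictor."""
    return math.exp(b * math.log(k))

# A 1-unit rise in ln(income) is an e-fold rise in income
# (e.g. $10,000 -> $27,183); its odds ratio is exp(b)
or_per_unit = odds_ratio_for_fold_change(b, math.e)

# A doubling of income has odds ratio exp(b * ln 2) = 2**b
or_doubling = odds_ratio_for_fold_change(b, 2.0)

print(or_per_unit, or_doubling)
```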

I agree strongly with the suggestion of COOLSerdash and Srikanth Guhan that Poisson regression is much more natural for your problem. So the idea of transforming the response is a much weaker solution than using an appropriate count model. Poisson regression is well documented in many good texts and in this forum. In effect you get treatment on a logarithmic scale but with smart ideas ensuring that zeros in your data don't bite. (If Poisson regression turns out to be an over-simplification, there are models beyond.)

The rest of this answer focuses on the transformation you have used and whether it is a good idea for temperature change, a predictor that can be negative, zero or positive. This is by far the more unusual question.

The transformation used here has a name, **asinh** or inverse hyperbolic sine, equivalently $\ln\left(x + \sqrt{x^2 + 1}\right)$. Its graph is nicer than the algebraic definition in the question, which uses other functions more likely to be met in elementary mathematics:

The function is defined for all finite values of its argument, $x$ say. I've plotted it arbitrarily over the range from $x = -10$ to $10$, although that range has your example in mind to the extent possible; we'll get to that in a moment.

The virtues of this transformation include treating positive and negative values symmetrically. The result of the transformation varies smoothly as the argument passes through zero. It will certainly pull in outliers that are large positive or large negative. It is likely to reduce skewness in many cases.

Another virtue of the transformation is that it is close to the identity transformation for values near zero. The graph shows a line of equality near the origin to underline this point. (All puns should be considered intentional.)
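A quick numerical check of those virtues (sign symmetry, near-identity close to the origin, strong pulling-in of large values), using Python's built-in `math.asinh`:

```python
import math

# Sign symmetry: asinh(-x) == -asinh(x)
print(math.asinh(3.0), math.asinh(-3.0))

# Near-identity close to zero: asinh(0.1) differs from 0.1 only slightly
print(math.asinh(0.1))

# Large values are pulled in hard: asinh(1000) is about ln(2 * 1000)
print(math.asinh(1000.0), math.log(2000.0))
```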

Against all that I would want to underline reservations about using this transformation.

First, but not least, even if people have met this transformation in a previous existence, I doubt that many people retain a good feeling for what it is and how it works. To be personal about this, I meet it in statistical practice about once every three years and have to draw the graph and think about it every time. Unless you are working in an area where it is a known trick, most readers are going to say "What's that?" in some way or another. That doesn't rule it out as a solution, but it's a practical consideration when imagining putting this in a paper or thesis.

Second, and more important, I doubt there is any biological (physical, economic, whatever) rationale to this. It's unphysical and unphysiological, I would guess wildly, to imagine that temperature change works in this way in affecting organisms. We do some things in statistics for purely statistical reasons, naturally, but it is a bonus when that makes sense scientifically, and not otherwise.

Third, something you should certainly do is plot asinh(temperature change) versus temperature change for your data and ask how much difference it really makes. As pointed out, the transformation is close to the identity for small $x$: the important part of that is being close to linear for small $x$. The scatter plot equivalent of my graph for your data points may indicate that you are better off leaving the data as they are. How does the skewness arise? Is it a matter of a few outliers?

I will now reveal that my limits from $-10$ to $10$ arise from a(nother wild) guess that most of your temperature changes will be small, a few $^\circ$C. (What is temperature change anyway? Day to day? Really big temperature changes might just kill the hens, or put them off egg laying altogether; perhaps that is part of your problem.) So: one possibility is that the transformation makes less difference than you imagine, in which case not doing it would simplify your analysis. Another possibility is that there is a real problem here which might need to be dealt with in other ways. We would need to hear more about your data to advise better.
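To illustrate the "how much difference does it really make" question, here is a tabulation over some made-up temperature changes of a few degrees (the values are hypothetical, not your data):

```python
import math

# Hypothetical temperature changes, a few degrees C either way
changes = [-3.0, -1.5, -0.5, 0.0, 0.5, 1.5, 3.0]

for x in changes:
    print(f"{x:5.1f}  asinh: {math.asinh(x):8.4f}  diff: {math.asinh(x) - x:8.4f}")

# Within +/- 0.5 the transformation barely moves the data at all
max_small = max(abs(math.asinh(x) - x) for x in changes if abs(x) <= 0.5)
print(max_small)
```

If most of your changes sit in that small range, the transformed and raw scatter plots will look almost the same, and leaving the data alone simplifies the analysis.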

Fourth, and perhaps most important, is that the marginal distribution of any predictor is not itself important for regression models, contrary to many myths recycled repeatedly.

Fifth, plotting residuals versus predictor is a (partial) way to see whether the version of the predictor used actually works well in your model. Pattern on this plot may indicate that you got it wrong. (By "version of", I mean temperature change, or its asinh, or some other transformation.)

I don't see much need for or merit in standardisation of predictors here.

If you are seeking better advice, posting your data as well would make it much easier to give.

EDIT 3 Sept 2021, re "To be personal about this": since this was written I have seen asinh in action more often and have played with its definition and derivative in relation to data. Hence I know better how and when it works, or helps. That's trivial, but also universal: if something is unfamiliar, you may have to work at it before it becomes familiar.

## Best Answer

If you exponentiate the estimated coefficient, you'll get an odds ratio associated with a $b$-fold increase in the predictor, where $b$ is the base of the logarithm you used when log-transforming the predictor. I usually choose to take logarithms to base 2 in this situation, so I can interpret the exponentiated coefficient as an odds ratio associated with a doubling of the predictor.
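A numeric sketch of that base-change (coefficient value hypothetical): a coefficient on $\log_2 x$ is $\ln 2$ times the coefficient on $\ln x$, since $\log_2 x = \ln x / \ln 2$, and exponentiating the base-2 coefficient gives the odds ratio for a doubling.

```python
import math

# Hypothetical coefficient on the natural-log-transformed predictor
b_ln = 0.9

# Refitting with base-2 logs rescales the coefficient by ln(2)
b_log2 = b_ln * math.log(2.0)

# Exponentiating the base-2 coefficient: odds ratio for a doubling
or_doubling = math.exp(b_log2)

# Agrees with the k-fold formula on the natural-log scale: 2**b_ln
print(or_doubling, 2.0 ** b_ln)
```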