Solved – What does R do with negative values in log() scale

Tags: data-transformation, optimal-scaling, r

Some context of the problem:

I am working on an analysis of some hypothetical donation data. I would like to investigate the differences between 'major donors' (those whose largest single donation is \$10K or more) and 'regular donors' (those whose largest single donation is less than \$10K).

Specifically, I am looking at whether the two groups differ in the percentage change (positive or negative) from a donor's first gift to their next. (I am excluding donors who gave \$10K or more on their first donation.)

Naturally, the values are strongly right-skewed and include negatives (i.e. they range from -0.70 to 5.00). I would like to visualize the data in an informative way, so what is the best transformation or scale to use?

I have tried plotting a histogram of log(percentageChange):

[Image: histogram of log(percentageChange)]

What does R do with the negative values?

Thanks for any advice.

Best Answer

Well, you might consider editing your question to ask either which transformation to use or what is going on with that plot. As to the latter: the log function returns negative values for inputs between 0 and 1. Log is undefined only for non-positive inputs: in R, log() of a negative number returns NaN with a warning, and log(0) returns -Inf.
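To see this for yourself, here is a minimal sketch. The vector `pct_change` is made up for illustration and is not your data. As far as I know, `hist()` keeps only finite values, so the NaN and -Inf entries would simply be missing from the plot.

```r
# Hypothetical proportional changes; NOT the asker's data.
pct_change <- c(-0.70, -0.25, 0, 0.40, 1.00, 5.00)

# log() emits a "NaNs produced" warning for the negative inputs;
# it is suppressed here only to keep the output tidy.
logged <- suppressWarnings(log(pct_change))
print(logged)

# Count what would be lost before plotting.
n_nan       <- sum(is.nan(logged))       # from negative inputs
n_neg_inf   <- sum(is.infinite(logged))  # from zero inputs
n_plottable <- sum(is.finite(logged))    # what a histogram can bin

# plot = FALSE computes the bins without drawing anything.
h <- hist(logged, plot = FALSE)
n_binned <- sum(h$counts)
```

Comparing `n_binned` with `length(pct_change)` shows how many donors never made it into the histogram.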

So there's nothing particularly alarming about that graph, but it's unclear from your question what it is actually displaying. Is it really the percentage change (which, as you said, contains negative values)?

You might also consider splitting the data in ways that make sense, like only looking at donors whose donations decrease, in which case you could flip the sign and log to your heart's content.
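A rough sketch of that split, again with a made-up `pct_change` vector (the variable names are mine, not from your code). Donors with no change at all are left out of both groups here, since log(0) is -Inf.

```r
# Hypothetical proportional changes; NOT the asker's data.
pct_change <- c(-0.70, -0.25, -0.10, 0.40, 1.00, 5.00)

# Split by direction of change; exact zeros fall in neither group.
decreases <- pct_change[pct_change < 0]
increases <- pct_change[pct_change > 0]

# Flip the sign so the decreases are positive, then take logs.
log_decrease <- log(-decreases)
log_increase <- log(increases)

# One histogram per group; plot = FALSE only computes the bins.
# Drop plot = FALSE to actually draw them.
h_dec <- hist(log_decrease, plot = FALSE)
h_inc <- hist(log_increase, plot = FALSE)
```

Both logged vectors are finite, so nothing is dropped from either histogram.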