Solved – Log Transformation in R

data transformation, logarithm, lognormal distribution, normal distribution

I need to transform my non-normally distributed data into normally distributed variables, so I want to log-transform them. log10(x+1) did not produce a normal distribution, so I would like to try a base-100 log transformation, but I cannot get it to work in R. How do I write the function to get the transformed data?
Thanks a lot!

For example, one variable in my data set is "Cover single". Its hist() output is:
[image: histogram of daten$Cover_single]

and the test for normality gives:
Shapiro-Wilk normality test

data: daten$Cover_single
W = 0.85141, p-value = 8.116e-05

With hist(log10(daten$Cover_single+1)) I get the following histogram: [image: histogram of log10(daten$Cover_single + 1)]

Shapiro-Wilk normality test

data: log10(daten$Cover_single + 1)
W = 0.79318, p-value = 3.942e-06

So I can't get this variable to follow a normal distribution by transformation. How can I do this in R?

Best Answer

For a linear model, your predictor variables don't need to be normally distributed, and your outcome variable does not need to be normally distributed overall. What matters for standard statistical testing in a linear model is a normal distribution of residuals around the predicted values. Furthermore, much can be learned from linear modeling even if that assumption does not hold, and the fit itself can suggest ways to deal with violations of that assumption. See this page for illustrations of why normality of the outcome variable is not needed, and this page for discussion of the ways in which normality of residuals might matter.
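To make the distinction between the outcome and the residuals concrete, here is a minimal sketch on simulated data (not your data; the variable names x, y and fit are made up for illustration). The predictor is strongly skewed, so the raw outcome fails a normality test, yet the errors around the regression line are normal by construction, and those are what the model assumption is about.

```r
set.seed(1)                        # reproducible simulated data, not the asker's data
n <- 200
x <- rexp(n, rate = 0.2)           # strongly right-skewed predictor
y <- 2 + 3 * x + rnorm(n, sd = 1)  # linear signal plus normally distributed noise

fit <- lm(y ~ x)

# The raw outcome inherits the skew of x, so Shapiro-Wilk rejects normality for it
p_outcome <- shapiro.test(y)$p.value

# The assumption concerns the residuals, so test those instead
p_resid <- shapiro.test(residuals(fit))$p.value

print(c(outcome = p_outcome, residuals = p_resid))
```

Running shapiro.test() on a raw variable such as daten$Cover_single therefore says little about whether a linear model that uses it is appropriate.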

From your description it seems that your data could be handled by some type of multiple regression rather than by classic ANOVA. With dormouse abundance as the outcome variable, a straightforward linear regression of abundance against your untransformed predictors might work quite well. Try that, then test whether the assumption of linearity in the values of the predictors holds. Only then do you need to pay attention to the distribution of the residuals and to whether further transformation of your abundance values, or some type of generalized linear model, needs to be considered.
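The workflow described above could look roughly like the following sketch. It runs on simulated stand-in data: the columns abundance and Cover_shrub are hypothetical names invented for illustration, and only Cover_single comes from your question. The last lines also show the base-100 logarithm you asked about, in case you still want it after checking the residuals.

```r
set.seed(42)
# Hypothetical stand-in for the real data frame; replace with your own 'daten'.
# 'abundance' and 'Cover_shrub' are invented column names.
daten <- data.frame(
  Cover_single = runif(40, 0, 100),
  Cover_shrub  = runif(40, 0, 100)
)
daten$abundance <- 1 + 0.05 * daten$Cover_single + rnorm(40, sd = 1)

# Step 1: plain multiple regression on the untransformed predictors
fit <- lm(abundance ~ Cover_single + Cover_shrub, data = daten)
print(summary(fit))

# Step 2: linearity check; residuals vs fitted values should show no systematic curve
plot(fitted(fit), residuals(fit), xlab = "Fitted", ylab = "Residuals")
abline(h = 0, lty = 2)

# Step 3: only now look at the distribution of the residuals
qqnorm(residuals(fit)); qqline(residuals(fit))
print(shapiro.test(residuals(fit)))

# A base-100 logarithm is available through the 'base' argument of log()
x <- c(1, 10, 100)
log100 <- log(x + 1, base = 100)
# It is exactly log10 divided by 2, so it rescales the values without changing their shape
print(all.equal(log100, log10(x + 1) / 2))
```

Because log(x, base = 100) is just log10(x) / 2, it cannot make a variable any more or less normal than log10 already does, which is another reason to look at the model residuals rather than hunt for a different log base.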
