What transformation should I use for a bimodal distribution?

Tags: binomial-distribution, data-transformation, r

I have some bimodal data, like the data generated below in R, and I don't know how to transform it to obtain a normal distribution or homoscedasticity. I'm running a linear discriminant analysis, which assumes homoscedasticity, but I haven't been able to achieve it with this kind of distribution. Is there an alternative approach to this problem?

Generating fake data

# Two normal components 10 units apart produce a bimodal mixture
x <- rnorm(100, mean = 10, sd = 2)
y <- rnorm(100, mean = 20, sd = 2)
bimodal <- c(x, y)
shapiro.test(bimodal)
hist(bimodal)
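Note that the homoscedasticity assumption of linear discriminant analysis concerns the variance (in general, the covariance matrix) within each class, not the shape of the pooled data. If each mode corresponds to one class (the question does not say so; this is an assumption), a pooled sample like bimodal is expected to be bimodal and can still satisfy LDA's assumptions. A minimal sketch in base R, using a hypothetical group factor built from the two simulated components:

```r
# Simulate the question's two components with a known class label.
# `group` is hypothetical; in real data it would be the LDA class variable.
set.seed(42)
x <- rnorm(100, mean = 10, sd = 2)
y <- rnorm(100, mean = 20, sd = 2)
bimodal <- c(x, y)
group <- factor(rep(c("A", "B"), each = 100))

# Pooled data: non-normal because it is a two-component mixture.
pooled_p <- shapiro.test(bimodal)$p.value

# Within-class normality: test each class separately.
within_p <- tapply(bimodal, group, function(v) shapiro.test(v)$p.value)

# Equality of variances across classes (a univariate analogue of
# LDA's equal-covariance assumption).
bartlett_p <- bartlett.test(bimodal ~ group)$p.value

print(c(pooled = pooled_p, within_p, bartlett = bartlett_p))
```

With multivariate data, the same idea applies to the class covariance matrices rather than to single variances.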

Transformation with Box-Cox

library(geoR)
lambda <- boxcoxfit(bimodal)$lambda
# Box-Cox transform; lambda = 0 is the log limit
bin.tr.bc <- if (abs(lambda) < 1e-8) log(bimodal) else (bimodal^lambda - 1) / lambda

shapiro.test(bin.tr.bc)
hist(bin.tr.bc)
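If geoR is not available, lambda can be estimated in base R by maximising the Box-Cox profile log-likelihood. This is a sketch of the standard maximum-likelihood approach for a model with a single common mean, swapped in for geoR's boxcoxfit; the helper names boxcox_transform, boxcox_loglik and boxcox_lambda and the search interval c(-2, 2) are illustrative choices, and the estimate may differ slightly from boxcoxfit because of the optimiser.

```r
# Box-Cox transform for positive data; lambda = 0 is the log limit.
boxcox_transform <- function(y, lambda) {
  stopifnot(all(y > 0))
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Profile log-likelihood of lambda for a normal model with a common mean:
# l(lambda) = -n/2 * log(sigma2_hat(lambda)) + (lambda - 1) * sum(log(y)),
# where sigma2_hat is the ML variance of the transformed data.
boxcox_loglik <- function(lambda, y) {
  z <- boxcox_transform(y, lambda)
  sigma2 <- mean((z - mean(z))^2)
  -length(y) / 2 * log(sigma2) + (lambda - 1) * sum(log(y))
}

# Maximise the profile log-likelihood over a search interval
# (c(-2, 2) is a common choice, not a requirement).
boxcox_lambda <- function(y, interval = c(-2, 2)) {
  optimize(boxcox_loglik, interval = interval, y = y, maximum = TRUE)$maximum
}

# Usage on data simulated as in the question.
set.seed(42)
bimodal <- c(rnorm(100, mean = 10, sd = 2), rnorm(100, mean = 20, sd = 2))
lambda <- boxcox_lambda(bimodal)
bin.tr.bc <- boxcox_transform(bimodal, lambda)
print(lambda)
print(shapiro.test(bin.tr.bc))
```

Because Box-Cox is a smooth monotone transformation, it cannot reorder the observations, so it is not expected to close a wide gap between two well-separated modes.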

Log

shapiro.test(log(bimodal))
hist(log(bimodal))

Square root

shapiro.test(sqrt(bimodal))
hist(sqrt(bimodal))

Log squared

shapiro.test((log(bimodal))^2)
hist((log(bimodal))^2)

Log to the power 1.5

shapiro.test((log(bimodal))^1.5)
hist((log(bimodal))^1.5)

Cube root

shapiro.test((bimodal)^(1/3))
hist((bimodal)^(1/3))

Desperate arcsine square-root transformation

shapiro.test(asin((bimodal/max(bimodal))^(1/2)))
hist(asin((bimodal/max(bimodal))^(1/2)))
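Rather than testing each transformation by hand, the candidates above can be compared in one loop. This sketch collects the Shapiro-Wilk p-value for each and adds a rough modality check that counts local maxima of a kernel density estimate; the mode-counting heuristic (count_modes, using density() with its default bandwidth) is an illustrative assumption, not a formal test.

```r
set.seed(42)
bimodal <- c(rnorm(100, mean = 10, sd = 2), rnorm(100, mean = 20, sd = 2))

# The transformations tried in the question, as named functions.
transforms <- list(
  identity   = function(v) v,
  log        = log,
  sqrt       = sqrt,
  log_sq     = function(v) log(v)^2,
  log_pow1.5 = function(v) log(v)^1.5,
  cube_root  = function(v) v^(1/3),
  arcsine    = function(v) asin(sqrt(v / max(v)))
)

# Rough modality check: number of local maxima of a kernel density estimate.
count_modes <- function(v) {
  d <- density(v)$y
  sum(diff(sign(diff(d))) == -2)
}

# One row per transformation: Shapiro-Wilk p-value and estimated mode count.
results <- t(sapply(transforms, function(f) {
  tv <- f(bimodal)
  c(shapiro_p = shapiro.test(tv)$p.value, modes = count_modes(tv))
}))
print(results)
```

The mode count makes it easier to see whether a transformation merely reshapes each peak or actually removes the gap between them.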

Best Answer

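Your variable binomial is not binomial (binomial data are counts of successes in a fixed number of trials). Did you mean bimodal? The code below uses the name bimodal, as in the question's code.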

Try this:

transformed <- abs(bimodal - mean(bimodal))  # fold the data around the overall mean
shapiro.test(transformed)
hist(transformed)

which produces something close to a normal distribution, slightly censored (strictly speaking, folded) at zero, and (depending on your seed)

        Shapiro-Wilk normality test

data:  transformed
W = 0.98961, p-value = 0.1564
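Why the fold works for this simulated data can be checked with a little arithmetic. After centring at the overall mean (about 15 here), each component is roughly normal with mean -5 or +5 and standard deviation 2, so abs() maps both onto roughly the same normal distribution; only values that cross zero get reflected. A sketch, assuming the question's simulation parameters:

```r
# Each mode sits half the gap between the component means away from the
# overall mean; the component SD is 2 in the question's simulation.
half_gap <- (20 - 10) / 2
sd_comp  <- 2

# Share of each component that crosses zero after centring and is folded.
p_folded <- pnorm(0, mean = half_gap, sd = sd_comp)
print(p_folded)  # about 0.0062, i.e. roughly 0.6%

# Empirical check: the folded data should have mean near 5 and SD near 2.
set.seed(42)
bimodal <- c(rnorm(100, mean = 10, sd = 2), rnorm(100, mean = 20, sd = 2))
transformed <- abs(bimodal - mean(bimodal))
print(c(mean = mean(transformed), sd = sd(transformed)))
```

The small folded fraction is why the result looks only "slightly" censored; with closer or unequally sized modes, the fold would distort the distribution more.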


In general, arbitrary transformations are difficult to justify. You need a reason for applying a particular transformation that is independent of the observed data.
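One concrete reason for caution in the discriminant-analysis setting: if the two modes are the two classes (an assumption; the question does not say), folding around the overall mean maps both classes onto nearly the same distribution, leaving little for LDA to separate. A sketch using the answer's folding transformation, written with the question's variable name bimodal and a hypothetical group factor:

```r
set.seed(42)
x <- rnorm(100, mean = 10, sd = 2)
y <- rnorm(100, mean = 20, sd = 2)
bimodal <- c(x, y)
group <- factor(rep(c("A", "B"), each = 100))  # hypothetical class labels

transformed <- abs(bimodal - mean(bimodal))

# Class means before and after the fold.
before <- tapply(bimodal, group, mean)
after  <- tapply(transformed, group, mean)
print(rbind(before, after))

# Class separation measured by a Welch two-sample t statistic.
t_before <- t.test(bimodal ~ group)$statistic
t_after  <- t.test(transformed ~ group)$statistic
print(c(before = t_before, after = t_after))
```

With equal group sizes, the overall mean lies exactly halfway between the two class means, so after the fold the class means coincide apart from the few folded values.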