I want to clarify my understanding of the central limit theorem (CLT).
Let's say that I have the following experiment:
I am using Python and have a list of products with some properties (for example, color and revenue). Say I want to compare mean revenue between products of certain colors. I know that some hypothesis tests assume the data are normally distributed. In my case, however, the data are heavily right-skewed: most of the products have low revenue and only a few have high revenue.
What is the correct approach?
Can I randomly sample products (by color), rely on the CLT for the sample means, and then use a hypothesis test, or would I be breaking some assumption and making the result basically useless? Is it better to use this approach, or to use a test that doesn't require normally distributed data?
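To make the two options concrete, here is a minimal sketch of what I have in mind. The data are simulated (lognormal revenues for two hypothetical colors, `red` and `blue`), not my real products, and the choice of Welch's t-test and the Mann-Whitney U test from SciPy is only my assumption about what the two approaches would look like in practice:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical, simulated revenues: skewed, with most values low.
red = rng.lognormal(mean=3.0, sigma=1.0, size=500)
blue = rng.lognormal(mean=3.2, sigma=1.0, size=500)

# Option 1: Welch's t-test on the raw data. It compares means and
# relies on the sample means being approximately normal (CLT).
t_stat, t_p = stats.ttest_ind(red, blue, equal_var=False)

# Option 2: Mann-Whitney U test. It is rank-based and does not
# assume normality, but it does not compare means directly.
u_stat, u_p = stats.mannwhitneyu(red, blue, alternative="two-sided")

print(f"Welch t-test:   t={t_stat:.3f}, p={t_p:.4f}")
print(f"Mann-Whitney U: U={u_stat:.1f}, p={u_p:.4f}")
```

As far as I understand, the two tests answer slightly different questions (difference in means versus a shift in the distributions), which is part of why I am unsure which one is appropriate.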
What about the CLT in general: is it something statisticians use for (for example) normalizing a distribution, or should I treat the CLT as a foundation that explains why some methods, like the t-test, work for non-normally distributed data, given that my sample is large enough?
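The second reading can be checked with a small simulation. This is only a sketch under my own assumptions: a lognormal population standing in for revenue, and a hypothetical helper `sample_means` that I wrote for illustration. It draws many samples of size `n`, takes the mean of each, and measures how skewed the distribution of those means is:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def sample_means(n, reps=5000):
    """Means of `reps` independent samples of size `n` from a skewed population."""
    draws = rng.lognormal(mean=3.0, sigma=1.0, size=(reps, n))
    return draws.mean(axis=1)

# Skewness of the sampling distribution of the mean for growing n.
# The raw data stay skewed; only the distribution of the mean changes.
skew_by_n = {n: float(stats.skew(sample_means(n))) for n in (5, 50, 500)}

for n, s in skew_by_n.items():
    print(f"n={n:>3}: skewness of sample means = {s:.2f}")
```

If I understand the CLT correctly, the skewness should move toward 0 as `n` grows, while the individual revenues themselves remain just as skewed as before.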
Thank you,
and please excuse my silly question.
Best Answer