Data Transformation – Converting Poisson Distribution to Normal Distribution

data transformation, normal distribution, poisson distribution

I primarily have a computer science background, but now I am trying to teach myself basic stats. I have some data which I think has a Poisson distribution:

enter image description here

I have two questions:

  1. Is this a Poisson distribution?
  2. Secondly, is it possible to convert this into a normal distribution?

Any help would be appreciated. Thanks much

Best Answer

1) What's depicted appears to be (grouped) continuous data drawn as a bar chart.

You can quite safely conclude that it is not a Poisson distribution.

A Poisson random variable takes the values 0, 1, 2, ... and has its highest peak at 0 only when the mean is less than 1. It's used for count data; if you drew a similar chart of Poisson data, it could look like the plots below:

enter image description here

The first is a Poisson that shows similar skewness to yours. You can see its mean is quite small (around 0.6).
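As a quick check of the claim that a Poisson has its highest peak at 0 only when the mean is below 1, here is a minimal sketch that searches the probability function directly. The helper names `poisson_pmf` and `poisson_mode` and the example means are my own choices for illustration, not anything taken from the question's data.

```python
import math

def poisson_pmf(k: int, mean: float) -> float:
    """P(X = k) for a Poisson variable; computed on the log scale to avoid overflow."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))

def poisson_mode(mean: float, k_max: int = 200) -> int:
    """Smallest k with the largest probability, found by brute-force search."""
    probs = [poisson_pmf(k, mean) for k in range(k_max + 1)]
    return probs.index(max(probs))

# Hypothetical means: two below 1, two above 1 (non-integer, to avoid tied modes).
for mean in (0.6, 0.9, 1.5, 10.5):
    print(f"mean = {mean:5.1f}  ->  highest peak at k = {poisson_mode(mean)}")
```

Non-integer means are used on purpose: when the mean is a whole number, the probabilities at mean - 1 and mean are equal, so the peak is shared between two values.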

The second is a Poisson that has a mean similar (at a very rough guess) to yours. As you can see, it looks pretty symmetric.

You can have the skewness or the large mean, but not both at the same time.
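The reason is that the skewness of a Poisson distribution is 1/sqrt(mean), so it shrinks as the mean grows. The sketch below computes the skewness numerically from the probability function and compares it with that formula. The function names and the means tried are illustrative assumptions only; the mean of the data in the question is not known.

```python
import math

def poisson_pmf(k: int, mean: float) -> float:
    """P(X = k) for a Poisson variable; computed on the log scale to avoid overflow."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))

def poisson_skewness_numeric(mean: float, k_max: int = 400) -> float:
    """Third standardized moment, summed over k = 0..k_max."""
    ks = range(k_max + 1)
    probs = [poisson_pmf(k, mean) for k in ks]
    mu = sum(k * p for k, p in zip(ks, probs))
    var = sum((k - mu) ** 2 * p for k, p in zip(ks, probs))
    third = sum((k - mu) ** 3 * p for k, p in zip(ks, probs))
    return third / var ** 1.5

# Hypothetical means, from small to large.
for mean in (0.6, 5.0, 50.0):
    numeric = poisson_skewness_numeric(mean)
    formula = 1.0 / math.sqrt(mean)
    print(f"mean = {mean:5.1f}  numeric = {numeric:.4f}  1/sqrt(mean) = {formula:.4f}")
```

With a mean of 0.6 the skewness is about 1.29; with a mean of 50 it is about 0.14, which is why a large-mean Poisson looks close to symmetric.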

2) (i) You cannot make discrete data normal:

With the grouped data, using any monotonic-increasing transformation, you'll move all values in a group to the same place, so the lowest group will still have the highest peak - see the plot below. In the first plot, we move the positions of the x-values to closely match a normal cdf:

enter image description here

In the second plot, we see the probability function after the transform. We can't really achieve anything like normality because it's both discrete and skew; the big jump of the first group will remain a big jump, no matter whether you push it left or right.
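To make that concrete, here is a small sketch under assumed inputs: a Poisson with mean 0.6 stands in for the skewed discrete data, and `log1p` stands in for "any strictly increasing transformation" (it is not the normal-cdf matching transform used for the plot). The point is only that a transformation relocates each value and carries its probability along unchanged.

```python
import math

def poisson_pmf(k: int, mean: float) -> float:
    """P(X = k) for a Poisson variable; computed on the log scale to avoid overflow."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))

mean = 0.6                      # hypothetical mean, chosen to give a skewed shape
values = list(range(6))         # the first few support points 0..5
probs = [poisson_pmf(k, mean) for k in values]

def transform(x: float) -> float:
    """Stand-in for any strictly increasing transformation."""
    return math.log1p(x)

transformed = [transform(v) for v in values]

# Each original value maps to one new location; its probability does not change.
for v, t, p in zip(values, transformed, probs):
    print(f"x = {v}  ->  t(x) = {t:.3f}   probability = {p:.3f}")
```

The lowest value still carries more than half of the probability after the transformation, and it is still the lowest value. No choice of increasing function can spread that single lump into a bell shape.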

(ii) Continuous skewed data might be transformed to look reasonably normal. If you have raw (ungrouped) values and they're not heavily discrete, you can possibly do something. Even then, when people seek to transform their data, it is often either unnecessary or their underlying problem can be solved in a different (generally better) way. Sometimes transformation is a good choice, but it's usually done for not-very-good reasons.
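For contrast with the discrete case, here is a minimal sketch of a transformation working on continuous data. It uses simulated lognormal values, which are an assumption chosen because the log transform makes them exactly normal; real data will rarely behave this neatly. `sample_skewness` is a helper written for this example.

```python
import math
import random

def sample_skewness(xs: list[float]) -> float:
    """Moment-based sample skewness: m3 / m2**1.5."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

rng = random.Random(0)  # fixed seed so the run is repeatable
raw = [rng.lognormvariate(0.0, 1.0) for _ in range(5000)]  # simulated right-skewed data
logged = [math.log(x) for x in raw]                        # log-transformed values

print(f"skewness before transform: {sample_skewness(raw):.3f}")
print(f"skewness after log:        {sample_skewness(logged):.3f}")
```

The skewness after the log should be close to zero here, but that only shows that a transformation can work on continuous values. Whether it is worth doing depends on what you want the transformed data for.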

So ... why do you want to transform it?