Solved – Normal Distribution or not

normal distribution

I'm a newbie here. My question is the following.

Is the following set of values normally distributed?
26, 33, 65, 28, 34, 55, 25, 44, 50, 36, 26, 37, 43, 62, 35, 38, 45, 32, 28, 34

The values above are from the following link:
https://www.mathsisfun.com/data/standard-normal-distribution.html

They go on to compute the mean, the standard deviation, and the corresponding z-scores, assuming the values are normally distributed.

However, when I plotted the values on a histogram using Excel, I got the attached chart, which shows positive skewness. We know that a normally distributed set of observations has no skewness at all, i.e. it is perfectly symmetrical.

Do we need to transform the data set into normally distributed values before calculating the mean, standard deviation, and z-scores? In real-world situations data sets may not be normally distributed, so how do we go about performing statistical tests on them?

Best Answer

You picked the wrong kind of plot for visualizing your sample, for two reasons. First, you assume that your data is continuous, so there is no point in counting distinct values. Second, your sample is very small, so even with discrete numbers you can in most cases expect small counts per value, which result in a flat bar plot.

Recall that for a continuous random variable $\Pr(X=x)=0$, so if we are talking about a continuous random variable we would not expect the same value to appear in your sample multiple times, and counting occurrences of distinct values is misleading. That is why, for continuous random variables, we use probability densities, i.e. probabilities "per foot" (per unit of the variable). Instead of counting how many times each distinct number appears, you should count how many values fall into each interval. So for visualizing your data you should use a histogram or a density plot rather than a bar plot.
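To make the difference concrete, here is a minimal Python sketch (standard library only) that tallies the 20 values from the question both ways: once per distinct value, as a bar plot would, and once per interval, as a histogram would. The bin start of 20 and bin width of 10 are arbitrary choices made for this illustration, and `bin_counts` is a hypothetical helper name, not something from the original post.

```python
from collections import Counter

values = [26, 33, 65, 28, 34, 55, 25, 44, 50, 36,
          26, 37, 43, 62, 35, 38, 45, 32, 28, 34]

# Bar-plot view: how often does each distinct value occur?
distinct_counts = Counter(values)
print("distinct values:", len(distinct_counts),
      "| largest count:", max(distinct_counts.values()))


def bin_counts(data, start, width):
    """Count values per half-open interval [lo, lo + width).

    `start` and `width` are illustrative assumptions, not a recommendation.
    """
    counts = Counter(start + width * ((x - start) // width) for x in data)
    return dict(sorted(counts.items()))


# Histogram view: how many values fall into each interval?
hist = bin_counts(values, start=20, width=10)
for lo, n in hist.items():
    print(f"[{lo}, {lo + 10}): {'#' * n}")
```

With only 20 observations almost every distinct value occurs once, so the per-value counts carry little information, while the per-interval counts show where the observations concentrate.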

Since your sample is very small, a histogram could also be misleading: only a limited number of bars can be used, and only a small number of cases will fall into each bar (no matter whether your variable is discrete or continuous). In this case, a density plot (see below) could be more informative.

[Image: density plot of the sample]
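A density plot of this kind can be approximated with a kernel density estimate. The following is only a sketch, assuming a Gaussian kernel and the normal-reference rule of thumb $h = 1.06\,s\,n^{-1/5}$ for the bandwidth; the original plot may have used different settings, and `gaussian_kde` here is a hand-written helper, not a library function.

```python
import math
import statistics

values = [26, 33, 65, 28, 34, 55, 25, 44, 50, 36,
          26, 37, 43, 62, 35, 38, 45, 32, 28, 34]


def gaussian_kde(data, bandwidth):
    """Return a function that estimates the density at x using Gaussian kernels."""
    norm = len(data) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2)
                   for xi in data) / norm

    return density


# Bandwidth from the normal-reference rule of thumb (an assumption of this sketch)
s = statistics.stdev(values)
h = 1.06 * s * len(values) ** (-1 / 5)
density = gaussian_kde(values, h)

# Crude text rendering of the estimated density on a coarse grid
for x in range(10, 85, 5):
    print(f"{x:3d} {'*' * round(density(x) * 1000)}")
```

Unlike the histogram, the estimate does not depend on where bin edges are placed, although it does depend on the chosen bandwidth.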

As a counter-example, below you can see a bar plot of values generated from a normal distribution using a pseudo-random number generator (black bars), together with a density plot (red line).

[Image: bar plot of the simulated normal values (black bars) with density plot (red line)]

As you can see, the bar plot would "suggest" that this perfectly normal data is almost uniformly distributed...
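The same effect can be reproduced in a few lines of Python. This is a sketch under assumptions: the sample size of 20, the mean of 38.8, the standard deviation of 11.4, and the seed are illustrative choices, not the settings used for the plot above.

```python
import math
import random
from collections import Counter

random.seed(0)  # arbitrary fixed seed so the sketch is reproducible

# Draws from a normal distribution (parameters are illustrative assumptions)
sample = [random.gauss(38.8, 11.4) for _ in range(20)]

# Bar-plot view: continuous draws are practically never repeated,
# so every distinct value gets the same count and the plot looks flat.
per_value = Counter(sample)
print("largest count per distinct value:", max(per_value.values()))

# Histogram view: counts per interval of width 10
per_bin = Counter(10 * math.floor(x / 10) for x in sample)
for lo in sorted(per_bin):
    print(f"[{lo}, {lo + 10}): {'#' * per_bin[lo]}")
```

The per-value counts say nothing about the shape of the distribution; only the per-interval counts (or a density estimate) do.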

As for whether your sample is normally distributed: the data seems to consist of integers rather than real numbers, so obviously it is not perfectly normal. Moreover, the distribution is skewed rather than symmetric. However, in most cases this is not a problem, because we are interested in approximate normality. See: Is normality testing 'essentially useless'?
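To put a number on the skewness, one option is the moment-based coefficient, i.e. the mean of the cubed z-scores. The sketch below uses the population standard deviation (dividing by $n$); that choice, and the use of this particular skewness formula, are assumptions of the example rather than something stated in the question.

```python
import statistics

values = [26, 33, 65, 28, 34, 55, 25, 44, 50, 36,
          26, 37, 43, 62, 35, 38, 45, 32, 28, 34]

mean = statistics.mean(values)
sd = statistics.pstdev(values)  # population SD (divides by n); an assumption here

# z-scores: distance of each value from the mean, in SD units.
# Computing them is plain arithmetic and does not itself require normality.
z_scores = [(x - mean) / sd for x in values]

# Moment-based skewness: mean of cubed z-scores (0 for perfectly symmetric data)
skewness = sum(z ** 3 for z in z_scores) / len(z_scores)

print(f"mean = {mean:.1f}, SD = {sd:.1f}, skewness = {skewness:.2f}")
```

For these 20 values the mean is 38.8 and the population standard deviation is 11.4, and the skewness comes out positive, in line with the right tail seen in the plots.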
