Solved – Understanding Standard Deviation in the Normal Distribution

descriptive-statistics, normal-distribution, standard-deviation, standardization

My question is very basic/beginner level. I have trouble understanding the following:

[Image: the 68–95–99.7 rule for the normal distribution]

A normal distribution is said to be defined by its mean and standard deviation. My question is: shouldn't that "standard deviation" apply to the whole of the data? That is, the standard deviation is how much the data differs from its mean on average.

But then why do we say "68% of the data lies within 1 standard deviation, 95% of the data lies within 2 standard deviations", and so on?

Shouldn't 100% of the data lie within the original standard deviation of the data? We calculate the standard deviation only from the given data, so why do we say that only 68% of it lies within 1 standard deviation, and so on?

Mine is a very basic question, but I have trouble understanding this. Can someone please provide an intuitive explanation of what's happening here?

Best Answer

It may help to think of the standard deviation as a measure of how tightly the data cluster around the mean, rather than as a bound on the data. Any normal (Gaussian) distribution clusters around its mean, and the clustering is symmetric to the left and right of the mean. The standard deviation tells us the degree of clustering relative to the mean. For example, a mean of 5 with a standard deviation of 41 suggests little clustering, while a mean of 41 with a standard deviation of 5 suggests that the data are heavily clustered around the mean.
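If it helps to see this concretely, here is a minimal simulation sketch (assuming NumPy is available; the means and standard deviations are just the illustrative numbers from above). It shows that the standard deviation sets how wide the "1 standard deviation" band is, while the fraction of data inside that band stays around 68% for either normal distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two illustrative normal distributions from the example above:
# one weakly clustered (mean 5, SD 41) and one tightly clustered (mean 41, SD 5).
loose = rng.normal(loc=5, scale=41, size=100_000)
tight = rng.normal(loc=41, scale=5, size=100_000)

for name, data, mu, sigma in [("mean 5, SD 41", loose, 5, 41),
                              ("mean 41, SD 5", tight, 41, 5)]:
    within_1sd = np.mean(np.abs(data - mu) <= sigma)  # fraction inside mu ± sigma
    print(f"{name}: 1-SD band runs from {mu - sigma} to {mu + sigma}, "
          f"fraction within 1 SD = {within_1sd:.3f}")
```

Both lines print a fraction near 0.68: the band is much wider in the first case, but the proportion of the data it captures is the same.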

Now, a key feature of the normal distribution is that it is not bounded: there are no limits to the left or right of the distribution. It is always possible to see a data point, say, 5 standard deviations away from the mean; it is just increasingly unlikely. If all of the data fell within 1 standard deviation, what would that tell us? It would neither suggest the presence of outliers nor provide a measure of "clustering" around the mean.
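For the "increasingly unlikely" part, the exact probabilities come from the standard normal CDF. Here is a small sketch using only Python's standard library (the CDF is written in terms of math.erf, a standard identity):

```python
import math

def std_normal_cdf(z: float) -> float:
    """CDF of the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Probability that a normal value lands within k standard deviations of its mean.
for k in (1, 2, 3, 5):
    p_within = std_normal_cdf(k) - std_normal_cdf(-k)
    print(f"within {k} SD: {p_within:.7f}   beyond: {1 - p_within:.2e}")
```

This reproduces the familiar 68%, 95%, and 99.7% figures, and shows that a point beyond 5 standard deviations has probability of roughly 6 in 10 million: possible, just very rare.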

I think you are overly focused on your definition of the standard deviation as "how much the data differs from its mean on average". But if we stick with that definition, then we should not expect 100% of the data to fall within 1 standard deviation, precisely because it is an average: some points sit closer to the mean and some sit farther away. When we say that cars on a road go 50 mph on average, you know that some cars are going slower and some are going faster. I know I'm horribly simplifying everything, but I'm just making a general point. I hope this rant helps a bit.
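To make that "average" point concrete with a toy example (the speeds below are made up purely for illustration):

```python
# Hypothetical car speeds (mph), invented to illustrate the analogy above.
speeds = [38, 45, 48, 50, 52, 55, 62]

mean = sum(speeds) / len(speeds)
# Population standard deviation: root of the mean squared distance from the mean.
sd = (sum((s - mean) ** 2 for s in speeds) / len(speeds)) ** 0.5

inside = [s for s in speeds if abs(s - mean) <= sd]
print(f"mean = {mean:.1f} mph, SD = {sd:.1f} mph")
print(f"{len(inside)} of {len(speeds)} speeds lie within 1 SD of the mean")
```

Here the mean is 50 mph and the standard deviation is about 7 mph, yet only 5 of the 7 speeds fall within 1 standard deviation; the slowest and fastest cars sit outside the band, exactly as the answer describes.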