Standard deviation is a measure that can be calculated on any set of data regardless of its actual distribution. It is simply a measure of value dispersion in relation to the data set's mean.
The normality assumption you are referring to is usually only a concern when doing statistical inference. For instance, if you need to test whether a sample standard deviation is 'large', or whether two sample standard deviations are equal, then the underlying distribution of the data is important.
So, if all you need to do for your analysis is compute the standard deviation, then the standard deviation formula you have been using is sufficient.
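As a minimal sketch of that computation (assuming NumPy; the data values are made up), note the `ddof` argument, which switches between the population formula (divide by N) and the sample formula (divide by N - 1):

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # hypothetical data

# Population standard deviation: divide by N (NumPy's default, ddof=0).
sd_pop = np.std(x)             # 2.0

# Sample standard deviation: divide by N - 1 (ddof=1), the usual choice
# when the data are a sample from a larger population.
sd_sample = np.std(x, ddof=1)  # ~2.14
```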
My intuition is that the standard deviation is a measure of the spread of the data.
You make a good point that whether it is wide or tight depends on our underlying assumption about the distribution of the data.
Caveat: a measure of spread is most helpful when the distribution of your data is symmetric around the mean and has tails comparable to those of the Normal distribution (that is, when it is approximately Normal).
In the case where data is approximately Normal, the standard deviation has a canonical interpretation:
- Region: Sample mean +/- 1 standard deviation contains roughly 68% of the data
- Region: Sample mean +/- 2 standard deviations contains roughly 95% of the data
- Region: Sample mean +/- 3 standard deviations contains roughly 99.7% of the data
(see the first graphic in the Wikipedia article)
This means that if we know the population mean is 5 and the standard deviation is 2.83, and we assume the distribution is approximately Normal, I would tell you that I am reasonably certain that if we make (a great) many observations, only 5% will be smaller than -0.66 = 5 - 2*2.83 or bigger than 10.66 = 5 + 2*2.83.
Notice the impact of the standard deviation on this interval: the more spread, the more uncertainty.
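To make the 68/95/99.7 intuition concrete, here is a small simulation (a sketch assuming NumPy; the seed and sample size are arbitrary) that draws from a Normal distribution with mean 5 and standard deviation 2.83 and checks the empirical coverage:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 5.0, 2.83                      # population values from the example above
x = rng.normal(mu, sigma, size=100_000)

# Fraction of observations inside mean +/- k standard deviations.
for k in (1, 2, 3):
    inside = np.mean(np.abs(x - mu) <= k * sigma)
    print(f"within {k} sd: {inside:.3f}")  # ~0.683, ~0.954, ~0.997
```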
Furthermore, in the general case where the data is not even approximately Normal, but still symmetric, you know that there exists some $\alpha$ for which:
- Region: Sample mean +/- $\alpha$ standard deviations contains roughly 95% of the data
You can either learn $\alpha$ from a sub-sample or assume $\alpha=2$; this often gives you a good rule of thumb for estimating in your head what future observations to expect, or which new observations can be considered outliers. (Keep the caveat in mind, though!)
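A minimal sketch of "learning $\alpha$ from a sub-sample" (assuming NumPy; the Laplace data and the helper `estimate_alpha` are made up for illustration): take the 95th percentile of the absolute standardized deviations, so that mean +/- $\alpha$*sd covers roughly 95% of the data even when it is not Normal:

```python
import numpy as np

def estimate_alpha(sample, coverage=0.95):
    """Estimate alpha so that mean +/- alpha*sd contains ~coverage of the data."""
    z = np.abs(sample - sample.mean()) / sample.std()
    return np.quantile(z, coverage)

rng = np.random.default_rng(1)
# Symmetric but heavier-tailed than the Normal, so alpha should exceed 2.
sub = rng.laplace(loc=5.0, scale=2.0, size=5_000)
print(estimate_alpha(sub))  # ~2.1
```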
> I don't see how you are supposed to interpret it. Does 2.83 mean the values are spread very wide or are they all tightly clustered around the mean...
I guess every question asking "wide or tight?" should also specify: "in relation to what?". One suggestion is to use a well-known distribution as a reference. Depending on the context, it might be useful to ask: "Is it much wider or tighter than a Normal/Poisson?".
EDIT:
Based on a useful hint in the comments, here is one more aspect of the standard deviation: it can be viewed as a distance measure.
Yet another intuition for the usefulness of the standard deviation $s_N$ is that it is a distance measure between the sample data $x_1, \dots, x_N$ and its mean $\bar{x}$:
$s_N = \sqrt{\frac{1}{N} \sum_{i=1}^N (x_i - \bar{x})^2}$
As a comparison, the mean squared error (MSE), one of the most popular error measures in statistics, is defined as:
$\operatorname{MSE}=\frac{1}{n}\sum_{i=1}^n(\hat{Y}_i - Y_i)^2$
The question can be raised: why this particular distance function? Why squared distances rather than, say, absolute distances? And why take the square root?
Quadratic distance (or error) functions have the advantage that we can both differentiate them and easily minimise them. As for the square root, it aids interpretability: it converts the error back to the scale of our observed data.
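A short check of this distance-measure view (made-up numbers, assuming NumPy): the root-mean-square distance from the mean matches `np.std`, and taking the square root brings the result back to the units of the data (e.g., meters rather than meters squared):

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # say, lengths in meters

mean_sq_dist = np.mean((x - x.mean()) ** 2)  # 4.0, in meters^2
s_N = np.sqrt(mean_sq_dist)                  # 2.0, back in meters

assert np.isclose(s_N, np.std(x))            # identical to NumPy's population sd
```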
Best Answer
Both measures describe how far your values are spread around the mean of the observations.
An observation that is 1 below the mean is equally "far" from the mean as one that is 1 above it. Hence you should ignore the sign of the deviation. This can be done in two ways:
- Calculate the absolute values of the deviations and sum them.
- Square the deviations and sum the squares. Due to the squaring, large deviations get more weight, so this sum behaves differently from the sum of absolute deviations.
From the sum of absolute deviations you take the average to get the "mean deviation"; from the sum of squared deviations you take the average and then the square root to get the "standard deviation".
The mean deviation is rarely used.
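To illustrate the difference (a sketch with made-up numbers, assuming NumPy): the squaring lets a single large deviation dominate the standard deviation, while the mean deviation is less sensitive to it:

```python
import numpy as np

x = np.array([4.0, 5.0, 5.0, 6.0, 20.0])  # one outlier at 20; mean is 8.0

dev = x - x.mean()
mad = np.mean(np.abs(dev))        # mean (absolute) deviation: 4.8
sd  = np.sqrt(np.mean(dev ** 2))  # standard deviation: ~6.03

print(mad, sd)  # the outlier inflates the sd much more than the mean deviation
```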