If you have a 5-point Likert scale, you would want to entertain an ordinal logistic regression. The analogue of the $t$-test is then a regression on a dummy-coded categorical variable (a single 0/1 regressor for the $t$-test, and multiple dummy variables across many categories for ANOVA). Some sparsely populated low categories may have to be collapsed to identify the thresholds.
For more details, see J. Scott Long's book on categorical dependent variables.
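As a minimal sketch of such a model (assuming Python with a recent statsmodels that provides `OrderedModel`; the simulated data and variable names are purely illustrative, not from the original answer):

```python
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

# Illustrative data: a 0/1 group dummy and a 5-point Likert response.
# The t-test analogue is the coefficient on the group dummy.
rng = np.random.default_rng(0)
n = 200
group = rng.integers(0, 2, size=n)
latent = 0.8 * group + rng.logistic(size=n)
likert = pd.Series(pd.cut(latent, bins=[-np.inf, -1, 0, 1, 2, np.inf],
                          labels=[1, 2, 3, 4, 5]))  # ordered categorical

# Proportional-odds (ordinal logistic) regression; the fit also
# estimates the four thresholds between the five categories.
model = OrderedModel(likert, group.reshape(-1, 1), distr="logit")
result = model.fit(method="bfgs", disp=False)
print(result.summary())
```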
It's the square root of the second central moment, the variance. The moments are related to the characteristic function (CF), which is called characteristic because it defines the probability distribution. So, if you know all the moments then, under suitable conditions, you know the CF, and hence the entire probability distribution.
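As a sketch of that connection (standard facts, with mild regularity conditions assumed), the CF is $\varphi_X(t) = \mathbb{E}[e^{itX}]$, and where the expansion is valid its Taylor series collects all the moments:

$$\varphi_X(t) = \sum_{k=0}^{\infty} \frac{(it)^k}{k!}\,\mathbb{E}[X^k], \qquad \mathbb{E}[X^k] = i^{-k}\,\varphi_X^{(k)}(0).$$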
The normal distribution's characteristic function is determined by just two moments: the mean and the variance (or standard deviation). Therefore, for the normal distribution the standard deviation is especially important: in a sense, it's 50% of its definition.
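Concretely, this is the standard identity

$$\varphi_X(t) = \exp\!\left(i\mu t - \tfrac{1}{2}\sigma^2 t^2\right) \quad \text{for } X \sim N(\mu, \sigma^2),$$

so $\mu$ and $\sigma$ are literally the only quantities that enter it.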
For other distributions the standard deviation is in some ways less important, because they have other moments that matter too. However, for many distributions used in practice the first few moments carry most of the information about the distribution's shape, so they are the most important ones to know.
Now, intuitively, the mean tells you where the center of your distribution is, while the standard deviation tells you how close to this center your data are.
Since the standard deviation is in the units of the variable, it's also used to scale other moments to obtain measures such as kurtosis. Kurtosis is a dimensionless metric which tells you how fat the tails of your distribution are compared to the normal.
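For reference, the standard definition scales the fourth central moment by the fourth power of the standard deviation:

$$\operatorname{Kurt}[X] = \frac{\mathbb{E}\!\left[(X - \mu)^4\right]}{\sigma^4},$$

which equals 3 for the normal distribution (excess kurtosis subtracts this 3).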
It can be quite useful, but it depends on what you're doing.
For example, I use the lognormal and gamma distributions regularly and sometimes use the standard deviation with those distributions -- it can be informative in some contexts.
No, that's true for normal populations; it's not something that holds more generally. Even with symmetric distributions, the proportion within one standard deviation of the mean can be almost nothing, or anything up to 100%.
Again, it depends on the distribution -- you might get more or less than 68% within one standard deviation with positively or negatively skewed distributions.
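A quick simulation sketch of that point (assuming NumPy; the distributions and parameters here are arbitrary illustrations, and the quoted percentages are approximate):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

def coverage(x):
    """Proportion of draws within one standard deviation of the mean."""
    return np.mean(np.abs(x - x.mean()) <= x.std())

samples = {
    "normal": rng.normal(size=n),                          # ~68.3%
    "lognormal(sigma=1)": rng.lognormal(sigma=1, size=n),  # skewed: ~90%
    "gamma(shape=0.5)": rng.gamma(shape=0.5, size=n),      # skewed: ~88%
    "beta(0.5, 0.5)": rng.beta(0.5, 0.5, size=n),          # symmetric, bimodal: ~50%
}
for name, x in samples.items():
    print(f"{name:>20}: {coverage(x):.3f}")
```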
It depends on what you're trying to do. Distributions needn't be skewed for the standard deviation to be unsuitable, and the standard deviation may be quite okay when distributions are skewed.