EDIT: SOME ADDITIONS TO CLARIFY ORIGINAL TEXT
If I remember correctly, I heard some mention that the standard deviation for means of precipitation sums is pretty useless due to the highly variable nature of precipitation quantities.
Let's say that climatologists have calculated standard deviations for the means of monthly precipitation sums, for every month of the year, over 30 years of measurements. A monthly sum is the total amount that has fallen during that month, so one monthly sum counts as one measurement in this case. If you take the average for the month of July over 30 years, you have 30 measurements. If the standard deviations are bigger than the corresponding mean values, it tells us there is a relatively high spread in the dataset. This is another way of saying that the coefficient of variation (standard deviation divided by mean) is big.
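To make the setup concrete, here is a minimal sketch of how the coefficient of variation for one calendar month could be computed. The 30 July totals are made-up numbers for a hypothetical dry location (not real measurements), and the function name `coefficient_of_variation` is just an illustrative choice.

```python
import statistics

def coefficient_of_variation(monthly_totals):
    """Return the sample standard deviation divided by the mean."""
    mean = statistics.mean(monthly_totals)
    if mean == 0:
        raise ValueError("mean is zero, so the coefficient of variation is undefined")
    return statistics.stdev(monthly_totals) / mean

# Hypothetical July totals in mm for 30 years: mostly dry, a few very wet years.
july_totals_mm = [
    0, 0, 1, 0, 3, 0, 0, 45, 0, 2,
    0, 0, 0, 60, 1, 0, 0, 0, 5, 0,
    0, 0, 0, 30, 0, 1, 0, 0, 0, 2,
]

cv = coefficient_of_variation(july_totals_mm)
print(f"mean = {statistics.mean(july_totals_mm):.1f} mm, CV = {cv:.0%}")
```

With a handful of wet years among many dry ones, the standard deviation ends up larger than the mean, which is the situation described above.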
But what would be considered big in this specific case? Are coefficients of variation of this size normal for this type of data? Let's assume here that all the coefficients of variation are above 100 %. Probably an irrelevant question in this forum.
Now, when the difference between the average values of two 30-year periods is calculated, each period introduces its own standard deviation, and the resulting standard deviation for the difference would be even bigger than the larger of the two periods' standard deviations. I believe this is called error propagation (please correct me if this is the wrong English terminology). If the resulting standard deviation is bigger than the difference of the mean values, it means that the difference between the mean values may be very far away from the true value. In other words, a pretty "non-accurate" mean in this case, right? One which, for quite many observations, would yield fictitious differences of means?
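As a rough sketch of the propagation described above: if the two periods are assumed to be independent and the years within each period uncorrelated (assumptions that may not hold for real precipitation records), the standard deviations combine in quadrature. The sketch also computes the standard error of each mean, s / sqrt(n), since that, rather than the raw standard deviation, is the usual measure of how uncertain a mean is. The function name and the two short lists are hypothetical.

```python
import math
import statistics

def difference_of_means(period_a, period_b):
    """Return (difference of means, combined SD, standard error of the difference).

    Assumes the two periods are independent of each other.
    """
    diff = statistics.mean(period_b) - statistics.mean(period_a)
    sd_a = statistics.stdev(period_a)
    sd_b = statistics.stdev(period_b)
    # Spread of the difference between one value from each period.
    combined_sd = math.sqrt(sd_a**2 + sd_b**2)
    # Uncertainty of the difference between the two period means.
    se_diff = math.sqrt(sd_a**2 / len(period_a) + sd_b**2 / len(period_b))
    return diff, combined_sd, se_diff

# Hypothetical monthly totals (mm) for two periods, shortened for readability.
earlier = [0, 12, 3, 40, 0, 7]
later = [5, 0, 25, 60, 2, 10]

diff, combined_sd, se_diff = difference_of_means(earlier, later)
print(f"difference = {diff:.1f} mm, combined SD = {combined_sd:.1f} mm, "
      f"SE of difference = {se_diff:.1f} mm")
```

The combined SD is always at least as large as either period's SD, while the standard error of the difference is smaller by roughly a factor of sqrt(n) when both periods have n values.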
Precipitation can vary greatly in some regions of the world, for example due to large-scale weather fluctuations like ENSO or other natural variation. So perhaps 30 years is too short for averaging precipitation data, due to the high variability in some locations.
The World Meteorological Organization recommends averaging over periods of thirty years, and this is common practice. Of course there are weaknesses in doing so, and deviations from this practice exist. For instance, some claim that thirty years is too short for certain climatic parameters due to their variable nature. This kind of answers part of my own question here.
But if the precipitation data is only available for 30 years, are there any alternatives to standard deviation that would be recommended/considered more useful?
I think I have heard some mention that precipitation data from different locations may have different distributions. However, is standard deviation only useful (does it only make sense) for normal distributions?
As a side question: would the mean value be more accurate, with a lower coefficient of variation, if one had one million or one billion years of measurements, even when each data point (spread) is highly variable?
EDIT 2: SHORT VERSION
If the data is not normally distributed, what does a coefficient of variation above 100 % tell us? What are the alternatives for detecting variation, if the alternatives are better or equally good (this is especially attractive to know about if the coefficient of variation is useless in my case)? I am looking for answers that are preferably relevant to the above example. Links to relevant studies are highly appreciated. Answers or research that provide intuitive examples or explanations are also highly appreciated. Of course, answers to the other questions are also appreciated.
Best Answer
Standard deviation will tell you whether or not the measurements are highly variable. It's not that you use "standard deviation" to predict the weather; it's that you use standard deviation to tell you if the other value (for which the standard deviation is provided) can be relied on as a predictor.
Even that alone is no guarantee. Example: It has rained on this date in every one of the past 100 years; will it rain today? Answer: There's a good chance, but if there are no clouds in the sky there's a 0% chance. The standard deviation of a single value is not the certainty of a result.
A simple example is provided on J. Smith of SNU's webpage on standard deviation:
From: "Probabilistic Forecasting - A Primer" by Chuck Doswell and Harold Brooks of the National Severe Storms Laboratory Norman, Oklahoma:
All that standard deviation will tell you about "highly variable measurements" is that they are highly variable, but you knew that already; if the standard deviation is very low you can rely more, but not absolutely, on historical measurements.
Q: Mean more accurate with more data points?: Yes.
Q: Lower variation (standard deviation)?: No, not if the "data point (spread) is highly variable".
The "standard deviation" doesn't affect the accuracy of your calculation of the mean. Regardless of the standard deviation, you have equal mathematical skills and calculate both the mean and the standard deviation equally well. It's that, even with an accurately calculated standard deviation, the mean (or any other value) has less meaning when the standard deviation is large. It's a less useful predictor.
Even with a very low standard deviation, any prediction based on a single value (for example, the mean) isn't 100% reliable.
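The two short answers above (the mean gets more accurate with more data, but the spread does not get smaller) can be illustrated with a small simulation. This is only a sketch under stated assumptions: the values are drawn from a skewed gamma distribution chosen for illustration (shape 0.5, scale 10, so the true mean is 5 and the true standard deviation is about 7.07), not from real precipitation data, and the draws are independent.

```python
import math
import random
import statistics

# Assumption: independent draws from a gamma distribution with shape 0.5 and
# scale 10 (true mean 5, true SD about 7.07, coefficient of variation about 141 %).
SHAPE, SCALE = 0.5, 10.0

def summarize(sample):
    """Return (mean, sample SD, standard error of the mean)."""
    sd = statistics.stdev(sample)
    return statistics.fmean(sample), sd, sd / math.sqrt(len(sample))

rng = random.Random(42)  # fixed seed so the run is repeatable
results = {}
for n_years in (30, 3000, 100000):
    sample = [rng.gammavariate(SHAPE, SCALE) for _ in range(n_years)]
    results[n_years] = summarize(sample)
    mean, sd, se = results[n_years]
    print(f"n={n_years:>6}: mean={mean:6.2f}  SD={sd:6.2f}  SE of mean={se:6.3f}")
```

As the sample grows, the standard deviation settles near its true value instead of shrinking, so the coefficient of variation stays above 100 %. What shrinks is the standard error of the mean (SD divided by the square root of the number of values), which measures how well the mean itself is pinned down.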
- Understanding the difference between climatological probability and climate probability
Using the Probability Forecast Distribution Tool
Why do ENSO forecasts use probabilities?
Probabilistic Forecasting - A Primer (repeat of link given above)
- Bayesian probability
- Modern Forecasting Papers
A method for preferential selection of dates in the Schaake shuffle approach to constructing spatiotemporal forecast fields of temperature and precipitation (April 2017) by Scheuerer, Hamill, Whitin, He, and Henkel.
Probabilistic temperature forecasting based on an ensemble AR modification (6 Aug 2015), by Möller and Groß.
Spatial postprocessing of ensemble forecasts for temperature using nonhomogeneous Gaussian regression (30 June 2014), by Feldmann, Scheuerer, and Thorarinsdottir.
That should get you started; each of those papers has citation links which lead to newer papers.