[Math] Based on the number of samples, what is the maximum value of the standard deviation (sigma) that can be estimated?

probability, statistics

Attempt 1 at a problem description

Assume 1000 observations are made of some quantity. The distribution is assumed to be normal but can have a fat tail.

What is the maximum value of the standard deviation (sigma) one can estimate for the quantity measured, and what is the math to get there?

I know the value is about 2.7 standard deviations if 1000 observations are made, but I don't know how this number is determined.

Attempt 2 at a problem description

I want to measure a distribution out to a fairly high number of standard deviations (4-5 sigma), and I want to know how many observations I would need to accurately quote a value out at 4 or 5 sigma.

If anyone has any questions or has suggestions on re-phrasing please let me know.

Attempt 3 at a problem description

Assume you are taking samples from an unknown process and the distribution looks Gaussian. How many samples would need to be seen to accurately state that a high-sigma event was seen? I know this is the type of problem the people at CERN solve.

Ideally I would like to see a solution of the following form:

Number of samples = Fn(Confidence interval, variance/pr(event))

Possible solution?
After poking around on the internet for a while, I stumbled upon this answer to a question asking what sample size is needed for low-probability events under a Poisson distribution.

This was the response:

N = F(C)/P, where N is the number of events to be tested, F is a constant depending on the confidence level C, and P is the limit probability. We test the statement that the probability of occurrence is less than the limit: p < P.
For instance, for C = 95%, F(C) is 2.99, so if I want to test that the probability of occurrence is less than P = 0.001, then we need to check
N = 2.99/0.001 = 2990 events.

If this is the right solution, should I just be looking for F(C) and the p < P limit?

p < P has to do with the standard deviation desired, and F(C) is related to the confidence interval.

Could someone comment on what F(C) would be in terms of the variables used on this page?
http://mathworld.wolfram.com/ConfidenceInterval.html

Any help is appreciated.
Thanks

Best Answer

It's not clear exactly what you are after or where the 2.7 comes into play, but I'll try an answer anyway.

Suppose you want a high probability that in your sample you observe values that are 2.7 or more standard deviations away from the mean. For a sample of 431 observations, you will have a probability > 0.95 that you see a value 2.7 or more standard deviations from the mean. To see a value at or beyond 4 standard deviations from the mean with greater than 95% probability, you will need a sample of at least 47,293. And for 5 standard deviations, the sample needs to be 5,225,389.

These sample sizes are based on assumptions that the population is normal and your sampled values are independent.
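
The answer does not show how these figures were obtained. One reading that reproduces them, offered here as an assumption rather than as the answerer's stated method: let p be the two-sided normal tail probability P(|Z| >= k), and require that at least one of n independent draws lands in that tail with probability C, i.e. 1 - (1 - p)^n >= C, which gives n >= ln(1 - C) / ln(1 - p). The function names below are hypothetical and only the Python standard library is used.

```python
import math

def two_sided_tail(k):
    """P(|Z| >= k) for a standard normal Z (assumes the two-sided tail)."""
    return math.erfc(k / math.sqrt(2.0))

def min_samples(k, confidence=0.95):
    """Smallest n such that at least one of n independent normal draws
    lies k or more standard deviations from the mean with probability
    >= confidence."""
    p = two_sided_tail(k)
    # 1 - (1 - p)**n >= confidence  <=>  n >= log(1 - confidence) / log(1 - p)
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-p))

if __name__ == "__main__":
    for k in (2.7, 4.0, 5.0):
        print(f"{k} sigma: n = {min_samples(k)}")
```

Under these assumptions the sketch gives 431 for 2.7 sigma and 47293 for 4 sigma, and roughly 5.2 million for 5 sigma, in line with the figures quoted in the answer. For small p, ln(1 - p) is close to -p, so n is approximately -ln(1 - C) / p. At C = 95% the numerator is about 2.996, which appears to be the F(C) of about 2.99 quoted in the question.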
