[Math] How should the standard deviation be interpreted when part of the range is impossible?

standard deviation, statistics

After measuring the response time of a software system, I calculated the standard deviation from the samples. The average time is about 200 ms and the standard deviation is $\sigma = 300\,\text{ms}$. According to the image below, this should mean that 68.2% of all response times lie between -100 ms and 500 ms.

[Image: https://en.wikipedia.org/wiki/Standard_deviation]

A negative response time obviously makes no sense. How should the part of the normal distribution enclosed in the red box be interpreted?
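One way to see how large the impossible region is: evaluate the normal cumulative distribution function at zero. The R sketch below assumes, as the question does, a normal distribution with mean 200 ms and standard deviation 300 ms. Under that assumption roughly a quarter of the probability mass lies below 0 ms, so the discrepancy is not a small edge effect.

```r
# Normal model implied by the figures in the question (in ms)
mu    <- 200
sigma <- 300

# The "mean +/- one standard deviation" interval from the picture
one_sigma <- c(mu - sigma, mu + sigma)   # -100 500

# Probability the normal model puts inside that interval (about 68.3%)
p_within <- pnorm(one_sigma[2], mean = mu, sd = sigma) -
            pnorm(one_sigma[1], mean = mu, sd = sigma)

# Probability the same model puts on impossible negative response times
p_negative <- pnorm(0, mean = mu, sd = sigma)

round(c(within_one_sigma = p_within, negative = p_negative), 3)
```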

Sample data with a similar average (~202 ms) and standard deviation (~337 ms):

Best Answer

You have assumed a normal distribution for data that cannot be negative, which doesn't make sense. You can use a log-normal distribution instead. It is used, for example, in the Black-Scholes model for pricing options, because stock prices cannot be negative either.
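To see what a log-normal with the question's figures looks like, one can match its mean and standard deviation to 200 ms and 300 ms (method of moments; this is an assumption for illustration, whereas the script below fits by maximum likelihood). The helper `lnorm_from_moments` is hypothetical and uses the standard relations $\sigma^2 = \ln(1 + s^2/m^2)$ and $\mu = \ln m - \sigma^2/2$. The central interval with the same 68.3% coverage as "mean $\pm\,\sigma$" then has strictly positive bounds and is not symmetric around the mean.

```r
# Hypothetical helper: log-normal parameters whose mean m and standard
# deviation s match given values (method of moments).
lnorm_from_moments <- function(m, s) {
  sdlog2  <- log(1 + (s / m)^2)       # sigma^2 = ln(1 + s^2 / m^2)
  meanlog <- log(m) - sdlog2 / 2      # mu = ln(m) - sigma^2 / 2
  c(meanlog = meanlog, sdlog = sqrt(sdlog2))
}

p_ln <- lnorm_from_moments(200, 300)  # figures from the question, in ms

# Check: mean exp(mu + sigma^2/2) and
# sd sqrt((exp(sigma^2) - 1) * exp(2 mu + sigma^2)) recover 200 and 300
m_check <- exp(p_ln["meanlog"] + p_ln["sdlog"]^2 / 2)
s_check <- sqrt((exp(p_ln["sdlog"]^2) - 1) *
                exp(2 * p_ln["meanlog"] + p_ln["sdlog"]^2))

# Central interval with the same coverage as "mean +/- sigma" for a normal
central <- qlnorm(c(pnorm(-1), pnorm(1)), p_ln["meanlog"], p_ln["sdlog"])
central   # both bounds positive, not symmetric around 200
```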

Of course, I can't tell whether your sample fits such a distribution without access to the full dataset.

R script:

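library(MASS)                                 # provides fitdistr()
hist(x, freq = FALSE)                         # histogram on the density scale
fit <- fitdistr(x, "lognormal")$estimate      # ML estimates: meanlog, sdlog
xs <- seq(0, max(x), length.out = 500)        # grid over the observed range
lines(xs, dlnorm(xs, fit["meanlog"], fit["sdlog"]), lwd = 3)  # fitted density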

(x is the vector of measured response times; all values must be strictly positive for the log-normal fit.)

That said, your sample is clearly far too small to judge the fit.
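Once the full dataset is available, one common way to judge the fit, sketched here in R under the assumption that `x` holds strictly positive response times in ms, is to check that `log(x)` looks normal on a Q-Q plot and to compare normal and log-normal maximum-likelihood fits by AIC. The data below are simulated with `rlnorm()` only so the snippet runs; the parameter values are arbitrary and not taken from the question.

```r
library(MASS)  # fitdistr()

# Synthetic stand-in for the measured response times (ms); replace with real data.
set.seed(1)
x <- rlnorm(1000, meanlog = 4.7, sdlog = 1.1)

# If x is log-normal, log(x) is normal, so a normal Q-Q plot of log(x)
# should lie close to a straight line.
qqnorm(log(x), main = "Normal Q-Q plot of log(response time)")
qqline(log(x))

# Compare maximum-likelihood fits of a normal and a log-normal model;
# AIC() works on fitdistr objects because MASS provides a logLik() method.
fit_norm  <- fitdistr(x, "normal")
fit_lnorm <- fitdistr(x, "lognormal")
aics <- c(normal = AIC(fit_norm), lognormal = AIC(fit_lnorm))
aics   # lower is better
```

Both models have two parameters, so this AIC comparison amounts to comparing maximized log-likelihoods. With only a handful of observations, neither the plot nor the AIC difference will be very informative.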
