Solved – Mean vs. Standard deviation for data ranging between 0 and 1

Tags: mathematical-statistics, mean, standard-deviation

If, e.g., 100 people can rate a subject either 0 or 1, then the dispersion (e.g. standard deviation) among the 100 raters is potentially largest for a mean of 0.5 (50 people rate 0, 50 people rate 1), while it is smallest (actually 0) at both extremes (i.e. when the mean is 0 or 1). The relation between the mean (ranging from 0 to 1) and the associated standard deviation (ranging from 0 to 0.5) follows a curve. Here is some R code to illustrate what I mean:

N <- 100
res_list <- list()
for (i in 1:(N - 1)) {
  N1 <- i      # number of raters giving a 0
  N2 <- N - i  # number of raters giving a 1
  x <- c(rep(0, N1), rep(1, N2))
  res_list[[i]] <- c(N1 = N1, N2 = N2, sd = sd(x), mean = mean(x))
}
res_df <- as.data.frame(do.call(rbind, res_list))

plot(res_df$mean,res_df$sd,xlab="mean",ylab="standard deviation")

[Plot: mean vs. standard deviation]

Is there a mathematical function that describes exactly that relationship (independent of N)? Or is there a special term referring to this mean-vs-sd relationship for bounded (0-1) data?

Best Answer

Suppose the mean of $X$ is $\mu$ and $0 \le X \le 1$. As your example illustrates, the variance is maximized when all of the probability mass sits at the two endpoints, i.e. $P(X=1) = \mu$ and $P(X = 0) = 1 - \mu$. In that case the variance of $X$ is

$$\begin{align}E((X - \mu)^2) &= P(X=1)(1 - \mu)^2 + P(X=0)\mu^2 \\ &= \mu(1-\mu)^2 + (1-\mu)\mu^2 \\ &= \mu(1-\mu)\end{align}$$

The standard deviation is just the square root: $\sigma = \sqrt{\mu(1-\mu)}$. This is the familiar Bernoulli variance; it peaks at $\sigma = 0.5$ when $\mu = 0.5$ and drops to $0$ at $\mu = 0$ or $\mu = 1$, which is exactly the curve in your plot.
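
To check this against your simulation, you can overlay the theoretical curve on the simulated points. A minimal sketch in R, assuming the res_df data frame from your code above is in the workspace:

# Overlay the theoretical curve sigma = sqrt(mu * (1 - mu)) on the simulated points.
# Note: sd() in R uses the sample (n - 1) denominator, so the simulated values sit
# slightly above the population curve for finite N.
plot(res_df$mean, res_df$sd, xlab = "mean", ylab = "standard deviation")
mu <- seq(0, 1, by = 0.01)
lines(mu, sqrt(mu * (1 - mu)), col = "red")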