Say we have the following data:

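```
library(tidyverse)  # provides %>%, pivot_longer(), mutate(), ggplot()

set.seed(45)
df <- data.frame(A = rnorm(2000, mean = 15, sd = 18),
                 B = rnorm(2000, mean = 25, sd = 17)) %>%
  pivot_longer(cols = c(A, B), names_to = "group", values_to = "time") %>%
  # rnorm(1, 15, 7) draws a single value that is recycled for every replaced row
  mutate(time = ifelse(time < 2, abs(time) + rnorm(1, 15, 7), time))
```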

I would think that by doing:

```
# fun.data = "mean_cl_normal" requires the Hmisc package to be installed
df %>%
  ggplot(aes(x = group, y = time)) +
  geom_jitter(width = .1, color = "pink", alpha = .2) +
  stat_summary(fun = "mean", geom = "point") +
  stat_summary(fun.data = "mean_cl_normal", geom = "errorbar", width = .15)
```

ggplot would plot the 95% C.I. for each group. However, this is clearly not the case:

What I would have expected is something like this:

```
my_cis <- df %>%
  group_by(group) %>%
  summarise(mean = mean(time),
            lwr = quantile(time, probs = 0.05),
            upr = quantile(time, probs = 0.95))

df %>%
  ggplot(aes(x = group)) +
  geom_jitter(aes(y = time), width = .1, alpha = .2, color = "pink") +
  geom_errorbar(aes(ymin = lwr, ymax = upr), data = my_cis, width = .13, color = "gray25") +
  geom_point(aes(y = mean), data = my_cis, shape = 18, size = 2)
```

So, the question is: what is `stat_summary()` really doing? And, for better understanding, how can I manually replicate the error bars from `stat_summary()`?

## Best Answer

The first plot shows a 95% confidence interval for the unknown population mean based on your sample. Or in other words it's "a range for estimating an unknown parameter".

The second plot is a summary of the sample (and not a confidence interval). This interval describes where 90% of the data points are located. If you wanted the range where 95% of the data are, you have to adjust your `probs =` argument to `0.025` and `0.975`.

To reproduce the interval in the first plot try this:
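The code that originally followed is not preserved here, so what follows is only a minimal sketch of one way to do it, not necessarily the answerer's exact code. It assumes the usual t-based interval for a mean, `mean ± qt(0.975, n - 1) * sd / sqrt(n)`, which is what `mean_cl_normal` (a wrapper around `Hmisc::smean.cl.normal`) computes at its default 95% level. The names `manual_cis`, `n_obs`, `avg` and `se` are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Same simulated data as in the question
set.seed(45)
df <- data.frame(A = rnorm(2000, mean = 15, sd = 18),
                 B = rnorm(2000, mean = 25, sd = 17)) %>%
  pivot_longer(cols = c(A, B), names_to = "group", values_to = "time") %>%
  mutate(time = ifelse(time < 2, abs(time) + rnorm(1, 15, 7), time))

# t-based 95% confidence interval for the mean of each group:
# mean +/- t quantile * standard error of the mean
manual_cis <- df %>%
  group_by(group) %>%
  summarise(n_obs = n(),
            avg   = mean(time),
            se    = sd(time) / sqrt(n_obs),
            lwr   = avg - qt(0.975, df = n_obs - 1) * se,
            upr   = avg + qt(0.975, df = n_obs - 1) * se)

# Draw the manually computed intervals on top of the raw data
p <- ggplot(df, aes(x = group)) +
  geom_jitter(aes(y = time), width = .1, alpha = .2, color = "pink") +
  geom_errorbar(aes(ymin = lwr, ymax = upr), data = manual_cis, width = .15) +
  geom_point(aes(y = avg), data = manual_cis)
p
```

With 2000 observations per group the standard error is small, which is why these error bars look so narrow compared with the quantile-based bars from the question. As a sanity check, `t.test(df$time[df$group == "A"])$conf.int` should give the same limits for group A.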