Solved – Calculating standard error and attaching an error bar on ggplot2 bar chart

Tags: barplot, ggplot2, r

Given a minimal dataset in which I am looking for the occurrence of a certain motif among 500 observations: with_motif is the number of observations with the specified motif and without_motif is the number of observations without it.

with_motif <- 100
without_motif <- 400
dt <- data.frame(with_motif,without_motif)

The following code plots a bar chart with the ggplot2 library:

library(ggplot2)
library(reshape2)  # provides melt()

# geom_col() draws bars whose heights are the values in the data
bar_plot <- ggplot(melt(dt), aes(variable, value)) +
  geom_col() +
  scale_x_discrete(name = "with or without") +
  theme_bw() +
  labs(title = "") +
  theme(panel.grid.major = element_blank(), plot.title = element_text(size = 14))

bar_plot

I would like to compute a standard error, turn it into a 95% CI, and attach an error bar to the plot. ggplot2 offers geom_errorbar(), but I would be glad to know the different ways of deriving the standard error (or standard deviation) needed to calculate the error bar limits (the CI).

Best Answer

Here's an example from the ggplot2 homepage (https://ggplot2.tidyverse.org/reference/geom_errorbarh.html). As others have mentioned in the comments, you have to calculate the SE on your own and append this information to the data.frame.

library(ggplot2)

# se holds a precomputed standard error for each row
df <- data.frame(
  trt = factor(c(1, 1, 2, 2)),
  resp = c(1, 5, 3, 4),
  group = factor(c(1, 2, 1, 2)),
  se = c(0.1, 0.3, 0.3, 0.2)
)
p <- ggplot(df, aes(fill = group, y = resp, x = trt))
# Dodged bars without error bars
p + geom_bar(position = "dodge", stat = "identity")
# Use the same dodge for bars and error bars so they line up
dodge <- position_dodge(width = 0.9)
p + geom_bar(position = dodge, stat = "identity") +
  geom_errorbar(aes(ymax = resp + se, ymin = resp - se), position = dodge, width = 0.25)

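As a pointer, the half-width of a 95% CI (1.96 standard errors, under a normal approximation) usually looks something like this:

df$se <- 1.96 * (sd(your_data, na.rm = TRUE) / sqrt(your_n))

Your upper and lower CI bounds will just be the response +/- df$se (as shown in the aes() for geom_errorbar(), above). Note that df$se then holds the CI half-width rather than the standard error itself.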
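As a rough sketch of how this could apply to the counts in the question: the formula needs raw observations, so the code below assumes each of the 500 observations is an independent 0/1 indicator of the motif and rebuilds that vector from the two counts. The names x, plot_df and ci are made up for this illustration, and the normal approximation behind 1.96 is an assumption as well.

```r
library(ggplot2)

# Counts from the question
with_motif <- 100
without_motif <- 400
n <- with_motif + without_motif

# Assumption: one independent 0/1 indicator per observation
x <- c(rep(1, with_motif), rep(0, without_motif))

se <- sd(x) / sqrt(n)  # standard error of the proportion with the motif
ci <- 1.96 * se        # half-width of an approximate 95% CI

# The bar chart shows counts, so scale the half-width by n.
# sd(1 - x) equals sd(x), so both bars get the same half-width.
plot_df <- data.frame(
  variable = c("with_motif", "without_motif"),
  value = c(with_motif, without_motif),
  ci = n * ci
)

bar_plot <- ggplot(plot_df, aes(variable, value)) +
  geom_col() +
  geom_errorbar(aes(ymin = value - ci, ymax = value + ci), width = 0.25) +
  scale_x_discrete(name = "with or without") +
  theme_bw()

bar_plot
```

With only two complementary categories the two intervals carry the same information, so plotting the proportion with the motif and its interval alone may be the clearer display.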

Related Solutions

Solved – How to add standard error to plots in ggplot2 with R

The reason you're running into multiple methods is because the target variability to visualize in a repeated measures design is not necessarily that straightforward to determine.

If you calculate the conventional SE, then what you've done is give an estimate of how well you estimated the raw score. However, in a repeated measures design that generally wasn't the goal of the study. What you are typically looking to do is to estimate an effect, and the variability of that effect is usually much smaller. I generally recommend plotting only your effects and the variability of your effect estimates (better as confidence intervals than SEs). Then the error bar will represent something about what you actually attempted to study. The effect SE will be sqrt(MSe/n), where n is the number of measurements of the effect (not to be confused with the number of subjects).
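To make the contrast concrete, here is a minimal R sketch for the simplest repeated-measures case: two conditions measured on the same subjects. The scores are invented purely for illustration, and treating the paired difference as "the effect" is an assumption about the design rather than something the answer specifies. For exactly two conditions the repeated-measures error mean square equals var(d)/2, which is what the mse line relies on. Each subject contributes one measurement of the effect here, so n coincides with the number of subjects in this toy case.

```r
# Hypothetical scores for 6 subjects under two conditions (illustration only)
a <- c(10, 12, 9, 14, 11, 13)
b <- c(12, 15, 10, 17, 12, 16)
n <- length(a)

# Conventional SE of each condition mean: includes between-subject spread
se_a <- sd(a) / sqrt(n)
se_b <- sd(b) / sqrt(n)

# The effect is the within-subject difference
d <- b - a
se_effect <- sd(d) / sqrt(n)                      # SE of the mean difference
ci_half <- qt(0.975, df = n - 1) * se_effect      # 95% CI half-width (t-based)

# With two conditions, the subject-by-condition error mean square is var(d) / 2
mse <- var(d) / 2
se_from_mse <- sqrt(mse / n)                      # the sqrt(MSe/n) quantity

print(c(se_a = se_a, se_b = se_b, se_effect = se_effect, se_from_mse = se_from_mse))
print(c(lower = mean(d) - ci_half, upper = mean(d) + ci_half))
```

In this toy data the subjects differ a lot from each other but respond similarly to the manipulation, so the effect-based errors come out smaller than the per-condition SEs.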

Solved – Alternative visualizations to 3D bar chart

One candidate is the dot chart ably and energetically promoted by W.S. Cleveland. Here's a Stata implementation:

Key points include

There is no absolute reason for lines to start at zero. Here it seems natural; in other cases it can seem superfluous.
Solid markers here draw attention to magnitudes. Whenever points might occlude or obscure each other, open markers may be better.
It's arbitrary which one categorical control nests inside another. Here treatments A B C D occur on the inside, which was found to show a simpler pattern. Another design has all treatments on the same line.

For other ideas and examples, see

Graph for relationship between two ordinal variables

Chart for visualizing multi-dimensional data

How to add a third variable to a bar plot?

Is there a better way than side-by-side barplots to compare binned data from different series

How to best visualize differences in many proportions across three groups?

In this case, there is a small functional difference between this display and similar bar charts, whether vertical or horizontal. The advantages of dot charts are more striking when each line contains two or more "dots" (more generally, markers or point symbols). Some of these threads above are especially pertinent here.

Note: Implemented in Stata with code

graph dot (asis) y, over(treatment) over(x) scheme(s1color) linetype(line) lines(lc(gs12) lw(vthin))
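For readers working in R rather than Stata, a roughly comparable display can be sketched with base R's dotchart(). The data frame below is hypothetical (the original data are not shown in the thread), with four treatments A to D nested inside three levels of x, and the styling choices are only an approximation of the Stata options above.

```r
# Hypothetical responses: treatments A-D at three levels of x (illustration only)
dat <- expand.grid(treatment = c("A", "B", "C", "D"), x = c("1", "2", "3"))
dat$y <- c(4, 6, 7, 9, 3, 5, 8, 10, 2, 6, 7, 11)

# Treatments are nested inside x; solid markers, thin grey guide lines,
# and an axis that starts at zero
dotchart(
  dat$y,
  labels = as.character(dat$treatment),
  groups = dat$x,
  pch = 19,
  lcolor = "grey70",
  xlim = c(0, max(dat$y)),
  xlab = "y"
)
```

Swapping the roles of labels and groups gives the other nesting, which is a quick way to check which arrangement shows the simpler pattern.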

EDIT: Regardless of whether these are real data, a further possibility is just to shuffle the individuals 1, 2, 3. Unless you tell us otherwise, their identifiers are arbitrary; in terms of their response patterns 3 might be better placed between 1 and 2.
