Solved – adding an error band on a graph when having errors in both x- and y-direction

data visualization, error, r

I have a dataset of x and y values, each of which is the average of three measurements, so every value also has an associated error.

I have about 100 x,y points which I want to plot. Adding both horizontal and vertical error bars works in principle, but with this many points the plot looks cluttered. In this graph (Excel example below) I can see that in some zones the vertical error reaches furthest and in other zones the horizontal error reaches furthest.

To make a nicer graph I want to use the largest error and make an error area (instead of error bars). But I am quite puzzled by the mathematics and how I should write a script or use a graphical tool to create such a graph.

I would think there should be some kind of solution with a loop going through each x and y value and looking for the min and max for each original xy point, but I couldn't get the maths right to do this.

Is anybody familiar with this problem, or does anyone see a solution, preferably one that can be implemented in R?

Best Answer

A couple of options:

If the main objective is to de-clutter your plot, one option is to assign different colors to the x- and y-error bars and to remove the crossbars at the ends of the error bars. In ggplot you can do this by setting the color for each set of error bars individually and by using geom_linerange, which draws the y-error bars without crossbars. For some reason, the default behavior of geom_errorbarh is not to plot crossbars.

library(tidyverse)

# simulate some data
df <- tibble(x = seq(0, 3, length.out = 100)) %>%
  mutate(y = 400 - 25*x^2,
         xerr = .05 + .1*x,
         yerr = .1*y,
         xmin = x - xerr,
         xmax = x + xerr,
         ymin = y - yerr,
         ymax = y + yerr)  

# plot using different colors
ggplot(df, aes(x, y)) +
  geom_linerange(aes(ymin = ymin, ymax = ymax)) +
  geom_errorbarh(aes(xmin = xmin, xmax = xmax), color = 'blue') + 
  geom_line(color = 'red')

Otherwise, if you truly want a band instead of individual error bars, you could plot polygons that encompass the greatest extent of the x- and y-error of each point. In ggplot this is done by defining a path around each x, y pair. Overplotting the polygons produces the band you're trying to create.

I do have some misgivings about suggesting this, however. Presumably the error bars are some sort of confidence interval on the means of x and y. The corresponding confidence region on the joint distribution of x and y is not the rectangle spanned by the error bars but an ellipse.

# calculate path for polygons
mutate(df, group = 1:nrow(df)) %>%
  select(-x,-y, -xerr, -yerr) %>%
  gather(xtype, x, xmin, xmax) %>%
  gather(ytype, y, ymin, ymax) %>% 

  # sort rows so that we get rectangles and not bow-ties
  mutate(order = 
           ifelse(xtype == 'xmin' & ytype == 'ymin', 1,
                  ifelse(xtype == 'xmax' & ytype == 'ymin', 2,
                         ifelse(xtype == 'xmax' & ytype == 'ymax', 3, 4)
                         )
                  )
         ) %>%
  arrange(order) %>%

  # plot
  ggplot(aes(x,y)) + 
  geom_polygon(aes(group = group), fill = 'grey') + 
  geom_line(data = df, color = 'red')
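
To illustrate the misgiving about rectangles versus ellipses, here is a minimal sketch that overplots one ellipse per point instead of one rectangle. It assumes the x and y errors are independent, so each ellipse is axis-aligned, and it simply uses xerr and yerr as the semi-axes. A joint confidence region at a stated level would generally need rescaled axes and, if the errors are correlated, a rotation, so treat this as a visual approximation rather than a proper confidence region. The names theta and ellipses are introduced here for illustration only.

```r
library(ggplot2)

# same simulated data as above, rebuilt so this block runs on its own
x <- seq(0, 3, length.out = 100)
y <- 400 - 25 * x^2
df <- data.frame(x = x, y = y, xerr = 0.05 + 0.1 * x, yerr = 0.1 * y)

# ASSUMPTION: independent errors, semi-axes equal to the error bar half-lengths
theta <- seq(0, 2 * pi, length.out = 41)  # angles tracing one ellipse outline

# one closed ellipse outline per original point
ellipses <- do.call(rbind, lapply(seq_len(nrow(df)), function(i) {
  data.frame(group = i,
             x = df$x[i] + df$xerr[i] * cos(theta),
             y = df$y[i] + df$yerr[i] * sin(theta))
}))

# overplot the ellipses to build the band, then draw the mean curve on top
p <- ggplot(ellipses, aes(x, y)) +
  geom_polygon(aes(group = group), fill = "grey") +
  geom_line(data = df, color = "red")
print(p)
```

Because each ellipse sits inside the rectangle spanned by the same error bars, the resulting band is somewhat narrower than the polygon version.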

Related Solutions

Solved – Plot a subset of categories on the x-axis in ggplot

I'm going to put on my mind-reading hat and suggest that you simply add droplevels when you subset:

split1_data <- droplevels(subset(data,data$Loci %in% data$Loci[1:10]))

The likely cause of the "problem" is that Loci is a factor. Subsetting a factor may reduce the levels that are actually present, but it does not change the set of levels stored as an attribute of the factor. If this behavior of factors disturbs you, you can avoid it by making character vectors the default with options(stringsAsFactors = FALSE). Note that since R 4.0.0, data.frame() and read.table() already default to stringsAsFactors = FALSE.
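
The behavior described above can be seen in a small base R sketch. The data frame here is a made-up stand-in for the asker's data (the real structure of data is not shown in the question), with Loci as a factor of five levels.

```r
# hypothetical stand-in for the asker's data: Loci is a factor with 5 levels
data <- data.frame(Loci = factor(c("a", "b", "c", "d", "e")), value = 1:5)

# subsetting keeps only two rows, but all five levels stay attached
kept <- subset(data, Loci %in% c("a", "b"))
nlevels(kept$Loci)     # 5

# droplevels discards the levels that no longer occur in the data
dropped <- droplevels(kept)
nlevels(dropped$Loci)  # 2
levels(dropped$Loci)   # "a" "b"
```

ggplot draws one axis position per level it is given, which is why the unused levels can still show up until they are dropped.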

(But in the future, please note that it is in general impossible to diagnose problems like this without more detailed information about your data, say the output from str or dput. Please include such things in future questions.)

Solved – Truncating error bars in publication graph

The only place I can think of where "error bars" (better to use confidence limits and specify the confidence level) get out of control is where they should have been shown on the log scale but weren't. For example, if one is estimating hazard ratios, odds ratios, risk ratios, or fold-changes, it is more appropriate to use a log scale when presenting the point estimates and confidence limits. This also prevents wild limits from rescaling the graph in a way that obscures the region of interest.
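
As a rough sketch of that advice, the following plots ratio estimates with their confidence limits on a log-scaled axis. The numbers are invented purely for illustration, and the column names (term, est, lower, upper) are assumptions, not taken from the original question.

```r
library(ggplot2)

# made-up odds ratios with confidence limits, for illustration only
or <- data.frame(term  = c("A", "B", "C"),
                 est   = c(0.8, 1.5, 6.0),
                 lower = c(0.5, 0.9, 1.2),
                 upper = c(1.3, 2.5, 30.0))

p <- ggplot(or, aes(x = est, y = term)) +
  geom_vline(xintercept = 1, linetype = "dashed") +  # ratio of 1 = no effect
  geom_point() +
  geom_errorbarh(aes(xmin = lower, xmax = upper), height = 0.2) +
  scale_x_log10()  # the wide interval for C no longer dominates the axis
print(p)
```

On a linear axis the upper limit of 30 for term C would compress the other two intervals into a small corner of the plot. On the log axis all three remain readable without truncating anything.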
