I have daily counts of the patients from whom data were collected in a hospital. I want to display them graphically as the number of patients per week. Which graph is more suitable for this: a histogram or a bar graph? Please explain the pros and cons of each. If you think another graph would be better than either, please say so and justify it.
Data Visualization – Understanding the Difference Between Histograms and Bar Graphs
barplot, data visualization, distributions, histogram
Related Solutions
There really isn't any hard upper limit on the number of bins, but in most situations, once every unique observation is in its own bin, finer bins only pinpoint the observations' positions more precisely without conveying much more. For example, compare these:
Except in some very particular circumstances, there's likely to be no practical benefit in the second plot, and not that much in the first. If your data are continuous, this is probably way beyond a useful number of bins.
So in most situations, that seems like at least a practical upper bound - every unique observation in its own bin.
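As a rough illustration of that bound, the sketch below counts the non-empty bins of increasingly fine histograms. The data are simulated and the bin counts are arbitrary choices made for this example; they are not taken from the plots discussed above.

```r
# Simulated data, rounded so that some values repeat (an assumption for illustration).
set.seed(7)
x <- round(rnorm(50), 1)
n_unique <- length(unique(x))

# For each number of bins k, build k equal-width bins over the data range
# and count how many of them contain at least one observation.
bin_counts <- c(10, 100, 1000, 10000)
occupied <- sapply(bin_counts, function(k) {
  h <- hist(x, breaks = seq(min(x), max(x), length.out = k + 1), plot = FALSE)
  sum(h$counts > 0)
})

# The number of occupied bins can never exceed the number of unique values.
print(data.frame(bins = bin_counts, occupied = occupied))
print(n_unique)
```

Once the occupied count stops growing, extra bins only add empty space between the same bars.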
(If there is benefit in more bins than one per unique observation, you should probably be doing a rugplot or a jittered stripchart to get that kind of information) - something like what's done in the margins of these histograms:
(Those histograms are taken from this answer, near the end)
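A minimal sketch of the two alternatives mentioned, using base R graphics and simulated data (the sample, the bin count, and the plot titles are assumptions for illustration): `rug()` adds one tick per observation along the axis of a histogram, and `stripchart()` with `method = "jitter"` spreads the points out so that ties stay visible.

```r
# Simulated sample; not data from the question.
set.seed(42)
x <- rnorm(200)

# Histogram with a rug: the bars show the overall shape,
# the ticks along the x axis show each individual observation.
h <- hist(x, breaks = 20, main = "Histogram with rug", xlab = "x")
rug(x)

# Jittered strip chart: every observation is drawn as a point,
# with random vertical jitter to separate overlapping values.
stripchart(x, method = "jitter", pch = 1, main = "Jittered strip chart", xlab = "x")
```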
I am going to use R. I used dput after reading in the data to make all this reproducible. Define the data and the levels:
example <- structure(list(
  V1 = structure(c(4L, 7L, 8L, 3L, 6L, 10L, 11L, 1L, 5L, 12L, 2L, 9L),
    .Label = c("12.7.", "14.11.", "14.4.", "15.1.", "15.10.", "15.5.", "17.2.",
      "18.3.", "22.12.", "22.6.", "24.6.", "27.10."), class = "factor"),
  V2 = c(NA, NA, NA, 7L, 42L, 57L, 41L, 17L, NA, NA, NA, NA),
  V3 = c(NA, NA, 22L, 71L, 135L, 175L, 139L, 103L, 29L, NA, NA, NA),
  V4 = c(NA, 43L, 109L, 175L, 244L, 256L, 299L, 240L, 152L, 77L, 22L, NA),
  V5 = c(95L, 165L, 245L, 300L, 374L, 375L, 400L, 375L, 299L, 200L, 95L, 45L),
  V6 = c(180L, 252L, 334L, 421L, 470L, 400L, 529L, 555L, 440L, 330L, 175L, 125L),
  V7 = c(237L, 325L, 495L, 500L, 540L, 535L, 626L, 616L, 557L, 440L, 225L, 189L),
  V8 = c(257L, 356L, 450L, 575L, 600L, 602L, 650L, 663L, 616L, 475L, 303L, 199L),
  V9 = c(245L, 355L, 455L, 550L, 597L, 602L, 657L, 678L, 643L, 499L, 357L, 232L),
  V10 = c(259L, 401L, 500L, 521L, 576L, 575L, 655L, 645L, 375L, 400L, 295L, 218L),
  V11 = c(222L, 295L, 375L, 495L, 527L, 579L, 599L, 585L, 518L, 400L, 245L, 175L),
  V12 = c(157L, 230L, 313L, 398L, 415L, 425L, 517L, 481L, 400L, 310L, 166L, 120L),
  V13 = c(67L, 121L, 195L, 255L, 299L, 305L, 382L, 332L, 275L, 99L, 65L, 21L),
  V14 = c(NA, NA, 89L, 109L, 208L, 265L, 225L, 201L, 118L, 43L, NA, NA),
  V15 = c(NA, NA, NA, 48L, 108L, 121L, 118L, 70L, 12L, NA, NA, NA),
  V16 = c(NA, NA, NA, NA, 22L, 39L, 21L, NA, NA, NA, NA, NA)),
  .Names = c("V1", "V2", "V3", "V4", "V5", "V6", "V7", "V8",
    "V9", "V10", "V11", "V12", "V13", "V14", "V15", "V16"),
  class = "data.frame",
  row.names = c(NA, -12L))
example.levels <- c(115,170,250,330,385,600)
Then we plot twelve subplots. In each subplot, we add your levels as horizontal lines. Note that I am constraining the $y$ axis to be identical across plots so we can visually compare them:
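# 3 x 4 grid of panels with tight margins; keep the old settings to restore later
opar <- par(mfrow=c(3,4),mai=c(.2,.3,.3,.1)+.02)
for ( ii in 1:12 ) {
  # one panel per row of the data; columns V2..V16 hold the 15 measurements
  plot(1:15,as.numeric(example[ii,-1]),xlab="",ylab="",
    xaxt="n",main=example[ii,1],ylim=c(0,700),type="o")
  # add the levels as horizontal grey reference lines
  abline(h=example.levels,col="grey")
}
par(opar)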
I am not putting the times on the x axis since they will be hard to read anyway, but perhaps one could truncate the minutes and just note the hours. Result:
Best Answer
You should use a bar chart. Histograms are for continuous variables: they bin the values and show how many observations fall in each bin. You want to show the number of patients per week. Week is a discrete variable, so a bar chart is the appropriate choice. Sum the daily counts within each week; the x variable is then the week number (week 1, week 2, and so on) and the y variable is the number of patients in that week.
Which graph is better depends on what you want to convey, provided the graph suits the type of variable. For example, suppose you want to show the number of patients over the last 3 weeks. You can use either a line chart or a bar chart. If you want to convey a trend, a line chart would perhaps be better.
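A minimal sketch of this in base R, assuming the daily counts sit in a data frame with a date column and a count column. The data frame `daily`, its column names, and the simulated counts are all hypothetical and stand in for the real hospital data.

```r
# Hypothetical daily counts: 28 days of simulated patient numbers (not real data).
set.seed(1)
daily <- data.frame(
  date     = seq(as.Date("2024-01-01"), by = "day", length.out = 28),
  patients = rpois(28, lambda = 20)
)

# Label each day with the date that starts its week (Monday by default),
# then sum the daily counts within each week.
daily$week <- cut(daily$date, breaks = "week")
weekly <- aggregate(patients ~ week, data = daily, FUN = sum)

# Bar chart: one bar per week, bar height = weekly total.
barplot(weekly$patients, names.arg = as.character(weekly$week),
        xlab = "Week starting", ylab = "Patients per week")

# Line chart of the same totals, which puts more emphasis on the trend.
plot(as.Date(as.character(weekly$week)), weekly$patients, type = "o",
     xlab = "Week starting", ylab = "Patients per week")
```

Both plots show the same weekly totals; the choice between them is about emphasis, not about the data.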
For more information on histograms: https://en.wikipedia.org/wiki/Histogram