How to interpret a control chart containing a majority of zero values

control chart, data visualization

I am using a control chart to monitor some infection data, and I raise an alert whenever the infection count is considered "out of control".

Problems arise when I come to a data set where most of the time points have zero infections, with only a few occasions of one or two infections. Even these few infections already exceed the chart's control limit and raise an alert.

How should I use a control chart when the data set has very few positive infection counts?

Thanks!

Best Answer

Change the variable. Run a control chart for the "time between infections" variable. That way, instead of a discrete variable with a very small range of values, you have a continuous variable with an adequate range of values. If the interval between infections gets too small, the chart will give an "out of control" indication.

This procedure was recommended by Donald Wheeler in Understanding Variation: The Key to Managing Chaos.
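A minimal R sketch of the "time between infections" idea follows. It assumes daily counts and an individuals (XmR) chart on the intervals, with the usual natural process limits of the mean ± 2.66 times the average moving range computed from a baseline period. The counts, the 8-interval baseline and all variable names are hypothetical and only illustrate the mechanics.

```r
# Hypothetical daily infection counts (assumed data): 241 days, mostly zeros
counts <- integer(241)
counts[c(10, 35, 65, 93, 125, 152, 181, 212, 238, 241)] <- 1L

# One entry per infection: the day on which it occurred
event_days <- rep(seq_along(counts), times = counts)

# The new variable: days between successive infections (0 if two fall on the same day)
intervals <- diff(event_days)

# Baseline limits from the first 8 intervals (assumed to be a stable period)
baseline <- intervals[1:8]
x_bar  <- mean(baseline)
mr_bar <- mean(abs(diff(baseline)))   # average moving range

# XmR natural process limits: mean +/- 2.66 * average moving range
lnpl <- x_bar - 2.66 * mr_bar
unpl <- x_bar + 2.66 * mr_bar

# An interval below the lower limit means infections are arriving unusually fast
signal <- intervals < lnpl | intervals > unpl
which(signal)   # 9: the 3-day gap between days 238 and 241
```

Intervals between rare events are usually skewed, so with a short or very variable baseline the lower limit can fall at or below zero; in that case a short interval cannot trigger a signal, and a longer baseline or a transformation of the intervals may be needed.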

Related Solutions

Data Visualization – Pie Charts vs. Dot Plots: A Comparison

There are two different types of chart that are referred to as 'dotplots', and I think you are confusing the two. The type of dotplot you seem to be thinking of is really a variation on a histogram, and it does not convey the same type of information as a pie chart.

The dotplot described by Cleveland is essentially a bar chart with a dot placed at the end of each bar, after which the bar is removed. So even with millions of data points, the data would be tabulated exactly as for a pie chart, and then a single dot is plotted for each category. The summary prepared for the plot is the same for a pie chart and a dotplot. The difference is that in a pie chart you are comparing non-aligned angles or areas (and the temptation to add chartjunk or otherwise distort the perception of the values is much higher), whereas in the dotplot you are comparing points on an aligned scale.
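To illustrate that both charts start from the same summary, here is a sketch in R that tabulates a simulated million observations once and passes that one table to both pie() and dotchart(). The data are randomly generated and the category names and proportions are made up.

```r
# Hypothetical raw data (assumed): one million individual observations
set.seed(42)
flavours <- c("Apple", "Blueberry", "Cherry", "Other")
raw <- sample(flavours, 1e6, replace = TRUE, prob = c(0.4, 0.2, 0.3, 0.1))

# The same summary feeds both charts: one count per category
tab <- table(raw)

op <- par(mfrow = c(1, 2))
pie(tab)                                        # reader compares non-aligned angles
dotchart(as.vector(tab), labels = names(tab))   # reader compares points on an aligned scale
par(op)
```

However many raw observations there are, each chart ends up drawing only one element per category.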

If you want the viewer to be able to judge the percentage of the whole easily, just make sure that the axis for the dot positions runs from 0 to the total count. You can also add a second axis (or replace the main one) that shows percentages rather than counts; the percentage can then be read off that axis much more accurately than by estimating angles and areas in a pie chart.
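A sketch of the 0-to-total axis with a second percentage axis, in R. It builds a Cleveland-style dot plot from base plot() primitives so that the extra axis shares the plot's coordinate system; the counts are hypothetical, and the margin settings are arbitrary choices.

```r
# Hypothetical counts per category (assumed data), out of a known total
counts <- c(Blueberry = 24, Cherry = 60, Apple = 52,
            "Boston Cream" = 32, Other = 8, "Vanilla Cream" = 24)
total <- sum(counts)
y <- seq_along(counts)

# Empty plot whose count axis runs exactly from 0 to the total
op <- par(mar = c(4, 9, 4, 1))
plot(counts, y, type = "n", xlim = c(0, total), xaxs = "i", yaxt = "n",
     xlab = "Count", ylab = "")
axis(2, at = y, labels = names(counts), las = 1)
abline(h = y, col = "gray", lty = "dotted")   # guide lines, drawn under the dots
points(counts, y, pch = 19)

# Second axis along the top: the same positions labelled as a percentage of the total
pct_ticks <- seq(0, 100, by = 25)
axis(3, at = pct_ticks / 100 * total, labels = paste0(pct_ticks, "%"))
par(op)
```

Because the top axis ticks sit at fixed fractions of the total, a dot's share of the whole can be read directly from it.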

Here are a couple of examples using R:

This is the type of dotplot I think you have in mind; it would not replace a pie chart:

library(TeachingDemos)
dots(round( rnorm(100),0 ) )

But this is the type of dotplot that Cleveland refers to as a replacement for pie charts:

# steal data from ?pie
pie.sales <- c(0.12, 0.3, 0.26, 0.16, 0.04, 0.12)
names(pie.sales) <- c("Blueberry", "Cherry",
    "Apple", "Boston Cream", "Other", "Vanilla Cream")
par(mfrow=c(2,1))
dotchart(pie.sales*100)
# or, with the x axis running exactly from 0 to 100 percent:
par(xaxs='i')
dotchart( pie.sales*100, xlim=c(0,100) )

What’s the best (Google chart) visualisation for displaying sparse timeline data across thousands of “columns”

I'm answering my own question, having come up with a set of visualisations that seems to do the job. Lesson learned: I was simply trying to show too much in one chart.

In summary, the problem was solved by splitting the data into multiple charts (six or seven in all), with interactivity that lets the viewer drill down from high-level aggregate summaries into the detail:

A column chart of total alerts per month
A column chart of total alerts per day, within a month
A histogram of total alerts per device, all-time
A histogram of total alerts per device, within a month
A histogram of total alerts per device, within a day
A column chart showing the incidents on a specific unit over a timeline (e.g. a month)
A line chart showing the data associated with each alert (specific to the alert type, e.g. voltage, bpm, I/O or whatever applies)
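The charts above are all aggregations of one alert log at different levels. A sketch in R of that data preparation, assuming a hypothetical log with one row per alert (device, timestamp); it only builds the summary tables each chart would display and does not use the Google Charts API. The "selected" month, day and device stand in for the user's clicks.

```r
# Hypothetical alert log (assumed data): one row per alert
alerts <- data.frame(
  device = c("A", "B", "A", "C", "A", "A", "B", "A"),
  time   = as.POSIXct(c("2024-03-01 08:00", "2024-03-01 09:30",
                        "2024-03-02 10:15", "2024-03-02 11:00",
                        "2024-04-05 12:00", "2024-04-05 12:05",
                        "2024-04-05 13:00", "2024-04-06 07:45"), tz = "UTC")
)
alerts$month <- format(alerts$time, "%Y-%m")
alerts$day   <- as.Date(alerts$time)

# Level 1: total alerts per month
per_month <- table(alerts$month)

# Level 2: alerts per day within the clicked month
sel_month <- "2024-04"
per_day <- table(alerts$day[alerts$month == sel_month])

# Level 3: alerts per device within the clicked day, worst device first
sel_day <- as.Date("2024-04-05")
per_device <- sort(table(alerts$device[alerts$day == sel_day]), decreasing = TRUE)

# Level 4: timeline for the clicked (here: worst) device within the month
sel_device <- names(per_device)[1]
device_timeline <- table(alerts$day[alerts$device == sel_device &
                                    alerts$month == sel_month])
```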

So the user starts at the top and clicks a day (probably today). The month and day histograms reload, and the tallest bar is the device with the most problems, so the user clicks that. Below, the device-specific graphs load, letting the user see that device's behaviour.

It works well: in two or three clicks the user gets both an overview and the detail for the most important aspects.

In this example, one device had a problem on a specific day and accounted for most of the errors that day. It is now very easy to find.
