Solved – Help creating a chart to show categorical data over time

categorical data, data visualization, excel, r

The scenario: Over the course of 10 minutes, 2 security guards patrol a floor and have their movements tracked. Each guard is on either the left or the right side of the floor, and each side is divided into 10 zones, so there are 20 zones in total (left zone 1, right zone 1, left zone 2, right zone 2, left zone 3, right zone 3, etc.). Whenever a guard crosses into a new zone, the time is recorded.

I am trying to create a chart that has time along the x axis, but I am unsure how to lay out the y axis.

I had thought about putting the guards on the y axis and having a symbol represent each zone, so I could just insert the symbol at the appropriate time as the guard moved into another zone. But that would need 20 different symbols, or 10 symbols with different colors for the right and left zones.

For example:

 guard 1            


 guard 2   

                0   1  2  3  4  5  6  7  8  9  10
                           time 

I'm not sure this is the best way to illustrate it. Ultimately this would be done on a large scale to see whether any trends are evident.

I have searched Google Images for a similarly structured chart but could not find one. I also tried to make such a chart in Excel but failed.

Any advice on how people would approach this? Have I perhaps got my axes muddled up?

Best Answer

Three variables need to be visualized: time, zone, and side. We should capitalize on the two Cartesian coordinates of the plot to map two of these. Then some graphical quality--symbol, color, lightness, or orientation--will be needed to symbolize the third.

Connecting the symbols with faint line segments helps the eye follow the temporal sequence. We can convey a little more information by erasing any segments that seem to correspond to breaks in the series, that is, unusually long gaps between consecutive observations.
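As a rough sketch of how such breaks might be found, the following splits the observation indices into unbroken runs wherever a time gap exceeds the median gap plus a multiple of the interquartile range. The helper name `split.at.gaps` and the multiplier `k` are illustrative assumptions, and the toy times are made up.

```r
# Hypothetical helper: group observation indices into runs separated by
# unusually long time gaps (gap > median + k * IQR of all gaps).
split.at.gaps <- function(time, k = 5) {
  d <- diff(time)                                  # d[i] = time[i+1] - time[i]
  q <- quantile(d, c(1/4, 1/2, 3/4), names = FALSE)
  threshold <- q[2] + k * (q[3] - q[1])
  run.id <- cumsum(c(TRUE, d > threshold))         # new run after each long gap
  split(seq_along(time), run.id)
}

# Toy data: steady readings with one long pause between the 5th and 6th.
time <- c(0, 1, 2, 3, 4, 20, 21, 22, 23, 24)
runs <- split.at.gaps(time)
```

Each element of `runs` can then be drawn with its own call to `lines()`, so that no segment spans a gap.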

The first solution uses symbol type and color to distinguish the sides, vertical position to identify zones, and horizontal position for time. It is designed to display the progress of zone and side over time to help visualize the transitions. To clarify overlaps, the symbol for side 2 is positioned above its nominal location and the symbol for side 1 is positioned below its nominal location.
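The vertical nudge is simple arithmetic. Here is a small sketch, assuming sides are coded 1 and 2 as in the sample data; the helper `side.offset` and its `width` parameter are hypothetical names used only for illustration.

```r
# Hypothetical helper: spread the sides evenly over a band of the given
# width, centered on the nominal zone position.
side.offset <- function(side, n.sides = 2, width = 0.2) {
  width * ((side - 1) / (n.sides - 1) - 1/2)
}

side <- c(1, 2, 2, 1)          # made-up observations
zone <- c(3, 3, 4, 4)
y <- zone + side.offset(side)  # side 1 sits 0.1 below, side 2 sits 0.1 above
```

With two sides this reduces to `0.2 * (side - 3/2)`, so two visits to the same zone on opposite sides no longer overplot each other.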

Figure 1

This figure makes it immediately apparent that Guard 2 prefers the blue side (side 1) over the red (side 2) and that she moves around the zones more and in a more regular fashion.

The use of identically scaled and oriented time axes in these parallel plots enables visual comparison of the guards' patrol patterns during any time interval. With many guards, this construction lends itself well to a "small multiple" display showing all data simultaneously.
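One way such a small-multiple display might be laid out is sketched below: compute the axis limits once from all guards and pass them to every panel. The toy data, the number of guards, and the output file name are assumptions made only for illustration.

```r
# Toy data, one list(zone, side, time) per guard. Values are made up.
set.seed(17)
walks <- lapply(1:4, function(g) {
  n <- 30
  list(zone = sample.int(10, n, replace = TRUE),
       side = sample.int(2, n, replace = TRUE),
       time = sort(runif(n, 0, 10)))
})

# Shared limits, so every panel uses the same time and zone scales.
xlim <- range(unlist(lapply(walks, function(w) w$time)))
ylim <- c(1, 10)

out <- file.path(tempdir(), "guards.pdf")   # hypothetical output file
pdf(out, width = 7, height = 2.5 * length(walks))
par(mfrow = c(length(walks), 1), mar = c(4, 4, 2, 1))
for (i in seq_along(walks)) {
  w <- walks[[i]]
  plot(w$time, w$zone, xlim = xlim, ylim = ylim,
       xlab = "Time", ylab = "Zone", main = paste("Guard", i),
       pch = c(0, 19)[w$side], col = c("#2020d0", "#d02020")[w$side])
}
invisible(dev.off())
```

Fixing `xlim` and `ylim` up front matters because `plot()` would otherwise choose limits separately for each guard, and the panels would no longer line up.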

The second solution uses symbol color to distinguish times, vertical position to identify sides, and horizontal position for zone. It is a map-like display (which would generalize to a more complex spatial layout). This could be used to study the frequency with which each space is entered by each guard and, in a more limited way, to visualize the movements among the spaces.

Figure 2

The different frequencies with which the zones and sides were visited are clearly displayed in this figure. The failure of Guard 2 to visit side 2 of zone 10 is immediately apparent, whereas it was not evident in the first figure.
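The same frequencies can also be tabulated directly as a numerical check on the picture. This is a minimal sketch using made-up observations for a single guard and assuming four zones; declaring the factor levels explicitly keeps spaces that were never visited in the table with a count of zero.

```r
# Made-up observations for one guard.
zone <- c(1, 2, 2, 3, 3, 3, 2, 1)
side <- c(1, 1, 2, 2, 1, 1, 1, 2)
n.zones <- 4
n.sides <- 2

# Cross-tabulate side by zone. Explicit levels keep empty cells.
counts <- table(side = factor(side, levels = 1:n.sides),
                zone = factor(zone, levels = 1:n.zones))

# Spaces this guard never entered (one row per empty cell).
unvisited <- which(counts == 0, arr.ind = TRUE)
```

Here zone 4 is empty on both sides, so `unvisited` has two rows.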


These figures were produced in R. The input data structure is a list of parallel vectors for the times, zones, and sides: one per guard. The following code begins by generating some sample data randomly. Two functions to make the plot for a given guard are provided, corresponding to the figures.
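For recorded rather than simulated data, a log of zone entries might be reshaped into this structure as sketched here. The data frame `patrol.log` and its column names are hypothetical; sides are recoded as 1 (left) and 2 (right) to match the sample data.

```r
# Hypothetical log with one row per recorded zone entry.
patrol.log <- data.frame(
  guard = c(1, 1, 1, 2, 2),
  time  = c(0.0, 1.5, 4.0, 0.0, 3.2),   # minutes since the patrol started
  side  = c("left", "left", "right", "right", "left"),
  zone  = c(1, 2, 2, 5, 5)
)

# Sort by guard and time, then build one list of parallel vectors per guard.
patrol.log <- patrol.log[order(patrol.log$guard, patrol.log$time), ]
walks <- lapply(split(patrol.log, patrol.log$guard), function(d) {
  list(zone = d$zone,
       side = match(d$side, c("left", "right")),   # 1 = left, 2 = right
       time = d$time)
})
```

Note that the simulated walks rescale time to end at 1, whereas real timestamps can be kept in their own units.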

#
# Side transition matrices
#
transition.side <- function(left=1/2, right=1/2) {
  rbind(c(left, 1-left), c(1-right, right))
}
#
# Zone transition matrices
#
transition.zone <- function(n, up=1/2, stay=0, down=(1-up-stay)) {
  x <- rep(c(down, stay, up, rep(0, n-2)), n)
  q <- matrix(x[-c(1, 2:n+n^2)], n)
  q <- q / apply(q, 1, sum)
  return (q)
}
n.zones <- 10
guards <- list(list(side=transition.side(1/2,1/2),
                    zone=transition.zone(n.zones,1/2,0)),
               list(side=transition.side(3/4,1/4),
                    zone=transition.zone(n.zones,1/8,3/4)))
#
# Create Markov chain walks for all guards.
#
n.steps <- 500
walks <- list()
for (g in guards) {
  zone <- integer(n.steps)
  side <- integer(n.steps)

  # Random starting location
  zone[1] <- sample.int(n.zones, 1)
  side[1] <- sample.int(2, 1)

  for (i in 2:n.steps) {
    zone[i] <- sample.int(n.zones, 1, prob=g$zone[zone[i-1],])
    side[i] <- sample.int(2, 1, prob=g$side[side[i-1],])
  }
  s <- cumsum(sample(c(rexp(n.steps-3), rexp(3, 10/n.steps))))
  walks <- c(walks, list(list(zone=zone, side=side, time=s/max(s))))
}
#
# Display a walk.
#
plot.walk <- function(walk, ...) {
  n <- length(walk$zone)
  #
  # Find outlying time differences.
  #
  d <- diff(walk$time)
  q <- quantile(d, c(1/4, 1/2, 3/4))
  threshold <- q[2] + 5 * (q[3]-q[1])
  breaks <- unique(c(which(d > threshold), n))
  #
  # Plot the data.
  #
  sym <- c(0, 19)
  col <- c("#2020d080", "#d0202080")
  plot(walk$time, walk$zone, type="n", xlab="Time", ylab="Zone", ...)
  j <- 1
  for (i in breaks) {
    # Observations j..i form one unbroken run; the gap lies between i and i+1.
    lines(walk$time[j:i], walk$zone[j:i], col="#00000040")
    j <- i+1
  }
  points(walk$time, walk$zone+0.2*(walk$side-3/2), pch=sym[walk$side], col=col[walk$side],
         cex=min(1,sqrt(200/n)))
}
plot.walk2 <- function(walk, n.zones=10, n.sides=2, ...) {
  n <- length(walk$zone)
  #
  # Find outlying time differences.
  #
  d <- diff(walk$time)
  q <- quantile(d, c(1/4, 1/2, 3/4))
  threshold <- q[2] + 5 * (q[3]-q[1])
  breaks <- unique(c(which(d > threshold), n))
  #
  # Plot the reference map
  #
  col <- "#3050b0"
  plot(c(1/2, n.zones+1/2), c(1/2, n.sides+1/2), type="n", bty="n", tck=0, 
       fg="White",
       xaxp=c(1,n.zones,n.zones-1), yaxp=c(1,n.sides,n.sides-1), 
       xlab="Zone", ylab="Side", ...)
  polygon(c(1/2,n.zones+1/2,n.zones+1/2,1/2,1/2), c(1/2,1/2,n.sides+1/2,n.sides+1/2,1/2),
          border=col, col="#fafafa", lwd=2)
  for (i in 2:n.zones) lines(rep(i-1/2,2), c(1/2, n.sides+1/2), col=col)
  for (i in 2:n.sides) lines(c(1/2, n.zones+1/2), rep(i-1/2,2), col=col)
  #
  # Plot the data.
  #         
  col <- terrain.colors(n, alpha=1/2)
  x <- walk$zone + runif(n, -1/3, 1/3)
  y <- walk$side + runif(n, -1/3, 1/3)
  j <- 1
  for (i in breaks) {
    # Observations j..i form one unbroken run; the gap lies between i and i+1.
    lines(x[j:i], y[j:i], col="#00000020")
    j <- i+1
  }
  points(x, y, pch=19, cex=min(1,sqrt(200/n)), col=col)
}
par(mfcol=c(length(guards), 1))
i <- 1
for (g in walks) {
  plot.walk(g, main=paste("Guard", i))
  i <- i+1
}
i <- 1
for (g in walks) {
  plot.walk2(g, main=paste("Guard", i))
  i <- i+1
}