Solved – Visualising successive proportions

data visualization, r, sankey-diagram

I am trying to visualise some consumer data, which has 4 categories. Users are free to switch between different categories. I would like to visualise the last three or four switches for each individual.

So we would start with a plot with a column with 4 stacked proportions. After that we would have 16 as each category breaks down into what people did on the previous occasion, then 64, and so on, until the bins become too small to be useful.

I am thinking something between a marimekko chart and a stacked bar chart or a dendrogram should work, but I don't even know what that would be called!

If anyone can help with the type of plot I should be using, and, if you want to be extra nice, how to implement it in R then I would be very grateful.

Best Answer

One potential idea is to use Sankey diagrams to document the flow of choices between the categories.


Update, addressing some of the concerns expressed in the comments: it appears to me that the Parallel Sets program does what you want out of the box. Below is output from the program, for which I created 4 random variables with 4 categories each. Whichever group you place at the top of the display is sequentially divided among the subsequent categories, creating the successive splitting that you want.

[Figure: Parallel Sets output for 4 random variables with 4 categories each]

Although it is not apparent in this picture, the program also has some interactive functionality that makes exploratory data analysis easier. For example, when you hover over one of the categories, all of its descendants are highlighted.

I have uploaded the same dataset to Fineo, which you can explore here. Besides the initial 4 category variables (named dec1 to dec4), I have also included concatenated categories that allow you to examine the split categories. The naming convention for the variables with the exp suffix is that each is the dec variable expanded by concatenating the previously chosen categories. So dec3_exp12 would be labeled 121 if dec1 = 1, dec2 = 2 and dec3 = 1. You could build the same split structure in Fineo that is available in ParSets, but in this example it fails to render variables with $4^3$ or more nodes.
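The concatenation convention described above can be sketched in a few lines. This is only an illustration in Python, not the code used to prepare the Fineo data: the function name `expand_history` is hypothetical, and the names `dec2_exp1` and `dec4_exp123` are assumed by analogy with `dec3_exp12`.

```python
import random

def expand_history(rows, n_decisions=4):
    """Add concatenated-history labels (hypothetical helper) to each row."""
    expanded = []
    for row in rows:
        out = {}
        history = ""  # choices made before the current decision
        for i in range(1, n_decisions + 1):
            choice = str(row[f"dec{i}"])
            out[f"dec{i}"] = choice
            if history:
                # e.g. dec3_exp12 is the dec3 choice prefixed by dec1 and dec2
                suffix = "".join(str(j) for j in range(1, i))
                out[f"dec{i}_exp{suffix}"] = history + choice
            history += choice
        expanded.append(out)
    return expanded

# Random data in the spirit of the example: 4 decisions, 4 categories each.
random.seed(0)
rows = [{f"dec{i}": random.randint(1, 4) for i in range(1, 5)} for _ in range(200)]
expanded = expand_history(rows)

# The case from the text: dec1 = 1, dec2 = 2, dec3 = 1.
example = expand_history([{"dec1": 1, "dec2": 2, "dec3": 1, "dec4": 3}])[0]
print(example["dec3_exp12"])  # 121
```

Each expanded variable can have up to 4 times as many levels as the one before it, which is why the later ones quickly become hard to render.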

After playing around with Fineo a bit more, I think it is a neat application, but it is really limited. Parallel Sets has much more functionality, so I would suggest you check that out before the Fineo app.


I think the ParSets program is a much better option than successively splitting the categories into subsets for examination. As an example of the latter, using the same random data as above, here is a dot plot of the proportion choosing each category in decision 2, conditional on the category chosen for decision 1.

[Figure: dot plot of decision 2 proportions conditional on the decision 1 category]
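The numbers behind such a dot plot are just conditional proportions. A minimal sketch in Python, assuming the data is a list of records with `dec1` and `dec2` fields (the function name `conditional_proportions` and the random data are made up for illustration):

```python
import random
from collections import Counter

def conditional_proportions(rows, given, target):
    """Return {(given_value, target_value): share of target within given}."""
    pair_counts = Counter((r[given], r[target]) for r in rows)
    base_counts = Counter(r[given] for r in rows)
    return {
        (g, t): n / base_counts[g]
        for (g, t), n in sorted(pair_counts.items())
    }

# Random stand-in data: 4 categories for each of two decisions.
random.seed(1)
data = [{"dec1": random.randint(1, 4), "dec2": random.randint(1, 4)}
        for _ in range(400)]

props = conditional_proportions(data, "dec1", "dec2")
for (g, t), p in props.items():
    print(f"dec1={g} dec2={t} proportion={p:.2f}")
```

Each dot in the plot corresponds to one `(dec1, dec2)` pair, and the proportions within each `dec1` category sum to 1.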

You can do the same breakdown for the change from decision 2 to decision 3, but as a small multiple chart with one panel for each initial decision 1 category.

[Figure: small multiples of decision 3 proportions conditional on decision 2, one panel per decision 1 category]

You can continue this indefinitely (see below). It may be enlightening, but I suspect it would become fairly daunting with many more panels. Below, as requested, is a chart visualizing 4 successive category choices.

[Figure: small multiples visualizing 4 successive category choices]

As noted previously, the small counts you are left with after splitting the graphic into so many categories are problematic. One way to account for that is to map an aesthetic such as size to the baseline count on which each proportion is based. This shrinks the observations based on smaller counts out of view. You could also use transparency (but I had already made the points transparent to distinguish overplotted points in this example).

[Figure: the same small multiples with point size mapped to the baseline count]
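To make the size mapping concrete, here is a rough sketch in Python that keeps the baseline count next to each conditional proportion and scales a marker size from it. The function names `history_proportions` and `point_size`, the linear scaling, and the tiny data set are all assumptions for illustration, not how the chart above was produced.

```python
from collections import Counter

def history_proportions(rows, history, target):
    """Map (history values, target value) to (proportion, baseline count)."""
    base = Counter(tuple(r[h] for h in history) for r in rows)
    joint = Counter((tuple(r[h] for h in history), r[target]) for r in rows)
    return {key: (n / base[key[0]], base[key[0]]) for key, n in joint.items()}

def point_size(base_count, max_count, max_size=10.0):
    """Scale marker size linearly with the baseline count (arbitrary choice)."""
    return max_size * base_count / max_count

# Tiny made-up data set: three people share the history (1, 1), one has (2, 4).
rows = [
    {"dec1": 1, "dec2": 1, "dec3": 2},
    {"dec1": 1, "dec2": 1, "dec3": 2},
    {"dec1": 1, "dec2": 1, "dec3": 3},
    {"dec1": 2, "dec2": 4, "dec3": 1},
]

result = history_proportions(rows, ["dec1", "dec2"], "dec3")
largest = max(n for _, n in result.values())
for (hist, choice), (prop, n) in sorted(result.items()):
    size = point_size(n, largest)
    print(f"history={hist} dec3={choice} prop={prop:.2f} n={n} size={size:.1f}")
```

The proportion of 1.0 for history (2, 4) rests on a single observation, so it gets the smallest marker, while the proportions for history (1, 1) rest on three observations and are drawn larger.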

I imagine some were envisioning a Christmas-tree-like node structure as opposed to dot plots, but I don't know how to make such a graphic. I suspect it would be subject to the same problem of becoming overwhelming, though. These small multiples aren't bad, but IMO Parallel Sets is a lot more intuitive, and I suspect some non-obvious patterns would be more apparent in that visualization. Maybe someone more imaginative than me can come up with some more interesting data than just 4 random categories.
