Solved – Scaling data that are on different orders of magnitude for plotting

data visualization

Looking at the following dataset:

Date         Visits   Carts Created   Carts Converted   Orders Created
2011-11-11   12277    161             9                 36
2011-11-12   11871    93              5                 19
2011-11-13   13072    107             8                 8
2011-11-14   13594    112             4                 34
2011-11-15   12741    129             8                 43
2011-11-16   15491    261             16                57
2011-11-17   13418    186             17                42

I've been asked to plot this on a graph, using the Date as the X-axis and the rest of the data on the Y-axis. The problem is that the scales of the series are dramatically different: Visits are in the thousands while Orders Created are in the low tens, so the data doesn't plot well on a single graph.

I was wondering what a statistician would do in this scenario. I could divide the Visits by 1000 and label the series "Visits (K)", but then I have the same problem with Carts Created, which are in the hundreds while everything else is in the low tens.

What kind of thing is done in this scenario?

Best Answer

It isn't unreasonable at the outset to plot the line charts as a series of small multiples, with a different scale for each Y axis but with the X axis (dates) aligned.

[Figure: small-multiple line charts of the raw series, dates aligned on the X axis, separate Y scale per panel]

I think this is a good start, as it lets you examine the raw data and compare trends between the different line charts. IMO you should look at the raw data first, and only then think about conversions or ways to normalize the charts so that they are comparable.
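A minimal sketch of the small-multiples layout described above, assuming Python with matplotlib (the original answer does not say which tool produced its charts). The data are the seven rows from the question; the output file name is a hypothetical choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
from datetime import date

# The seven days from the question's table
dates = [date(2011, 11, d) for d in range(11, 18)]
series = {
    "Visits": [12277, 11871, 13072, 13594, 12741, 15491, 13418],
    "Carts Created": [161, 93, 107, 112, 129, 261, 186],
    "Carts Converted": [9, 5, 8, 4, 8, 16, 17],
    "Orders Created": [36, 19, 8, 34, 43, 57, 42],
}

# One panel per series: the X axis (dates) is shared, each Y axis keeps its own scale
fig, axes = plt.subplots(len(series), 1, sharex=True, figsize=(7, 8))
for ax, (name, values) in zip(axes, series.items()):
    ax.plot(dates, values, marker="o")
    ax.set_ylabel(name)

axes[-1].set_xlabel("Date")
fig.autofmt_xdate()  # rotate the date labels on the bottom panel
fig.savefig("small_multiples.png")  # hypothetical output file name
```

Because `sharex=True` aligns the dates while each panel autoscales its own Y axis, a spike such as the one on 2011-11-16 can be compared across panels without the Visits series flattening the others.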

As King has already mentioned, your variables appear to have a natural ordering based on their names and magnitudes. Assuming that ordering is appropriate, I created three new variables based on the percentage converted at each stage. The new variables are:

% Carts Created = Carts_Created/Visits
% Orders Created = Orders_Created/Carts_Created
% Carts Converted = Carts_Converted/Orders_Created
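As a rough illustration, the three ratios defined above can be computed directly from the table in the question. This is a plain-Python sketch; the variable names and the `ratio` helper are my own, not from the original answer.

```python
# The seven days from the question's table
dates = ["2011-11-11", "2011-11-12", "2011-11-13", "2011-11-14",
         "2011-11-15", "2011-11-16", "2011-11-17"]
visits = [12277, 11871, 13072, 13594, 12741, 15491, 13418]
carts_created = [161, 93, 107, 112, 129, 261, 186]
carts_converted = [9, 5, 8, 4, 8, 16, 17]
orders_created = [36, 19, 8, 34, 43, 57, 42]


def ratio(numerators, denominators):
    """Element-wise ratio of two equally long series (hypothetical helper)."""
    return [n / d for n, d in zip(numerators, denominators)]


# The three derived variables, following the definitions in the answer
pct_carts_created = ratio(carts_created, visits)
pct_orders_created = ratio(orders_created, carts_created)
pct_carts_converted = ratio(carts_converted, orders_created)

# Print one row per day, formatted as percentages
for day, a, b, c in zip(dates, pct_carts_created,
                        pct_orders_created, pct_carts_converted):
    print(f"{day}  {a:7.2%}  {b:7.2%}  {c:7.2%}")
```

On 2011-11-13 the last ratio is 8/8, i.e. 100%, while the orders-created ratio (8/107) is the lowest of the week.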

Converting to percentages brings the series closer to a common scale, but even then it is difficult to visualize the series effectively with all of the lines on one chart (as below). The level and variation of the orders created and carts converted series dwarf those of the other series. You can't see any variation in the carts created series on this scale (and I suspect that is the one you are most interested in).

[Figure: the three percentage series plotted together on a single chart with one shared Y axis]

So again, IMO a better way to examine this is to use different scales. Below is the Percentage chart using different scales.

[Figure: the percentage series plotted as separate line charts, each with its own Y-axis scale]

With these graphics, there doesn't appear to me to be any meaningful correlation between the series, but you do have plenty of interesting variation within each series (especially the proportion converted). What's up with 2011-11-13? You had a much lower proportion of orders created, but every one of the orders created was a converted cart. Did you have any other interventions that might explain trends in either site visits or the proportion of carts created?

This is all just exploratory data analysis, and to take any further steps I would need more insight into the data (I hope this is a good start though). You could normalize the line charts in other ways so that they can be plotted on a comparable scale, but that is a difficult task, and I think you can do just as well by choosing arbitrary scales based on what is informative given the data, as opposed to choosing some default normalization scheme. Another interesting approach is horizon graphs, but those are more suited to viewing many different line charts at once.
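For reference, one common default normalization is to standardize each series to zero mean and unit standard deviation (z-scores). The answer does not name a specific scheme, so this choice is an assumption made purely for illustration; the `zscores` helper is hypothetical.

```python
from statistics import mean, stdev

# The four raw series from the question's table
series = {
    "Visits": [12277, 11871, 13072, 13594, 12741, 15491, 13418],
    "Carts Created": [161, 93, 107, 112, 129, 261, 186],
    "Carts Converted": [9, 5, 8, 4, 8, 16, 17],
    "Orders Created": [36, 19, 8, 34, 43, 57, 42],
}


def zscores(values):
    """Standardize a series: subtract its mean, divide by its sample std dev."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


# After standardizing, every series is unitless and centered on zero,
# so all four could share a single Y axis
standardized = {name: zscores(values) for name, values in series.items()}

for name, z in standardized.items():
    print(f"{name:16s}", [round(v, 2) for v in z])
```

The trade-off is the one noted above: the series become comparable on one axis, but the original units and levels are lost, which is why hand-picked scales per panel can be just as informative.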