Solved – Visualising the variance

data visualizationvariance

I work in a company that gathers work environment surveys. When a survey is done we create reports that are handed out to the managers in the company to show where they need to focus their efforts and such.

In these reports we show a comparison of the variance of the calculated group compared to the variance of a reference population. We "normalize" this score on a 0-100 scale. (The in-code calculation gives another scale from -inf to +inf, but for simplicity we say that 0 is 50, and just chop off everything over 100 and under 0.) Our biggest problem is that our customers gets very conscious of the actual number. Even though we try to tell them that a high/low number isn't necessarily bad, it just shows how your groups variance compares to the reference populations variance.

We are thinking of moving away from showing the number in the end-user report, and going for a visual representation of "high", "normal", "low" variance instead. But I can't figure out any good visual representation of it. It needs to be kind of neutral looking, but still show something… (.. i know ..)

Does anyone have any suggestions on how this could be achieved?

(Disclaimer: I am not a statistician, I'm a developer ;). I had one class of Statistics in my higher education, and that is over 5 years ago. So both the terms I use and my explanations could be totally meaningless.)

Best Answer

If the main concern is "that our customers gets very conscious of the actual number. Even though we try to tell them that a high/low number isn't necessarily bad", then I think you should formally address them by plotting the confidence intervals. Variance is a bad choice because its unit is the square of whatever you're measuring with and they are much bigger and can be potentially very misleading. Standard deviation is a better approach but that does not answer your customers' concern because just by SD itself one cannot tell if the point estimates are really different from the reference mean.

Some kind of plot modified based on a forest plot would be a better candidate. It's compact and easy to integrate with text fields (where you can show the summary statistics.) And what's more, it answers your client question head on. If they are worried that 3.5 is so much lower than 4.6, then show them statistically they are not different. (Or maybe your clients are right.)

And somewhat contrary to what you propose to do (eliminating numeric output altogether), I'd perhaps try to enrich the graph so that it shows more data. Devices like panel histogram or violin plot (see below) allows you to show the distribution of the actual data, which perhaps will give a strong visual cue that the data do spread and it's not about just one point.

enter image description here

Also, I'd recommend evaluating your score distribution for skewness or other deviation from normal distribution, and see if augmenting with some non-parametric plot like box plot would be a good idea.

Side comment: I feel that your trimming criterion is very rigid, but I wouldn't question your familiarity with the scale. Anyhow, if such a trimming scheme is being used, I feel you're also obligated to report how many of the people are trimmed. It's because the variation you're using to convince them that things are not that different can be potentially altered by how you define the trimming threshold. It'd be awkward if they find out later.