Solved – Alternatives to using Coefficient of Variation to summarize a set of parameter distributions

data visualization, descriptive statistics, distributions, l-moments, winsorizing

Background

I have a model with 17 parameters, and I currently use the coefficient of variation ($\text{CV}=\sigma/\mu$) to summarize the prior and posterior distributions of each parameter.

All of the parameters are > 0. I would like to summarize these PDFs on a normalized scale (in this case, standard deviation normalized by the mean) so that they can be compared to each other and to other statistics presented in similar adjacent plots (sensitivity, explained variance). I will include density plots for each parameter separately, but I would like to summarize them here.

However, the CV is sensitive to $\mu$, which causes the following two points of confusion. Both are easily explained in text, but I would prefer to avoid them.

  1. The posterior CV of one parameter is greater than its prior CV because the mean has decreased more than the standard deviation (parameter O in the figure).
  2. One of the parameters (N) is in units of temperature. It has a 95% prior CI of (8, 12) Celsius $\simeq$ (281, 285) K. When I present the data in Kelvin, which is only defined for positive values, the CV is < 1%; when presented in Celsius, the CV is closer to 40%. To me, it seems that neither of these CVs provides an intuitive representation of the CI.

Question

Are there better ways to present this information, either as a CV or as another statistic?

Figure

As an example, this is the type of plot that I am planning to present, with posterior CV in black and prior CV in grey. For scale, the CV of parameter O is 1.6.

Best Answer

It seems to me that CV is inappropriate here. I think you may be better off separating the change in location from the change in dispersion. In addition, the distributions you mention in your comment to the question are, for most parameter values, skewed (positively, except for the beta distribution). That makes me question whether the standard deviation is the best choice for a measure of dispersion; perhaps the interquartile range (IQR) might be better, or possibly the median absolute deviation? Similarly, rather than the mean, I might consider the median or the mode as the measure of location. The choice might in practice be determined by ease of computation as well as the field of application, the details of the model...
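To make the contrast concrete, here is a minimal sketch, assuming you have Monte Carlo samples of a parameter available as a NumPy array. The function names and the gamma-distributed "temperature" sample are hypothetical, chosen only for illustration. It shows that the IQR and the median absolute deviation do not change when the origin of the scale is shifted (Celsius to Kelvin), whereas the CV does.

```python
import numpy as np

def robust_summary(samples):
    """Return the median, IQR and median absolute deviation of a 1-D sample."""
    x = np.asarray(samples, dtype=float)
    q25, q50, q75 = np.percentile(x, [25, 50, 75])
    mad = np.median(np.abs(x - q50))
    return {"median": q50, "iqr": q75 - q25, "mad": mad}

def cv(samples):
    """Coefficient of variation: sample standard deviation over the mean."""
    x = np.asarray(samples, dtype=float)
    return x.std(ddof=1) / x.mean()

# Hypothetical positive, skewed sample with mean near 10 (illustration only).
rng = np.random.default_rng(0)
celsius = rng.gamma(shape=100.0, scale=0.1, size=10_000)
kelvin = celsius + 273.15  # same data, shifted origin

summary_c = robust_summary(celsius)
summary_k = robust_summary(kelvin)

print("IQR (C, K):", summary_c["iqr"], summary_k["iqr"])  # equal up to rounding
print("MAD (C, K):", summary_c["mad"], summary_k["mad"])  # equal up to rounding
print("CV  (C, K):", cv(celsius), cv(kelvin))             # CV shrinks in Kelvin
```

The dispersion measures stay in the parameter's own units, so they still need to be normalized by something before parameters can be compared with one another.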

Say you choose to use the IQR and the mode. You could summarise the change in dispersion using the ratio of posterior to prior IQRs, probably plotted on a log scale as that's usually appropriate for ratios. You could summarise the change in location using the ratio of the difference between posterior and prior modes to the prior IQR, or to the posterior IQR, or perhaps to the geometric mean of the posterior and prior IQRs.
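As a rough sketch of those two summaries, assuming prior and posterior samples are available as arrays: the function names and the lognormal samples below are hypothetical. Note that I have swapped in the median for the mode as the measure of location, only because it is simpler to estimate from samples; a mode estimate from a kernel density could be substituted. The location shift is scaled by the geometric mean of the two IQRs, which is one of the options mentioned above.

```python
import numpy as np

def iqr(x):
    """Interquartile range of a 1-D sample."""
    q25, q75 = np.percentile(x, [25, 75])
    return q75 - q25

def compare_prior_posterior(prior, posterior):
    """Return (log10 IQR ratio, location shift scaled by geometric-mean IQR).

    Location is the median here, used in place of the mode for simplicity.
    """
    prior = np.asarray(prior, dtype=float)
    posterior = np.asarray(posterior, dtype=float)
    iqr_prior, iqr_post = iqr(prior), iqr(posterior)
    # Change in dispersion: negative values mean the posterior is narrower.
    log_iqr_ratio = np.log10(iqr_post / iqr_prior)
    # Change in location, in units of the geometric mean of the two IQRs.
    scale = np.sqrt(iqr_prior * iqr_post)
    shift = (np.median(posterior) - np.median(prior)) / scale
    return log_iqr_ratio, shift

# Hypothetical skewed, positive samples (illustration only).
rng = np.random.default_rng(1)
prior = rng.lognormal(mean=0.0, sigma=1.0, size=20_000)
posterior = rng.lognormal(mean=-0.5, sigma=0.4, size=20_000)

log_ratio, shift = compare_prior_posterior(prior, posterior)
print("log10(posterior IQR / prior IQR):", log_ratio)
print("scaled shift in location:", shift)

# Both summaries are unchanged if the origin of the scale is shifted.
shifted = compare_prior_posterior(prior + 273.15, posterior + 273.15)
```

Both numbers are dimensionless, so they can be plotted side by side for all parameters, and neither depends on where zero sits on the measurement scale.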

These are just some quick ideas that came to mind. I can't claim any strong underpinnings for them, or even any great personal attachment to them.
