Why is the coefficient of variation not valid when using data with positive and negative values?

descriptive statistics

I can't seem to find a definitive answer to my question.

My data consists of several plots with measured means varying from 0.27 to 0.57. In my case, all data values are positive, but the measurement itself is based on a ratio of reflectance values that can range from -1 to +1. The plots represent values of the NDVI, a remotely derived indicator of vegetation "productivity".

My intention was to compare the variability of values at each plot, but since each plot has a different mean, I opted for using the CV to gauge the relative dispersion of NDVI values per plot.

From what I understand, taking the CV of these plots is not kosher because each plot can have both positive and negative values. Why is it not appropriate to use the CV in such instances? What would be some viable alternatives (e.g., a similar measure of relative dispersion, data transformations, etc.)?

Best Answer

Think about what the CV is: the ratio of the standard deviation to the mean. If the variable can take both positive and negative values, the mean can be very close to 0 (or even negative). Dividing by such a mean makes the CV arbitrarily large or flips its sign, so it no longer does what it is supposed to do, which is to give a sense of how big the sd is compared to the mean.
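To make this concrete, here is a minimal sketch with made-up numbers (the vector `d` and the helper `cv` are hypothetical, not part of the question's data). All three variables have exactly the same spread; only the mean changes:

```r
# Hypothetical helper: coefficient of variation as sd / mean
cv <- function(v) sd(v) / mean(v)

# Made-up deviations around a centre; sd(d) is about 1.58
d <- c(-2, -1, 0, 1, 2)

x_far      <- 10 + d    # mean 10, far from zero
x_near     <- 0.1 + d   # mean 0.1, just above zero
x_negative <- -0.1 + d  # mean -0.1, just below zero

cv(x_far)       # about 0.16
cv(x_near)      # about 15.8: same spread, tiny mean
cv(x_negative)  # about -15.8: the sign flips with the mean
```

The spread is identical in all three cases, yet the CV ranges from modest to huge to negative, purely because of where the mean sits relative to zero.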

EDIT: In a comment, I said that the CV is not a good measure whenever you could sensibly add a constant to the variable. Here is an example:

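set.seed(239920)
x <- rnorm(100, mean = 10, sd = 2)
min(x)  # check that no values are negative
(CVX <- sd(x) / mean(x))  # CV of x
x2 <- x + 10  # shift every value by the same constant
(CVX2 <- sd(x2) / mean(x2))  # CV of x2: same sd, larger mean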

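x2 is simply x + 10. I think it's intuitively clear that the two are equally variable (they have exactly the same standard deviation), yet their CVs differ, because the shift changes the mean and leaves the sd untouched.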

A real-life example of this would be temperature, with x in degrees Celsius and x2 in kelvins: the two scales differ only by an additive constant (although there one could argue that the Kelvin scale is the proper one, since it has a true zero).
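As a rough sketch of the temperature case, assuming a handful of made-up readings (the values in `temp_c` are invented for illustration, and `cv` is a hypothetical helper):

```r
# Hypothetical helper: coefficient of variation as sd / mean
cv <- function(v) sd(v) / mean(v)

# Made-up temperature readings in degrees Celsius
temp_c <- c(12, 15, 18, 21, 24)

# The same readings in kelvins (an additive shift of 273.15)
temp_k <- temp_c + 273.15

# The spread is unchanged by the shift
all.equal(sd(temp_c), sd(temp_k))

cv(temp_c)  # about 0.26
cv(temp_k)  # about 0.016
```

The same physical measurements yield CVs that differ by more than a factor of ten, which is why the CV is only meaningful on a scale whose zero is not arbitrary.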
