Solved – How to test equality of variances with circular data

circular statistics, f-test, MATLAB, variance

I am interested in comparing the amount of variability within 8 different samples (each from a different population). I am aware that with ratio data this can be done by several methods: the F-test for equality of variances, Levene's test, etc.

However, my data is circular/directional (i.e. data that exhibit periodicity, such as wind direction, angular data in general, or time of day). I have done some research and found one test in the "CircStats" package in R – "Watson's test for homogeneity". One shortcoming is that this test only compares two samples, which means I would have to do multiple pairwise comparisons on my 8 samples (and then apply the Bonferroni correction).
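To make "variability" concrete for angular data, here is a minimal sketch of the usual circular variance, defined as one minus the mean resultant length. The function name and the two wind-direction samples are hypothetical and only illustrate why the ordinary variance is misleading when angles wrap around 0/360 degrees.

```python
import math
import statistics

def circular_variance(angles_rad):
    """Return 1 - R_bar, where R_bar is the mean resultant length of the angles."""
    n = len(angles_rad)
    c = sum(math.cos(a) for a in angles_rad) / n
    s = sum(math.sin(a) for a in angles_rad) / n
    return 1.0 - math.hypot(c, s)

# Hypothetical wind directions in degrees (illustrative values only).
tight = [350, 355, 0, 5, 10]       # tightly clustered around north
spread = [0, 72, 144, 216, 288]    # evenly spaced around the circle

tight_cv = circular_variance([math.radians(d) for d in tight])
spread_cv = circular_variance([math.radians(d) for d in spread])

print(tight_cv)    # near 0: the directions agree
print(spread_cv)   # near 1: no preferred direction
# The ordinary variance of the raw degrees is large for the tight sample,
# because 350 and 10 look far apart on a linear scale.
print(statistics.pvariance(tight))
```

The circular variance lies between 0 and 1, so samples with very different mean directions can still be compared on the same scale.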

Here are my questions:

1) Is there a better test that I can use?
2) If not, what are the assumptions of Watson's test? Is it parametric/non-parametric?
3) What is the algorithm by which I can perform this test? My data is in Matlab, and I would prefer to not have to transfer it into R to run my test. I'd rather just write my own function.

Best Answer

1) The Watson-Williams test is appropriate here.

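2) It is parametric and assumes that each group follows a von Mises distribution. It also assumes that all groups share a common concentration parameter; note that the test treats equal concentration as an assumption and tests for a difference in mean directions. I do not recall how robust the test is to violations of that assumption.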

3) I have been using an implementation of the Watson-Williams test from a circular statistics toolbox written for Matlab and available on the File Exchange (link below). I have not tried it, but I believe the implementation (circ_wwtest.m) is set up for multiple groups.

https://www.mathworks.com/matlabcentral/fileexchange/10676-circular-statistics-toolbox--directional-statistics-
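For anyone who prefers to write their own function, the following is a minimal sketch of the textbook form of the Watson-Williams F statistic for several groups, which compares mean directions under the common-concentration assumption. It is not taken from the toolbox source: the function names are hypothetical, the concentration is estimated with the commonly cited piecewise approximation, and the p-value is left to an F table or a statistics library.

```python
import math

def resultant_length(angles):
    """Length R of the vector sum of unit vectors at the given angles (radians)."""
    c = sum(math.cos(a) for a in angles)
    s = sum(math.sin(a) for a in angles)
    return math.hypot(c, s)

def kappa_from_rbar(rbar):
    """Approximate von Mises concentration from a mean resultant length
    (piecewise approximation; assumed adequate for moderate sample sizes)."""
    if rbar < 0.53:
        return 2 * rbar + rbar**3 + 5 * rbar**5 / 6
    if rbar < 0.85:
        return -0.4 + 1.39 * rbar + 0.43 / (1 - rbar)
    return 1 / (rbar**3 - 4 * rbar**2 + 3 * rbar)

def watson_williams_f(groups):
    """Return (F, df1, df2) for a list of groups of angles in radians.

    Assumes every group has some spread and the pooled data are not uniform;
    otherwise a denominator below can be zero.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    sum_rj = sum(resultant_length(g) for g in groups)         # sum of group R_j
    r_all = resultant_length([a for g in groups for a in g])  # R of pooled data
    kappa = kappa_from_rbar(sum_rj / n)
    correction = 1 + 3 / (8 * kappa)
    f = correction * (n - k) * (sum_rj - r_all) / ((k - 1) * (n - sum_rj))
    return f, k - 1, n - k

# Hypothetical samples: same spread, mean directions of 0 and 1 radian.
g1 = [-0.2, -0.1, 0.0, 0.1, 0.2]
g2 = [0.8, 0.9, 1.0, 1.1, 1.2]
f_stat, df1, df2 = watson_williams_f([g1, g2])
print(f_stat, df1, df2)  # a large F on (1, 8) degrees of freedom
```

The statistic is referred to an F distribution with k - 1 and N - k degrees of freedom, where k is the number of groups and N the total sample size.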

Related Solutions

Hypothesis Testing – Conducting an F Test for Equality of Variances

There appears to be a difference in the interpretation of a statistical formula. One quick, simple, and compelling way to resolve such differences is to simulate the situation. Here, you have noted there will be a difference when the players play different numbers of games. Let's therefore retain every aspect of the question but change the number of games played by the second player. We will run a large number ($10^5$) of iterations, collecting the two versions of the $F$ statistic in each case, and draw histograms of their results. Overplotting these histograms with the $F$ distribution ought to determine, without any further debate, which formula (if any!) is correct.

Here is R code to do this. It takes only a couple of seconds to execute.

s <- sqrt((9 * 17312 + 9*13208) / (9 + 9))             # Common SD
m <- 375                                               # Common mean
n.sim <- 10^5                                          # Number of iterations
n1 <- 10                                               # Games played by player 1
n2 <- 3                                                # Games played by player 2
x <- matrix(rnorm(n1*n.sim, mean=m, sd=s), ncol=n.sim) # Player 1's results
y <- matrix(rnorm(n2*n.sim, mean=m, sd=s), ncol=n.sim) # Player 2's results
F.sim <- apply(x, 2, var) / apply(y, 2, var)           # S1^2/S2^2

par(mfrow=c(1,2))                                      # Show both histograms
#
# On the left: histogram of the S1^2/S2^2 results.
#
hist(log(F.sim), probability=TRUE, breaks=50, main="S1^2/S2^2")
curve(df(exp(x),n1-1,n2-1)*exp(x), add=TRUE, from=log(min(F.sim)),
   to=log(max(F.sim)), col="Red", lwd=2)
#
# On the right: histogram of the (S1^2/(n1-1)) / (S2^2/(n2-1)) results.
#
F.sim2 <- F.sim * (n2-1) / (n1-1)
hist(log(F.sim2), probability=TRUE, breaks=50, main="(S1^2/[n1-1])/(S2^2/[n2-1])")
curve(df(exp(x),n1-1,n2-1)*exp(x), add=TRUE, from=log(min(F.sim2)),
   to=log(max(F.sim2)), col="Red", lwd=2)

Although it is unnecessary, this code uses the common mean ($375$) and pooled standard deviation (computed as s in the first line) for the simulation. Also of note is that the histograms are drawn on logarithmic scales, because when the numbers of games get small (n2, equal to $3$ here), the $F$ distribution can be extremely skewed.

Here is the output. Which formula actually matches the $F$ distribution (the red curve)?

(The discrepancy in the right-hand histogram is so dramatic that even just $100$ iterations would suffice to show its formula has serious problems. Thus in the future you probably won't need to run $10^5$ iterations; one-tenth as many will usually do fine.)

If you like, modify this to fit some of the other examples you have looked at.

Solved – F-Test for Equality of Variances with Weighted Survey Data

You can compare variances from first principles, i.e., by calculating each variance as the mean of the squared values minus the square of the mean.

webuse nhanes2, clear
gen bpsystol_sq = bpsystol* bpsystol
svy : mean bpsystol*, over( female )
* estimated variance in group 0
nlcom ( _b[bpsystol_sq:0] - _b[bpsystol:0]*_b[bpsystol:0])
* estimated variance in group 1
nlcom ( _b[bpsystol_sq:1] - _b[bpsystol:1]*_b[bpsystol:1])
* equality test
testnl ( _b[bpsystol_sq:1] - _b[bpsystol:1]*_b[bpsystol:1]) ///
     = ( _b[bpsystol_sq:0] - _b[bpsystol:0]*_b[bpsystol:0])
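The identity behind the commands above can be sketched outside Stata as follows. This is only an illustration of the point estimate: the function name, readings, and weights are hypothetical, and the sketch does not reproduce the design-based standard errors that the survey commands supply for the test.

```python
def weighted_variance(values, weights):
    """Weighted E[X^2] - (E[X])^2, with no small-sample correction."""
    total = sum(weights)
    mean = sum(w * x for x, w in zip(values, weights)) / total
    mean_sq = sum(w * x * x for x, w in zip(values, weights)) / total
    return mean_sq - mean * mean

# Hypothetical systolic blood pressure readings with survey weights.
values = [110.0, 120.0, 130.0, 140.0]
weights = [1.0, 2.0, 2.0, 1.0]

var_est = weighted_variance(values, weights)
print(var_est)
```

Computing this quantity separately within each group gives the two estimates that the equality test compares.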
