Solved – Comparison of two Simpson indices using t-test

diversity, t-test

I would like to compare two Simpson indices from two different populations. I have calculated their variances, as in Simpson's original paper on measures of diversity, and I have calculated a confidence interval for each of them using the formula:

$(S-2\sqrt{\text{var}},S+2\sqrt{\text{var}})$,

as suggested in a published paper.

What I would like to find is a p-value of the null hypothesis that the two indices are equal.

I have read that one can use a Welch t-test to compare them, but I haven't found a single paper or book with such an application.

My questions regarding this application are:

1) In the Welch t-test, the variances in the denominator are divided by $n_1$ and $n_2$ respectively, since each term is the squared SE of a mean. I guess that in this case, based on the formula for the CI, we shouldn't divide by $n$ and should use just the square root of the sum of the variances. Correct?

2) The degrees of freedom for a simple t-test are $n_1+n_2-2$, while for the Welch t-test they come from quite a complicated formula, which gives a result close to but not the same as $n_1+n_2-2$. Which one should be used?

3) By $n_1$ and $n_2$ above we mean the number of different categories in each population rather than the total number in each case. Correct?

I would be very grateful if someone can help me with these questions and even more if someone can provide some sort of documentation so I can justify my analysis.

Reference:

Simpson, E. H. (1949). Measurement of diversity. Nature, 163, 688.

Best Answer

It seems highly unlikely that a t distribution is appropriate here, given that the Simpson index is an estimate of a probability. t distributions mostly crop up when you have the mean of a sample from a normal distribution, which is not the case here.

In such situations it is useful to ask, "What is my null hypothesis?" In this case it is that there is a single population. So the question becomes: "What is the chance that a single population, divided at random into two groups of the same sizes as my two populations, would produce two observed values of the Simpson index as far apart as those we see here?"

This suggests that a hypothesis test based on Monte Carlo methods, i.e. simulating draws from the big (combined) population, may be a good approach.
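As a rough illustration of the Monte Carlo idea, here is a minimal sketch of a permutation test in Python. It assumes each sample is available as a list of category labels, one per individual, and it uses Simpson's estimator $\sum_i n_i(n_i-1) / (N(N-1))$. The function names `simpson_index` and `permutation_p_value`, the number of permutations, and the seed are illustrative choices, not something taken from the question or from any paper.

```python
import random
from collections import Counter


def simpson_index(labels):
    """Simpson's estimator: sum of n_i * (n_i - 1) divided by N * (N - 1)."""
    counts = Counter(labels).values()
    n = len(labels)
    return sum(c * (c - 1) for c in counts) / (n * (n - 1))


def permutation_p_value(sample_a, sample_b, n_perm=10_000, seed=0):
    """Two-sided Monte Carlo p-value for H0: both samples share one population.

    The pooled individuals are repeatedly re-split at random into groups of
    the original sizes, and we count how often the absolute difference in
    Simpson index is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(simpson_index(sample_a) - simpson_index(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(simpson_index(pooled[:n_a]) - simpson_index(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

Something like `permutation_p_value(labels_a, labels_b)` would then give an approximate p-value, where `labels_a` and `labels_b` are hypothetical names for your two lists of labels. No t distribution or degrees of freedom are needed, because the reference distribution is built directly from the data.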

Failing that, the paper that suggested how to create your confidence intervals seems to imply that the estimated Simpson index is roughly normally distributed, so you could use those variance values in a two-sample test of the difference. This will look like the t-test you refer to, but with a normal distribution rather than a t distribution.
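The normal-approximation version could be sketched as follows. This assumes the variances you computed are already variances of the index estimates themselves (as the $S \pm 2\sqrt{\text{var}}$ interval suggests), so they are summed without dividing by $n$, and that the two samples are independent. The function name `simpson_z_test` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def simpson_z_test(s1, var1, s2, var2):
    """Two-sided z-test for the difference of two Simpson index estimates.

    var1 and var2 are assumed to be the estimated variances of s1 and s2
    themselves, so no further division by a sample size is applied.
    """
    z = (s1 - s2) / sqrt(var1 + var2)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value
```

Because the reference distribution is the standard normal, the question of which degrees of freedom to use does not arise here.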
