# Solved – Test if two gamma distributed populations are different

chi-squared-test, gamma-distribution, goodness-of-fit, hypothesis-testing, t-test

I have data from two populations of different sizes. Both have Gamma distributions with different shapes and scales (as estimated in R):

```r
fitdistr(x/10000, "gamma")  # 186 members
#  shape  0.586219900  (0.050840587)
#  rate   0.012159695  (0.001564117)
mean(x)  # 44.26789

fitdistr(y/10000, "gamma")  # 50 members
#  shape  0.491757644  (0.080661189)
#  rate   0.006671180  (0.001709453)
mean(y)  # 68.15945
```


I want to answer the question: what is the probability that the two populations are significantly different? By "different" I am referring to the means, not to the shapes and rates of the distributions. Does the average of y indicate a significant change from the average of x?

Let's simulate some data based on your fits:

```r
shape.x <- 0.586219900
rate.x  <- 0.012159695
shape.y <- 0.491757644
rate.y  <- 0.006671180
set.seed(1) # for replicability
x <- rgamma(186, shape = shape.x, rate = rate.x)
y <- rgamma(50,  shape = shape.y, rate = rate.y)
```


Now, if you are only interested in whether the means of the underlying distributions differ, the canonical approach is a standard t test on the data themselves, without fitting anything:

```r
t.test(x, y)
```

```
	Welch Two Sample t-test

data:  x and y
t = -2.8045, df = 60.679, p-value = 0.006761
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -62.94225 -10.54156
sample estimates:
mean of x mean of y 
 46.81222  83.55412 
```


Note that this is (asymptotically) valid: t tests make fairly weak assumptions about the shape of the underlying distributions, because by the central limit theorem, sample means (which are what we are interested in here) are asymptotically normally distributed. With sample sizes of 186 and 50, I would usually use a t test without a second thought.
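If you want to convince yourself of that asymptotic argument, here is a quick simulation sketch (using the fitted parameters for x from above; the 5000 replications are an arbitrary choice of mine):

```r
# Sanity check on the CLT argument: the distribution of means of
# gamma samples of size 186 should already look roughly normal,
# despite the heavily skewed parent distribution.
set.seed(1)
sample.means <- replicate(5000,
  mean(rgamma(186, shape = 0.586219900, rate = 0.012159695)))
hist(sample.means, breaks = 50,
  main = "Means of gamma(0.59, 0.012) samples, n = 186")
```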

That said, your estimated rate parameters indicate that you have some pretty degenerate gammas (if, as @Glen_b suggests, they are gamma at all):

```r
require(beeswarm)
par(mfrow = c(1, 2))
beeswarm(x, pch = 19, ylim = range(c(x, y)), main = "x"); abline(h = mean(x))
beeswarm(y, pch = 19, ylim = range(c(x, y)), main = "y"); abline(h = mean(y))
```


```r
par(mfrow = c(1, 2))
foo.x <- seq(min(x), max(x), by = .1)
plot(foo.x, dgamma(foo.x, shape = shape.x, rate = rate.x),
  type = "l", xlab = "", ylab = "", main = "x")
foo.y <- seq(min(y), max(y), by = .1)
plot(foo.y, dgamma(foo.y, shape = shape.y, rate = rate.y),
  type = "l", xlab = "", ylab = "", main = "y")
```


(Compare what Wikipedia authors think is a "standard gamma".)
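One way to quantify "degenerate": the skewness of a gamma distribution is $2/\sqrt{\text{shape}}$, and with shape $<1$ the density is unbounded at zero. Plugging in the fitted shapes (a quick check of mine, using the estimates from above):

```r
# Gamma skewness is 2/sqrt(shape); a normal distribution has skewness 0.
shape.x <- 0.586219900
shape.y <- 0.491757644
2 / sqrt(c(x = shape.x, y = shape.y))
# both come out well above 2, i.e. heavily right-skewed
```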

So the question pops up whether we have enough samples for asymptotics to kick in and make the t test valid, after all. One possibility would be to nonparametrically bootstrap the difference in means:

```r
n.boot <- 10000
foo <- rbind(data.frame(X = x, group = "A"), data.frame(X = y, group = "B"))
require(boot)
set.seed(1) # for replicability
boot.nonparam <- boot(foo,
  statistic = function(data, indices)
    diff(by(data[indices, "X"], data[indices, "group"], mean)),
  R = n.boot)
```
```r
ecdf(boot.nonparam$t)(0)
```

```
[1] 0.0012
```

Now, I would say that $p=.0012$ from the nonparametric bootstrap is close enough to $p=.007$ from the t test, considering that the smaller group has only 50 observations.

Another alternative would be a parametric bootstrap, if you are really sure (as per theory) that your data actually should be gamma. Bootstrap your data, fit gammas to the resampled data, then either resample from the fitted gammas (my approach below), or directly calculate the expectations of the fitted gammas as shape/rate:

```r
require(MASS)
boot.param <- boot(foo,
  statistic = function(data, indices) {
    gamma.A <- fitdistr(data[indices, "X"][data[indices, "group"] == "A"], "gamma")
    resample.A <- rgamma(sum(data[indices, "group"] == "A"),
      shape = gamma.A$estimate["shape"], rate = gamma.A$estimate["rate"])
    gamma.B <- fitdistr(data[indices, "X"][data[indices, "group"] == "B"], "gamma")
    resample.B <- rgamma(sum(data[indices, "group"] == "B"),
      shape = gamma.B$estimate["shape"], rate = gamma.B$estimate["rate"])
    mean(resample.A) - mean(resample.B)
  }, R = n.boot)
ecdf(boot.param$t)(0)
```

```
[1] 0.0212
```
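The second variant mentioned above, comparing the fitted expectations shape/rate directly instead of resampling, would only need a different statistic passed to `boot`. A sketch (the helper name `stat.mean` is mine):

```r
# Alternative statistic for boot(): compare the fitted gamma
# expectations E[X] = shape/rate directly, without resampling.
require(MASS)  # for fitdistr
stat.mean <- function(data, indices) {
  g.A <- fitdistr(data[indices, "X"][data[indices, "group"] == "A"], "gamma")
  g.B <- fitdistr(data[indices, "X"][data[indices, "group"] == "B"], "gamma")
  unname(g.A$estimate["shape"] / g.A$estimate["rate"] -
         g.B$estimate["shape"] / g.B$estimate["rate"])
}
```

This skips the Monte Carlo noise of the resampling step, so its bootstrap distribution should be a little less variable.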


Now we only have $p=.02$, quite a bit more than above.

Bottom line: you should really look at your data closely. If you fit a gamma, it will be pretty degenerate - so degenerate, in fact, that I'd be wary of making any inferences about whether the means of these fitted (!) gammas will be significantly different. I'd most trust the nonparametric bootstrap - but even that can be unstable for heavily skewed distributions (Good mentions a threshold of $n=100$ in his 2006 book). One other alternative would be a permutation test, where you assess the null distribution of means under random permutation of group labels.
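That permutation test can be sketched as follows, reusing the simulated x and y from above (the two-sided version and 10,000 shuffles are my choices, not part of the original answer):

```r
# Permutation test: under the null the group labels are exchangeable,
# so shuffle the labels and recompute the difference in means each time.
set.seed(1)
n.perm <- 10000
pooled <- c(x, y)
n.x <- length(x)
obs.diff <- mean(x) - mean(y)
perm.diff <- replicate(n.perm, {
  shuffled <- sample(pooled)
  mean(shuffled[1:n.x]) - mean(shuffled[-(1:n.x)])
})
# two-sided p value: proportion of permuted differences
# at least as extreme as the observed one
p.perm <- mean(abs(perm.diff) >= abs(obs.diff))
p.perm
```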

And the bottom bottom line: of course p values are not "probabilities that populations (or means) are significantly different". See here.