I am trying to learn how to fit a probability distribution to a vector of data, using the program R, but there are a lot of potential probability distributions to use! So my question is, how do I find the best distribution for my data, and how do I prove that I have picked the right distribution? Can I acquire AIC values for a whole set of different distributions?
The data are observational count data of bees visiting flowers. Each species has a certain number of visits, hence the differing frequencies. The goal is to find the best distribution to describe the bee visitation, show that I have selected the right one, and then use that distribution to sample from randomly for a set of simulations.
Here is what the data looks like, it is a vector of count observations. It is zero inflated, with a long tailed distribution (maybe zero-inflated negative binomial?).
i.vec=c(0,63,1,4,1,44,2,2,1,0,1,0,0,0,0,1,0,0,3,0,0,2,0,0,0,0,0,2,0,0,0,0,
0,0,0,0,0,0,0,0,6,1,11,1,1,0,0,0,2)
And here are some basic parameters that I have calculated. I am using standard deviation for sigma, and phi is the proportion of zeroes in the data.
m=mean(i.vec)
#[1] 3.040816
sig=sd(i.vec)
#[1] 10.86078
tab<-table(i.vec)
zero.prop<-as.numeric(tab[1])/sum(as.numeric(tab))
#[1] 0.6122449
As you can see, the standard deviation is much greater than the mean, and I have a very high proportion of zeroes.
Best Answer
You can use Vuong test in pscl package to compare non-nested models. Here is an example
Vuong test also suggests that the zero-inflated negative binomial provides a better fit to your data compared to the ordinary negative binomial (not shown here, but you can fit both models and compare them).