Solved – Non-parametric test of difference for zero-inflated data

nonparametric, zero-inflation

I have zero-inflated (~90% zeros) data distributed like the left-hand figure above (the right-hand figure shows that, when log-transformed, the non-zero component of the distribution is approximately normal). My null hypothesis is that there is no difference between two sets of data distributed in this way.

I want to know if there is an appropriate non-parametric statistical test which will tell me whether there is a significant difference between two such distributions. Preferably I would like to be able to tell whether some measure of centrality or other of one dataset is significantly higher than that of the other.

The best I can do so far is the Wilcoxon signed rank test (the data is paired), which I believe is telling me that one distribution is significantly different from the other. I am unsure, however, whether it appropriately addresses my hypothesis.

Best Answer

You should use the Mann-Whitney U test if the samples are not paired. The Wilcoxon signed rank test is for paired data. I don't think that the number of zeros matters in this case.
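
As a minimal sketch in R, assuming simulated data (the vectors x and y below are hypothetical stand-ins for the two samples, not the asker's data), both tests are available through base R's wilcox.test, with the paired argument switching between them:

```r
set.seed(1)
n <- 200

# Hypothetical zero-inflated samples: about 90% zeros, log-normal otherwise
x <- rbinom(n, 1, 0.1) * rlnorm(n, meanlog = 0,   sdlog = 1)
y <- rbinom(n, 1, 0.1) * rlnorm(n, meanlog = 0.5, sdlog = 1)

# Paired samples: Wilcoxon signed rank test
paired_res <- wilcox.test(x, y, paired = TRUE, exact = FALSE)

# Independent samples: Mann-Whitney U (Wilcoxon rank sum) test
unpaired_res <- wilcox.test(x, y, paired = FALSE, exact = FALSE)

paired_res$p.value
unpaired_res$p.value
```

Setting exact = FALSE requests the normal approximation, which avoids the warnings about ties and zeros that zero-heavy data would otherwise trigger. Note that R's signed rank test discards pairs whose difference is exactly zero before ranking, so with this many zeros the paired test may rest on far fewer pairs than n.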

Related Solutions

Zero-Inflated Poisson Model – Comprehensive Understanding

The criterion is based on (informed) model comparisons. You are trying to account for over-dispersion.

Poisson: var(x) ~ mu

Negative binomial: var(x) > mu

"Extra" zeros

ZIP (zero-inflated Poisson): var(x) ~ mu

ZINB (zero-inflated negative binomial): var(x) > mu
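
The variance-to-mean relationships for the Poisson and negative binomial can be checked empirically. A rough sketch in R on simulated counts (the parameter values and the helper name dispersion are illustrative assumptions, not part of the original answer):

```r
set.seed(42)

# Simulated counts with the same mean (3) but different variance structure
pois <- rpois(10000, lambda = 3)
nb   <- rnbinom(10000, mu = 3, size = 1.5)  # variance = mu + mu^2 / size = 9

# Hypothetical helper: ratio of sample variance to sample mean
dispersion <- function(x) var(x) / mean(x)

dispersion(pois)  # close to 1: variance roughly equals the mean
dispersion(nb)    # well above 1: over-dispersed relative to Poisson
```

A ratio well above 1 in real count data is a hint, not a proof, that a plain Poisson model is too restrictive.
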
One actively maintained package that you can use is pscl (install it with install.packages("pscl")). You can then fit a number of models, such as a hurdle model that uses a negative binomial for the counts and a binomial model for the probability of zeros. This would be written something like:
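```
library(pscl)

# Hurdle model: negative binomial for positive counts, binomial for zero vs. non-zero
fit <- hurdle(Admissions ~ Temperature + Humidity, dist = "negbin", data = data)
summary(fit)
```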

Note that the output will have two sets of coefficients: one for the hurdle component and one for the count component. The output also provides an estimate of the theta parameter (overdispersion) of the negative binomial.

Or you may want to look at the zero-inflation model:

fit1 <- zeroinfl(Admissions ~ Temperature + Humidity, data = data, dist = "negbin", link = "logit")

These models can be compared using AIC (also compare them with your Poisson model): AIC(fit, fit1)
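
For reference, the hurdle and zero-inflation models differ in how they treat zeros. As a sketch in generic notation (the symbols $\pi$ for the zero-component probability and $f$ for the probability mass function of the count distribution are introduced here for illustration and are not necessarily pscl's own parameterization):

$$
\text{Hurdle:}\quad P(Y=0)=\pi,\qquad P(Y=y)=(1-\pi)\,\frac{f(y)}{1-f(0)},\quad y=1,2,\dots
$$

$$
\text{Zero-inflated:}\quad P(Y=0)=\pi+(1-\pi)\,f(0),\qquad P(Y=y)=(1-\pi)\,f(y),\quad y=1,2,\dots
$$

In the hurdle model every zero comes from the binary component, while in the zero-inflated model a zero can come from either the inflation component or the count distribution itself.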

Zero-Inflation – Can Models for Non-Negative Data Predict Exact Zeros?

Note that the predicted value in a GLM is a mean.

For any distribution on non-negative values, to predict a mean of 0, its distribution would have to be entirely a spike at 0.

However, with a log-link, you're never going to fit a mean of exactly zero (since that would require $\eta$ to go to $-\infty$).

So your problem isn't a problem with the Tweedie, but far more general; you'd have exactly the same issue with the ordinary Poisson (whether zero-inflated or ordinary Poisson GLM) for example, or a binomial, a 0-1 inflated beta and indeed any other distribution on the non-negative real line.

> I thought the usefulness of the Tweedie distribution comes from its ability to predict exact zeros and the continuous part.

Since predicting exact zeros isn't going to occur for any distribution over non-negative values with a log-link, your thinking on this must be mistaken.

One of its attractions is that it can model exact zeros in the data, not that the mean predictions will be 0. [Of course a fitted distribution with nonzero mean can still have a probability of being exactly zero, even though the mean must exceed 0. A suitable prediction interval could well include 0, for example.]

It matters not at all that the fitted distribution includes any substantial proportion of zeros - that doesn't make the fitted mean zero (except in the limit as you go to all zeros).

Note that if you change your link function to, say, an identity link, it doesn't really solve your problem: the mean of a non-negative random variable that's not all zeros will be positive.
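
The distinction between a fitted mean and a fitted probability of zero can be seen in a small example. This is a sketch in base R using an ordinary Poisson GLM on simulated data rather than a Tweedie model (the coefficients -1.5 and 0.5 and all variable names are assumptions made for illustration):

```r
set.seed(7)
n <- 500
x <- rnorm(n)

# Simulated counts with a small mean, so most observations are exactly 0
y <- rpois(n, lambda = exp(-1.5 + 0.5 * x))

fit <- glm(y ~ x, family = poisson(link = "log"))

mu_hat <- fitted(fit)        # fitted means, exp(eta), always > 0
p_zero <- dpois(0, mu_hat)   # fitted P(Y = 0) = exp(-mu_hat)

min(mu_hat)    # strictly positive: no fitted mean is exactly 0
mean(y == 0)   # observed share of zeros
mean(p_zero)   # average fitted probability of an exact zero
```

Every fitted mean is positive, yet the fitted distributions still put most of their probability on exactly zero, which is the sense in which such a model accommodates zeros in the data.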
