Solved – How to compare sample means in this experimental-control group study

nonparametric, t-test

I am hoping to compare the sample mean of an experimental group to the sample means of three control groups.

Here is some context for the study:

The experimental group consists of ~60 persons with a particular criminal record who were afforded various rehabilitative services. For each person, I have a count of their number of rearrests after they began their "treatment."

The control groups are similarly sized (50-70 people each) and share similar characteristics with the experimental group. For each person in a control group, I have a count of their arrests after that control group began its treatment (i.e., the control groups are monitored over the same time period). I now want to test whether there is a statistically significant difference between the number of rearrests for the experimental group and for the control groups.

The distribution of arrests for the experimental group and the control groups is non-normal and skewed to the right. Most people had either zero rearrests or one rearrest, while fewer and fewer people had more than one. I did, however, test for homogeneity of variances, and the test indicated that the variances are equal.

What test do I use in this case?

Also, what tests do I use under the following assumptions?

  1. Normal Distribution = True and Homogeneity of Variance = True

  2. Normal Distribution = True and Homogeneity of Variance = False

  3. Normal Distribution = False and Homogeneity of Variance = True

  4. Normal Distribution = False and Homogeneity of Variance = False

Best Answer

Not all tests of variance homogeneity across groups are equal: the Brown–Forsythe test would probably be better than Levene's test given your dependent variable's distribution, because it centers each group on its median rather than its mean and is therefore more robust to skew. It sounds like your outcome is a zero-inflated count variable.
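To make the Brown–Forsythe idea concrete, here is a minimal sketch using only the Python standard library. The function name `brown_forsythe` and the rearrest counts are made up for illustration (12 people per group rather than the 50-70 in the study). The statistic is a one-way ANOVA F computed on absolute deviations from each group's median.

```python
from statistics import mean, median


def brown_forsythe(*groups):
    """Return the Brown-Forsythe statistic W and its degrees of freedom (k - 1, N - k)."""
    # Absolute deviation of each observation from its own group's median.
    deviations = []
    for g in groups:
        center = median(g)
        deviations.append([abs(x - center) for x in g])

    k = len(deviations)
    n_total = sum(len(d) for d in deviations)
    group_means = [mean(d) for d in deviations]
    grand_mean = mean([v for d in deviations for v in d])

    # One-way ANOVA F statistic on the deviations.
    between = sum(len(d) * (m - grand_mean) ** 2
                  for d, m in zip(deviations, group_means))
    within = sum((v - m) ** 2
                 for d, m in zip(deviations, group_means) for v in d)
    w = (between / (k - 1)) / (within / (n_total - k))
    return w, (k - 1, n_total - k)


# Hypothetical rearrest counts, invented for this example.
experimental = [0, 0, 0, 1, 0, 2, 0, 1, 0, 0, 3, 1]
control_a = [0, 1, 1, 0, 2, 4, 1, 0, 3, 1, 0, 2]
control_b = [1, 0, 2, 1, 0, 0, 5, 1, 2, 0, 1, 3]
control_c = [0, 2, 1, 1, 0, 3, 0, 6, 1, 2, 1, 0]

w, df = brown_forsythe(experimental, control_a, control_b, control_c)
print(f"W = {w:.3f} on {df} degrees of freedom")
```

W is compared against an F distribution with those degrees of freedom; a large W suggests unequal spread across groups. If SciPy is available, `scipy.stats.levene(..., center="median")` should compute the same statistic and also return a p-value.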

I'm thinking the ideal choice is a zero-inflated negative binomial or quasi-Poisson regression, with the experimental group as your reference group for dummy coding (i.e., three dummy variables, one per control group), though there may be more robust options for coping with heteroskedasticity. When all assumptions hold (normal errors and equal variances), ANOVA works, but generalized linear models and nonparametric estimators are better for non-normal error distributions. Weighted least squares can help with heteroskedastic groups, but requires a lot of data; diagonally weighted least squares is somewhat more forgiving. Zero-inflated models also require more power, though; see the following references. The second discusses iteratively weighted least squares and compares negative binomial and quasi-Poisson regression.
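As a rough illustration of the dummy-coded setup, here is a sketch of a quasi-Poisson fit with the experimental group as the reference. It relies on a simplification that only holds when group membership is the sole predictor: the fitted mean of each group is its sample mean, so the coefficients and standard errors have closed forms and no iterative fitting is needed. The function name, the data, and the normal-approximation p-value are assumptions for this example (software such as R's `glm(family = quasipoisson)` would use a t reference distribution), and the sketch does not model zero inflation.

```python
from math import exp, log, sqrt
from statistics import NormalDist, mean


def quasi_poisson_vs_reference(groups, reference):
    """Compare each group's mean count to the reference group's.

    groups: dict mapping group name -> list of counts.
    Returns (dispersion, {name: comparison dict}).
    """
    means = {name: mean(y) for name, y in groups.items()}
    totals = {name: sum(y) for name, y in groups.items()}
    if any(m == 0 for m in means.values()):
        raise ValueError("every group needs at least one nonzero count")

    # Dispersion = Pearson chi-square / residual degrees of freedom.
    n_obs = sum(len(y) for y in groups.values())
    pearson = sum((v - means[name]) ** 2 / means[name]
                  for name, y in groups.items() for v in y)
    dispersion = pearson / (n_obs - len(groups))

    results = {}
    for name in groups:
        if name == reference:
            continue
        # Dummy coefficient on the log scale: log(mean_k / mean_reference).
        coef = log(means[name] / means[reference])
        # Poisson variance of a log mean is 1 / (group total), scaled by dispersion.
        se = sqrt(dispersion * (1 / totals[name] + 1 / totals[reference]))
        z = coef / se
        p_approx = 2 * (1 - NormalDist().cdf(abs(z)))  # normal approximation
        results[name] = {"rate_ratio": exp(coef), "se_log": se,
                         "z": z, "p_approx": p_approx}
    return dispersion, results


# Hypothetical rearrest counts, invented for this example.
data = {
    "experimental": [0, 0, 0, 1, 0, 2, 0, 1, 0, 0, 3, 1],
    "control_a": [0, 1, 1, 0, 2, 4, 1, 0, 3, 1, 0, 2],
    "control_b": [1, 0, 2, 1, 0, 0, 5, 1, 2, 0, 1, 3],
    "control_c": [0, 2, 1, 1, 0, 3, 0, 6, 1, 2, 1, 0],
}

dispersion, results = quasi_poisson_vs_reference(data, reference="experimental")
print(f"dispersion = {dispersion:.2f}")
for name, r in results.items():
    print(f"{name}: rate ratio = {r['rate_ratio']:.2f}, p ~ {r['p_approx']:.3f}")
```

A rate ratio above 1 means that control group averaged more rearrests than the experimental group. A dispersion well above 1 signals overdispersion, which is the situation where quasi-Poisson or negative binomial models are preferred over plain Poisson regression.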


References
· Williamson, J. M., Lin, H., Lyles, R. H., & Hightower, A. W. (2007). Power calculations for ZIP and ZINB models. Journal of Data Science, 5(4), 519–534. Retrieved from http://www.jds-online.com/file_download/150/JDS-360.pdf.
· Ver Hoef, J. M., & Boveng, P. L. (2007). Quasi-Poisson vs. negative binomial regression: How should we model overdispersed count data? Ecology, 88(11), 2766–2772. Retrieved from http://digitalcommons.unl.edu/cgi/viewcontent.cgi?article=1141&context=usdeptcommercepub.
