Solved – Kruskal-Wallis

binomial distribution · kruskal-wallis test · proportion

I have three sets of data (one collected before treatment, a second during treatment, and a third after), each consisting of hundreds of trials on a specific task that involves choosing 1 of 7 possible options, where each choice is either correct or incorrect (a binary outcome). I'd like to determine whether the proportion of correct choices differs significantly as a result of treatment. Which statistical test could potentially help?

Best Answer

So you have data from two subjects who each completed hundreds of trials at three assessment phases (before/during/after treatment), right? Sounds like you're looking to reject the null hypothesis that both subjects achieved equal success before and during treatment (and maybe after too?), but your data would permit more tests than this. A quick omnibus check is sketched just below.
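
As an assumption-light starting point for that null, you could cross-tabulate phase against correctness and run a chi-squared test of independence. A minimal sketch, with counts made up purely for illustration:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts of (correct, incorrect) trials in each phase;
# replace with your real tallies.
table = np.array([
    [210,  90],   # before treatment
    [260,  40],   # during treatment
    [255,  45],   # after treatment
])

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-squared = {chi2:.2f}, dof = {dof}, p = {p:.4g}")
```

This ignores the subject and trial-order structure entirely, which is exactly why the model discussed next may be worth the extra effort.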

A mixed-effects model could test the effect of assessment phase (effectively a test of your treatment), test for differences between your two subjects, and test for a treatment × subject interaction. If you know the order of the trials within assessment phases, you could also test for effects of practice, fatigue, or whatever else might change subjects' performance over a long series of consecutive trials (see the sketch below). It seems desirable to statistically control for any such effects of trials nested within phases, as well as for differences between your two subjects, as this would reduce unexplained variance and make the treatment effect easier to detect. You might also find evidence of an interaction interesting, as it would suggest individual differences in the effect of the treatment.
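
For instance, a logistic regression with the trial's within-phase position as a covariate can probe such order effects. This is a sketch under stated assumptions: the data frame, its columns (`correct`, `phase`, `trial`), and the simulated practice effect are all hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Hypothetical trial-level data: 300 trials per phase with a slight
# practice effect (accuracy drifts up over consecutive trials).
rows = []
for phase, base in [("before", 0.60), ("during", 0.80), ("after", 0.85)]:
    for trial in range(300):
        p = min(base + 0.0002 * trial, 0.95)
        rows.append({"phase": phase, "trial": trial,
                     "correct": rng.binomial(1, p)})
df = pd.DataFrame(rows)

# correct ~ phase + within-phase trial index: the `trial` coefficient
# captures practice/fatigue trends over consecutive trials.
fit = smf.logit("correct ~ C(phase) + trial", data=df).fit(disp=0)
print(fit.summary())
```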

The fixed vs. random effects distinction is somewhat ambiguous (Gelman, 2005; see also "What is the difference between fixed effect, random effect and mixed effect models?"), so I should admit that I'm not sure which effect(s?) you'd want to treat as random according to some definitions. I'm mainly suggesting a mixture of between-subjects and within-subjects factors in your model (a Python sketch of its fixed-effects part follows the list below): $$Y_{ij}=\mu+\beta_1{\rm Subject}_i+\beta_2{\rm Time}_j+\beta_3{\rm Subject}_i{\rm Time}_j+U_i+W_j+\varepsilon_{ij}$$

  • $Y_{ij}$ is the response variable, as usual. If this is a count of successful trials for subject $i$ at time $j$, and both subjects had an equal number of trials at each time, then you could assume a negative binomial distribution (or maybe Poisson, though that adds the assumption that the variance equals the mean). If your subjects had a different number of trials or you prefer to model proportions of successful trials for some other reason, you could assume a beta distribution instead. If you prefer not to worry about the distribution, you could also try a nonparametric model (e.g., Gu & Ma, 2005).
  • $\mu$ is the grand mean of the response variable across all trials/subjects/times.
  • $\beta_1$ is the mean difference between subjects across all trials/times.
  • $\beta_2$ is the average slope of changes across your three times for all trials/subjects.
  • $\beta_3$ represents the difference between your two subjects in the slope of changes across times (i.e., the subject × time interaction).
  • $U_i$ represents subject-specific error, which could be useful to model if your subjects' performances have different variances.
  • $W_j$ represents time-specific error. E.g., a gradual effect of the treatment could increase variance across trials during the treatment phase only. You might be able to estimate this as a hierarchical model, since you have hundreds of trials nested within assessment times, but because the trials are binary, I'm not sure this is possible. I'm not sure it applies here, but consider Hudecová (2013), and see also "Autocorrelation of discrete time series"; a quick empirical check of trial-to-trial dependence is sketched after this list.

    It might help if you could get another subject too, as this would give you three proportions per assessment phase. That's at least enough to begin separating subject-specific error from time-specific error within a single time period, I think. I may be stretching what I know about latent variable modeling a little too far here.

  • $\varepsilon_{ij}$ would be the remaining error that can't be attributed to subject-specific error (e.g., inattentiveness) or time-specific error. If you can separate out the other two kinds of error at all, this could help isolate the effect of the treatment.
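
To make the fixed-effects part of the model above concrete, here's a minimal Python sketch using statsmodels. Everything in it is hypothetical: the simulated data, the column names (`subject`, `phase`, `correct`), and the cell probabilities. Since the trials are binary, the model is fit on the log-odds scale; the random components $U_i$ and $W_j$ would require a genuine mixed model, e.g., statsmodels' BinomialBayesMixedGLM or lme4's glmer in R.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import chi2

rng = np.random.default_rng(1)

# Hypothetical trial-level data: 2 subjects x 3 phases x 300 trials,
# with made-up success probabilities per cell.
probs = {(1, "before"): 0.60, (1, "during"): 0.80, (1, "after"): 0.85,
         (2, "before"): 0.55, (2, "during"): 0.70, (2, "after"): 0.72}
df = pd.DataFrame([
    {"subject": s, "phase": ph, "correct": rng.binomial(1, p)}
    for (s, ph), p in probs.items() for _ in range(300)
])

# Fixed-effects analogue of mu + Subject + Time + Subject x Time.
full = smf.logit("correct ~ C(subject) * C(phase)", data=df).fit(disp=0)
print(full.summary())

# Likelihood-ratio test of the subject x phase interaction (beta_3):
reduced = smf.logit("correct ~ C(subject) + C(phase)", data=df).fit(disp=0)
lr = 2 * (full.llf - reduced.llf)
print(f"LR stat = {lr:.2f}, p = {chi2.sf(lr, df=2):.4g}")
```

The likelihood-ratio comparison at the end is one way to ask whether the treatment affected the two subjects differently, per the interaction point above.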
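
And for the trial-to-trial dependence worry raised in the $W_j$ bullet, a crude first check is the lag-1 autocorrelation of the binary series within a phase. A sketch, where `x` stands in for one subject's hypothetical 0/1 outcomes in a single phase:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.binomial(1, 0.7, size=300)  # hypothetical within-phase 0/1 outcomes

# Near 0 for independent trials; a markedly positive value hints at
# streaky performance (practice, fatigue, or drift in the treatment effect).
r1 = np.corrcoef(x[:-1], x[1:])[0, 1]
print(f"lag-1 autocorrelation: {r1:.3f}")
```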

If your two subjects aren't very different, including terms to differentiate them in your model might not be worth the degrees of freedom, but that's an empirical question worth testing IMO. Then again, I seem to have taken this in a somewhat different direction than @Glen_b and @Adrian indicate. Hopefully someone will speak up if I'm suggesting something incorrect or infeasible, or if the random effects idea can be clarified.


References
· Gelman, A. (2005, January 25). Why I don’t use the term “fixed and random effects”. Statistical Modeling, Causal Inference, and Social Science. Retrieved from http://andrewgelman.com/2005/01/25/why_i_dont_use/.
· Gu, C., & Ma, P. (2005). Generalized nonparametric mixed-effect models: Computation and smoothing parameter selection. Journal of Computational and Graphical Statistics, 14(2), 485–504. Retrieved from http://www.stat.purdue.edu/~chong/ps/guma.pdf.
· Hudecová, Š. (2013). Structural changes in autoregressive models for binary time series. Journal of Statistical Planning and Inference, 143(10), 1744–1752.