I want to calculate the ICC between 3 different measurements where the dependent variable is a count. As far as I understand, if the data were normally distributed, I would use a repeated measures ANOVA for this. But for count data, the normality assumption doesn't hold. Do you have any hints on how to get the ICC without using rmANOVA? I know that generalized linear models are somehow crucial for that, but I don't know exactly how to use them for this purpose.
Solved – Intraclass correlation with count data
anova, count-data, generalized linear model, glmm, intraclass-correlation
Related Solutions
There are some nuances at play here, and they may be creating some confusion.
You state that you understand the assumptions of a logistic regression include "iid residuals...". I would argue that this is not quite correct. We generally do say that about the General Linear Model (i.e., regression), but in that case it means that the residuals are independent of each other, with the same distribution (typically normal) having the same mean (0), and variance (i.e., constant variance: homogeneity of variance / homoscedasticity). Note however that for the Bernoulli distribution and the Binomial distribution, the variance is a function of the mean. Thus, the variance couldn't be constant, unless the covariate were perfectly unrelated to the response. That would be an assumption so restrictive as to render logistic regression worthless. I note that in the abstract of the pdf you cite, it lists the assumptions starting with "the statistical independence of the observations", which we might call i-but-not-id (without meaning to be too cute about it).
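To make the mean-variance link concrete, here is a minimal sketch in plain Python. The intercept and slope are hypothetical values chosen only for illustration; the point is that the Bernoulli variance $p(1-p)$ moves with the covariate whenever the slope is nonzero.

```python
import math


def logistic_mean(x, intercept, slope):
    """Success probability implied by a logistic model at covariate value x."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))


def bernoulli_variance(p):
    """Variance of a Bernoulli(p) outcome: a function of the mean p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return p * (1.0 - p)


# Hypothetical coefficients, for illustration only.
INTERCEPT, SLOPE = -1.0, 0.8

for x in (0, 1, 2, 3):
    p = logistic_mean(x, INTERCEPT, SLOPE)
    print(f"x = {x}  mean = {p:.3f}  variance = {bernoulli_variance(p):.3f}")
```

With the slope set to 0 the mean, and therefore the variance, would be the same at every `x`, which is the "covariate perfectly unrelated to the response" case described above.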
Next, as @kjetilbhalvorsen notes in the comment above, covariate values (i.e., your independent variables) are assumed to be fixed in the Generalized Linear Model. That is, no particular distributional assumptions are made. Thus, it does not matter if they are counts or not, nor if they range from 0 to 10, from 1 to 10000, or from -3.1415927 to -2.718281828.
One thing to consider, however, as @whuber notes, is that if you have a small number of data points that are very extreme on one of the covariate dimensions, those points could have a great deal of influence over the results of your analysis. That is, you might get a certain result only because of those points. One way to think about this is to do a kind of sensitivity analysis by fitting your model both with and without those data included. You may believe it is safer or more appropriate to drop those observations, use some form of robust statistical analysis, or to transform those covariates so as to minimize the extreme leverage those points would have. I would not characterize these considerations as "assumptions", but they are certainly important considerations in developing an appropriate model.
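The with-and-without comparison can be sketched as follows. This is only an illustration: it uses an ordinary least-squares slope rather than a logistic fit to stay dependency-free, and the data are made up, with one deliberately extreme covariate value.

```python
def ols_slope(xs, ys):
    """Least-squares slope of y on x (simple linear regression)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Made-up data: the last point is far out on the covariate axis.
xs = [1, 2, 3, 4, 5, 30]
ys = [2.1, 1.9, 2.0, 2.2, 1.8, 9.0]

slope_all = ols_slope(xs, ys)
slope_trimmed = ols_slope(xs[:-1], ys[:-1])  # refit without the extreme point

print(f"slope with all points:      {slope_all:.3f}")
print(f"slope without extreme point: {slope_trimmed:.3f}")
```

If the two fits disagree substantially, as they do here, the conclusion rests largely on the extreme point, and the options listed above (dropping, robust methods, transforming the covariate) are worth weighing.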
This is the simplest repeated measures ANOVA model if we treat it as a univariate model:
$$y_{it} = a_{i} + b_{t} + \epsilon_{it}$$
where $i$ represents each case and $t$ the times we measured them (so the data are in long form). $y_{it}$ represents the outcomes stacked one on top of the other, $a_{i}$ represents the mean of each case, $b_{t}$ represents the mean of each time point and $\epsilon_{it}$ represents the deviations of the individual measurements from the case and time point means. You can include additional between-factors as predictors in this setup.
We do not need to make distributional assumptions about $a_{i}$, as they can enter the model as fixed effects, i.e., dummy variables (contrary to what we do with linear mixed models). The same holds for the time dummies. For this model, you simply regress the outcome in long form against the person dummies and the time dummies. The effects of interest are the time dummies: the $F$-test of the null hypothesis that $b_{1}=\dots=b_{T}=0$, where $T$ is the number of time points, is the main test in the univariate repeated measures ANOVA.
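As a rough sketch of that $F$-test, the function below computes it from sums of squares. It assumes a balanced design (every case measured at every time point), in which case the sums-of-squares decomposition gives the same $F$ as the regression on person and time dummies. The function name and the data are hypothetical, and no p-value is computed.

```python
def rm_anova_time_f(data):
    """F statistic for the time effect in a balanced one-way repeated measures ANOVA.

    data[i][t] is the outcome for case i at time t; every case needs all time points.
    Returns (F, df_time, df_error).
    """
    n = len(data)
    t_count = len(data[0])
    if any(len(row) != t_count for row in data):
        raise ValueError("balanced data required: same number of time points per case")

    grand = sum(sum(row) for row in data) / (n * t_count)
    case_means = [sum(row) / t_count for row in data]
    time_means = [sum(row[t] for row in data) / n for t in range(t_count)]

    ss_total = sum((y - grand) ** 2 for row in data for y in row)
    ss_case = t_count * sum((m - grand) ** 2 for m in case_means)   # a_i part
    ss_time = n * sum((m - grand) ** 2 for m in time_means)         # b_t part
    ss_error = ss_total - ss_case - ss_time                         # epsilon_it part

    df_time = t_count - 1
    df_error = (n - 1) * (t_count - 1)
    f_stat = (ss_time / df_time) / (ss_error / df_error)
    return f_stat, df_time, df_error


# Hypothetical data: 4 cases (rows) measured at 3 time points (columns).
scores = [[5, 6, 8], [3, 5, 6], [7, 8, 9], [4, 4, 7]]
f_stat, df_time, df_error = rm_anova_time_f(scores)
print(f"F({df_time}, {df_error}) = {f_stat:.2f}")
```

Removing the between-case sum of squares before forming the error term is what the person dummies accomplish in the regression formulation.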
What are the required assumptions for the $F$-test to behave appropriately? The one relevant to your question is:
\begin{equation} \epsilon_{it}\sim\mathcal{N}(0,\sigma)\quad\text{these errors are normally distributed and homoskedastic} \end{equation}
There are additional (more consequential) assumptions for the $F$-test to be valid, because the data are clearly not independent of each other: the same individuals repeat across rows.
If you want to treat the repeated measures ANOVA as a multivariate model, the normality assumptions may be different, and I cannot expand on them beyond what you and I have seen on Wikipedia.
Best Answer
You need to work with a Generalized Linear Mixed-effects Model (GLMM). The repeated measures ANOVA is actually a special case of the Linear Mixed-effects Model (LMM), so GLMM is to LMM as GLM is to LM. From there, just recognize that the ICC is a descriptive statistic that assesses how distinctive the units are relative to the total spread of the data. The standard formula is:
$$ ICC = \frac{\sigma_{\bar{x_i}}^2}{\sigma_{\bar{x_i}}^2+\sigma_\varepsilon^2} $$
In the context of a mixed-effects model, the distinctiveness of the individuals is the variance of the random effects. Since variances add, the total is the variance of the random effects plus the residual variance.
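The arithmetic of the formula can be sketched in a few lines of plain Python. The two variance components here are hypothetical numbers standing in for what a fitted mixed model would report; `var_random` plays the role of $\sigma_{\bar{x_i}}^2$ and `var_residual` the role of $\sigma_\varepsilon^2$.

```python
def icc(var_random, var_residual):
    """Intraclass correlation: random-effect variance over total variance."""
    if var_random < 0 or var_residual < 0:
        raise ValueError("variances must be non-negative")
    total = var_random + var_residual
    if total == 0:
        raise ValueError("total variance is zero; ICC is undefined")
    return var_random / total


# Hypothetical variance components, for illustration only.
print(f"ICC = {icc(0.6, 0.4):.2f}")
```

An ICC near 1 means most of the spread is between individuals; an ICC near 0 means individuals are barely distinguishable relative to the residual noise.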
To illustrate, here is a simple example, coded in R: