Friedman Test – Is There a Two-Way Friedman’s Test?

friedman testrranks

I have ranked concentrations (from 0 to 5) of 5 different proteins from 3 different locations from the same patient. In total, 15 measures per patient and 61 patients, so 915 observations.

What I'd like to know is if:

  • The 3 locations are the same regarding the concentration of the 5 proteins.
  • Any protein is present in one specific location in higher concentration.

I think I'd need a two-way Friedman's ANOVA since I have 2 categorical variables (protein and location) and 1 ordinal variable (ranked concentration). My questions are:

  1. How can I run the test in R?
  2. Is bootstrapping my only solution?
  3. What about interactions?
  4. I have read about proportional odds. Could it help?

To make it clearer, I here is part of the data:

  Protein   Location Concentration
    Prot1   Loc1    0
    Prot1   Loc1    0
    Prot1   Loc1    1
    Prot1   Loc1    0
    Prot1   Loc1    0
    Prot1   Loc1    2
    Prot1   Loc1    1
    Prot1   Loc1    1
    Prot1   Loc1    1
    Prot1   Loc2    1
    Prot1   Loc2    0
    Prot1   Loc2    0
    Prot1   Loc2    1
    Prot1   Loc2    1
    Prot1   Loc2    3
    Prot1   Loc2    0
    Prot1   Loc2    0
    Prot1   Loc2    2
    Prot1   Loc2    0
    Prot1   Loc2    0
    Prot1   Loc3    0
    Prot1   Loc3    0
    Prot1   Loc3    0
    Prot1   Loc3    1
    Prot1   Loc3    1
    Prot1   Loc3    2
    Prot1   Loc3    0
    Prot1   Loc3    1
    Prot1   Loc3    1
    Prot1   Loc3    0
    Prot1   Loc3    0
    Prot2   Loc1    1
    Prot2   Loc1    1
    Prot2   Loc1    2
    Prot2   Loc1    0
    Prot2   Loc1    0
    Prot2   Loc1    1
    Prot2   Loc1    0
    Prot2   Loc1    0
    Prot2   Loc1    0
    Prot2   Loc1    2
    Prot2   Loc1    0
    Prot2   Loc2    2
    Prot2   Loc2    2
    Prot2   Loc2    1
    Prot2   Loc2    1
    Prot2   Loc2    0
    Prot2   Loc2    1
    Prot2   Loc2    3
    Prot2   Loc2    0
    Prot2   Loc2    0
    Prot2   Loc2    3
    Prot2   Loc2    0
    Prot2   Loc3    3
    Prot2   Loc3    1
    Prot2   Loc3    2
    Prot2   Loc3    1
    Prot2   Loc3    0
    Prot2   Loc3    1
    Prot2   Loc3    0
    Prot2   Loc3    0
    Prot2   Loc3    0
    Prot2   Loc3    1
    Prot2   Loc3    0
    Prot3   Loc1    1
    Prot3   Loc1    0
    Prot3   Loc1    0
    Prot3   Loc1    0
    Prot3   Loc1    0
    Prot3   Loc1    0
    Prot3   Loc1    1
    Prot3   Loc1    2
    Prot3   Loc1    0
    Prot3   Loc1    0
    Prot3   Loc1    0
    Prot3   Loc2    0
    Prot3   Loc2    0
    Prot3   Loc2    0
    Prot3   Loc2    0
    Prot3   Loc2    0
    Prot3   Loc2    1
    Prot3   Loc2    5
    Prot3   Loc2    1
    Prot3   Loc2    0
    Prot3   Loc2    0
    Prot3   Loc2    0
    Prot3   Loc3    1
    Prot3   Loc3    0
    Prot3   Loc3    0
    Prot3   Loc3    0
    Prot3   Loc3    0
    Prot3   Loc3    5
    Prot3   Loc3    3
    Prot3   Loc3    2
    Prot3   Loc3    0
    Prot3   Loc3    0
    Prot3   Loc3    0
    Prot4   Loc1    0
    Prot4   Loc1    0
    Prot4   Loc1    0
    Prot4   Loc1    0
    Prot4   Loc1    0
    Prot4   Loc1    0
    Prot4   Loc1    0
    Prot4   Loc1    0
    Prot4   Loc1    0
    Prot4   Loc1    0
    Prot4   Loc1    0
    Prot4   Loc2    0
    Prot4   Loc2    0
    Prot4   Loc2    0
    Prot4   Loc2    0
    Prot4   Loc2    0
    Prot4   Loc2    0
    Prot4   Loc2    0
    Prot4   Loc2    0
    Prot4   Loc2    0
    Prot4   Loc2    0
    Prot4   Loc2    0
    Prot4   Loc3    0
    Prot4   Loc3    0
    Prot4   Loc3    0
    Prot4   Loc3    0
    Prot4   Loc3    0
    Prot4   Loc3    0
    Prot4   Loc3    0
    Prot4   Loc3    0
    Prot4   Loc3    0
    Prot4   Loc3    0
    Prot4   Loc3    0
    Prot5   Loc1    0
    Prot5   Loc1    0
    Prot5   Loc1    0
    Prot5   Loc1    0
    Prot5   Loc1    0
    Prot5   Loc1    0
    Prot5   Loc1    0
    Prot5   Loc1    0
    Prot5   Loc1    0
    Prot5   Loc1    0
    Prot5   Loc1    0
    Prot5   Loc2    0
    Prot5   Loc2    0
    Prot5   Loc2    0
    Prot5   Loc2    0
    Prot5   Loc2    0
    Prot5   Loc2    0
    Prot5   Loc2    0
    Prot5   Loc2    0
    Prot5   Loc2    0
    Prot5   Loc2    0
    Prot5   Loc2    0
    Prot5   Loc3    0
    Prot5   Loc3    0
    Prot5   Loc3    0
    Prot5   Loc3    0
    Prot5   Loc3    0
    Prot5   Loc3    0
    Prot5   Loc3    0
    Prot5   Loc3    0
    Prot5   Loc3    0
    Prot5   Loc3    0
    Prot5   Loc3    0

Best Answer

Your data are ordinal ratings, so you need some form of ordinal logistic regression. But I also gather that your data are not independent ("... 15 measures per patient..."), so that needs to be taken into account as well. Thus, the appropriate method here is a mixed effects ordinal logistic regression. In R, mixed effects OLR models can be fit with the ordinal package.

Here is a brief demonstration with your data:

prtdf = read.table(text="Protein   Location Concentration
                         Prot1     Loc1     0
                         ...
                         Prot5     Loc3     0", header=T)

There are several issues with these data. First, they are not quite balanced (which is not actually a big deal):

with(prtdf, table(Protein, Location))
#        Location
# Protein Loc1 Loc2 Loc3
#   Prot1    9   11   11
#   Prot2   11   11   11
#   Prot3   11   11   11
#   Prot4   11   11   11
#   Prot5   11   11   11

Crucially, they are missing a patient ID indicator. I will make one up, using the assumption that the order within each category is consistent and by patient ID (this may well be totally false in reality, so be forewarned):

prtdf$ID = c(1:9, rep(1:11, times=14))
prtdf    = prtdf[,c(4,1:3)]
head(prtdf, 10)
#    ID Protein Location Concentration
# 1   1   Prot1     Loc1             0
# 2   2   Prot1     Loc1             0
# 3   3   Prot1     Loc1             1
# 4   4   Prot1     Loc1             0
# 5   5   Prot1     Loc1             0
# 6   6   Prot1     Loc1             2
# 7   7   Prot1     Loc1             1
# 8   8   Prot1     Loc1             1
# 9   9   Prot1     Loc1             1
# 10  1   Prot1     Loc2             1
tail(prtdf, 12)
#     ID Protein Location Concentration
# 152 11   Prot5     Loc2             0
# 153  1   Prot5     Loc3             0
# 154  2   Prot5     Loc3             0
# 155  3   Prot5     Loc3             0
# 156  4   Prot5     Loc3             0
# 157  5   Prot5     Loc3             0
# 158  6   Prot5     Loc3             0
# 159  7   Prot5     Loc3             0
# 160  8   Prot5     Loc3             0
# 161  9   Prot5     Loc3             0
# 162 10   Prot5     Loc3             0
# 163 11   Prot5     Loc3             0

Next, we need to make sure that ID and Concentration are appropriately categorized as factors. (Note also that you are missing any 4's in Concentration.)

with(prtdf, table(Concentration))
# Concentration
#   0   1   2   3   4   5 
# 120  26  10   5   0   2 
prtdf$Concentration = factor(prtdf$Concentration, levels=0:5, ordered=T)
prtdf$ID            = factor(prtdf$ID,            levels=1:11)

Now we can try to fit a model:

library(ordinal)
mod = clmm(Concentration~Protein*Location+(1|ID), data=prtdf, Hess=T, nAGQ=17)
# Warning message:
#   (1) Hessian is numerically singular: parameters are not uniquely determined 
# In addition: Absolute convergence criterion was met, but relative criterion was not met

That crashed. The problem seems to be that all the Concentrations in "Prot4" and "Prot5" are 0:

aggregate(Concentration~Protein*Location, data=prtdf, function(x){ mean(as.numeric(x)) })
#    Protein Location Concentration
# 1    Prot1     Loc1      1.666667
# 2    Prot2     Loc1      1.636364
# 3    Prot3     Loc1      1.363636
# 4    Prot4     Loc1      1.000000
# 5    Prot5     Loc1      1.000000
# 6    Prot1     Loc2      1.727273
# 7    Prot2     Loc2      2.181818
# 8    Prot3     Loc2      1.636364
# 9    Prot4     Loc2      1.000000
# 10   Prot5     Loc2      1.000000
# 11   Prot1     Loc3      1.545455
# 12   Prot2     Loc3      1.818182
# 13   Prot3     Loc3      2.000000
# 14   Prot4     Loc3      1.000000
# 15   Prot5     Loc3      1.000000
table(prtdf[prtdf$Protein%in%c("Prot4","Prot5"), "Concentration"])
#  0  1  2  3  4  5 
# 66  0  0  0  0  0 

We'll simply exclude those levels from the analysis:

mod2 = clmm(Concentration~Protein*Location+(1|ID), data=prtdf, Hess=T, nAGQ=7,
            subset=!prtdf$Protein%in%c("Prot4","Prot5"))
summary(mod2)
# Cumulative Link Mixed Model fitted with the adaptive Gauss-Hermite 
# quadrature approximation with 7 quadrature points 
# 
# formula: Concentration ~ Protein * Location + (1 | ID)
# data:    prtdf
# subset:  !prtdf$Protein %in% c("Prot4", "Prot5")
# 
#  link  threshold nobs logLik  AIC    niter     max.grad cond.H
#  logit flexible  97   -106.48 238.97 760(2283) 1.06e-04 2.0e+02
# 
# Random effects:
# Groups Name        Variance Std.Dev.
# ID     (Intercept) 0.5756   0.7587  
# Number of groups:  ID 11 
# 
# Coefficients:
#                           Estimate Std. Error z value Pr(>|z|)
# ProteinProt2              -0.14342    0.85466  -0.168    0.867
# ProteinProt3              -0.96646    0.92526  -1.045    0.296
# LocationLoc2               0.03622    0.86002   0.042    0.966
# LocationLoc3              -0.17634    0.84245  -0.209    0.834
# ProteinProt2:LocationLoc2  0.96548    1.19393   0.809    0.419
# ProteinProt3:LocationLoc2  0.02491    1.31402   0.019    0.985
# ProteinProt2:LocationLoc3  0.57413    1.18357   0.485    0.628
# ProteinProt3:LocationLoc3  1.13724    1.27533   0.892    0.373
# 
# Threshold coefficients:
#     Estimate Std. Error z value
# 0|1   0.1591     0.6595   0.241
# 1|2   1.6771     0.6904   2.429
# 2|3   2.7789     0.7615   3.649
# 3|5   4.1442     0.9793   4.232

Now this does return a result, but because your variables are factors (or multilevel categorical variables), the individual level p-values are not of interest. You want to know the significance of the factors as a whole. In particular, I gather you may be interested in knowing if the interaction is significant. We can test that by fitting an additive model (i.e., without the interaction term) and performing a nested model test:

mod2a = clmm(Concentration~Protein+Location+(1|ID), data=prtdf, Hess=T, nAGQ=7,
             subset=!prtdf$Protein%in%c("Prot4","Prot5"))
anova(mod2a, mod2)
# Likelihood ratio tests of cumulative link models:
#   
#       formula:                                      link: threshold:
# mod2a Concentration ~ Protein + Location + (1 | ID) logit flexible
# mod2  Concentration ~ Protein * Location + (1 | ID) logit flexible
# 
#       no.par    AIC  logLik LR.stat df Pr(>Chisq)
# mod2a      9 233.02 -107.51                      
# mod2      13 238.97 -106.48   2.053  4      0.726

The interaction does not appear to be significant for these data. If you also wanted to test the variables in the additive model, that can be conveniently done like so:

drop1(mod2a, test="Chisq")
# Single term deletions
# 
# Model:
# Concentration ~ Protein + Location + (1 | ID)
#          Df    AIC    LRT Pr(>Chi)
# <none>      233.02                
# Protein   2 232.44 3.4238   0.1805
# Location  2 229.74 0.7212   0.6973

They are not significant in these data either.

To provide explicit answers to your questions: Although Friedman's test is a one-way test only, ordinal logistic regression is a generalization of the Kruskal-Wallis test, and mixed effects OLR is a generalization of OLR and of Friedman's test. Bootstrapping is unlikely to help you here. Ordinal logistic regression is often called the proportional odds model.

Related Question