Your data are ordinal ratings, so you need some form of ordinal logistic regression. But I also gather that your data are not independent ("... 15 measures per patient..."), so that needs to be taken into account as well. Thus, the appropriate method here is a mixed effects ordinal logistic regression. In R, mixed effects OLR models can be fit with the ordinal package.
Here is a brief demonstration with your data:
prtdf = read.table(text="Protein Location Concentration
Prot1 Loc1 0
...
Prot5 Loc3 0", header=T)
There are several issues with these data. First, they are not quite balanced (which is not actually a big deal):
with(prtdf, table(Protein, Location))
# Location
# Protein Loc1 Loc2 Loc3
# Prot1 9 11 11
# Prot2 11 11 11
# Prot3 11 11 11
# Prot4 11 11 11
# Prot5 11 11 11
Crucially, they are missing a patient ID indicator. I will make one up, using the assumption that the order within each category is consistent and by patient ID (this may well be totally false in reality, so be forewarned):
prtdf$ID = c(1:9, rep(1:11, times=14))
prtdf = prtdf[,c(4,1:3)]
head(prtdf, 10)
# ID Protein Location Concentration
# 1 1 Prot1 Loc1 0
# 2 2 Prot1 Loc1 0
# 3 3 Prot1 Loc1 1
# 4 4 Prot1 Loc1 0
# 5 5 Prot1 Loc1 0
# 6 6 Prot1 Loc1 2
# 7 7 Prot1 Loc1 1
# 8 8 Prot1 Loc1 1
# 9 9 Prot1 Loc1 1
# 10 1 Prot1 Loc2 1
tail(prtdf, 12)
# ID Protein Location Concentration
# 152 11 Prot5 Loc2 0
# 153 1 Prot5 Loc3 0
# 154 2 Prot5 Loc3 0
# 155 3 Prot5 Loc3 0
# 156 4 Prot5 Loc3 0
# 157 5 Prot5 Loc3 0
# 158 6 Prot5 Loc3 0
# 159 7 Prot5 Loc3 0
# 160 8 Prot5 Loc3 0
# 161 9 Prot5 Loc3 0
# 162 10 Prot5 Loc3 0
# 163 11 Prot5 Loc3 0
Next, we need to make sure that ID
and Concentration
are appropriately categorized as factors. (Note also that you are missing any 4
's in Concentration
.)
with(prtdf, table(Concentration))
# Concentration
# 0 1 2 3 4 5
# 120 26 10 5 0 2
prtdf$Concentration = factor(prtdf$Concentration, levels=0:5, ordered=T)
prtdf$ID = factor(prtdf$ID, levels=1:11)
Now we can try to fit a model:
library(ordinal)
mod = clmm(Concentration~Protein*Location+(1|ID), data=prtdf, Hess=T, nAGQ=17)
# Warning message:
# (1) Hessian is numerically singular: parameters are not uniquely determined
# In addition: Absolute convergence criterion was met, but relative criterion was not met
That crashed. The problem seems to be that all the Concentrations
in "Prot4"
and "Prot5"
are 0:
aggregate(Concentration~Protein*Location, data=prtdf, function(x){ mean(as.numeric(x)) })
# Protein Location Concentration
# 1 Prot1 Loc1 1.666667
# 2 Prot2 Loc1 1.636364
# 3 Prot3 Loc1 1.363636
# 4 Prot4 Loc1 1.000000
# 5 Prot5 Loc1 1.000000
# 6 Prot1 Loc2 1.727273
# 7 Prot2 Loc2 2.181818
# 8 Prot3 Loc2 1.636364
# 9 Prot4 Loc2 1.000000
# 10 Prot5 Loc2 1.000000
# 11 Prot1 Loc3 1.545455
# 12 Prot2 Loc3 1.818182
# 13 Prot3 Loc3 2.000000
# 14 Prot4 Loc3 1.000000
# 15 Prot5 Loc3 1.000000
table(prtdf[prtdf$Protein%in%c("Prot4","Prot5"), "Concentration"])
# 0 1 2 3 4 5
# 66 0 0 0 0 0
We'll simply exclude those levels from the analysis:
mod2 = clmm(Concentration~Protein*Location+(1|ID), data=prtdf, Hess=T, nAGQ=7,
subset=!prtdf$Protein%in%c("Prot4","Prot5"))
summary(mod2)
# Cumulative Link Mixed Model fitted with the adaptive Gauss-Hermite
# quadrature approximation with 7 quadrature points
#
# formula: Concentration ~ Protein * Location + (1 | ID)
# data: prtdf
# subset: !prtdf$Protein %in% c("Prot4", "Prot5")
#
# link threshold nobs logLik AIC niter max.grad cond.H
# logit flexible 97 -106.48 238.97 760(2283) 1.06e-04 2.0e+02
#
# Random effects:
# Groups Name Variance Std.Dev.
# ID (Intercept) 0.5756 0.7587
# Number of groups: ID 11
#
# Coefficients:
# Estimate Std. Error z value Pr(>|z|)
# ProteinProt2 -0.14342 0.85466 -0.168 0.867
# ProteinProt3 -0.96646 0.92526 -1.045 0.296
# LocationLoc2 0.03622 0.86002 0.042 0.966
# LocationLoc3 -0.17634 0.84245 -0.209 0.834
# ProteinProt2:LocationLoc2 0.96548 1.19393 0.809 0.419
# ProteinProt3:LocationLoc2 0.02491 1.31402 0.019 0.985
# ProteinProt2:LocationLoc3 0.57413 1.18357 0.485 0.628
# ProteinProt3:LocationLoc3 1.13724 1.27533 0.892 0.373
#
# Threshold coefficients:
# Estimate Std. Error z value
# 0|1 0.1591 0.6595 0.241
# 1|2 1.6771 0.6904 2.429
# 2|3 2.7789 0.7615 3.649
# 3|5 4.1442 0.9793 4.232
Now this does return a result, but because your variables are factors (or multilevel categorical variables), the individual level p-values are not of interest. You want to know the significance of the factors as a whole. In particular, I gather you may be interested in knowing if the interaction is significant. We can test that by fitting an additive model (i.e., without the interaction term) and performing a nested model test:
mod2a = clmm(Concentration~Protein+Location+(1|ID), data=prtdf, Hess=T, nAGQ=7,
subset=!prtdf$Protein%in%c("Prot4","Prot5"))
anova(mod2a, mod2)
# Likelihood ratio tests of cumulative link models:
#
# formula: link: threshold:
# mod2a Concentration ~ Protein + Location + (1 | ID) logit flexible
# mod2 Concentration ~ Protein * Location + (1 | ID) logit flexible
#
# no.par AIC logLik LR.stat df Pr(>Chisq)
# mod2a 9 233.02 -107.51
# mod2 13 238.97 -106.48 2.053 4 0.726
The interaction does not appear to be significant for these data. If you also wanted to test the variables in the additive model, that can be conveniently done like so:
drop1(mod2a, test="Chisq")
# Single term deletions
#
# Model:
# Concentration ~ Protein + Location + (1 | ID)
# Df AIC LRT Pr(>Chi)
# <none> 233.02
# Protein 2 232.44 3.4238 0.1805
# Location 2 229.74 0.7212 0.6973
They are not significant in these data either.
To provide explicit answers to your questions: Although Friedman's test is a one-way test only, ordinal logistic regression is a generalization of the Kruskal-Wallis test, and mixed effects OLR is a generalization of OLR and of Friedman's test. Bootstrapping is unlikely to help you here. Ordinal logistic regression is often called the proportional odds model.
Best Answer
A Friedman test could be used on two dependent samples (though some implementations might not allow it, perhaps).
However, note that a Friedman test ranks within blocks. With two dependent samples (i.e. paired data), ranking within the blocks (i.e. allocating either 1, or 2) should be entirely equivalent to a two-tailed sign test (allocating either 0 or 1 whose sum would give the number of positive pair-differences). The only differences would be in things like whether an asymptotic approximation of the distribution was used and in handling of ties.
One advantage of the sign test is that it makes it possible to do a one-sided (one-tailed) test while the Friedman is two-sided.
Consider this example:
Here observations 1 and 6 are paired, as are 2 and 7, and so on. In R, you can actually do the Friedman test with these two groups:
Note that this implementation uses an asymptotic chi-square approximation, so it won't give exactly the same results as a sign test (a binomial test) unless you use the corresponding normal approximation, and treat ties in the same way (and so on; for example if either uses a continuity correction but the other does not, then that would cause them to differ).
Alternatively it would make sense (especially with such small samples, even more so in data sets with ties) to compute the exact permutation distribution. In that case both the two-tailed sign test and the Friedman test should give identical p-values.