I have ranked concentrations (from 0 to 5) of 5 different proteins from 3 different locations from the same patient. In total, 15 measures per patient and 61 patients, so 915 observations.
What I'd like to know is if:
- The 3 locations are the same regarding the concentration of the 5 proteins.
- Any protein is present in one specific location in higher concentration.
I think I'd need a two-way Friedman's ANOVA since I have 2 categorical variables (protein and location) and 1 ordinal variable (ranked concentration). My questions are:
- How can I run the test in R?
- Is bootstrapping my only solution?
- What about interactions?
- I have read about proportional odds. Could it help?
To make it clearer, here is part of the data:
Protein Location Concentration
Prot1 Loc1 0
Prot1 Loc1 0
Prot1 Loc1 1
Prot1 Loc1 0
Prot1 Loc1 0
Prot1 Loc1 2
Prot1 Loc1 1
Prot1 Loc1 1
Prot1 Loc1 1
Prot1 Loc2 1
Prot1 Loc2 0
Prot1 Loc2 0
Prot1 Loc2 1
Prot1 Loc2 1
Prot1 Loc2 3
Prot1 Loc2 0
Prot1 Loc2 0
Prot1 Loc2 2
Prot1 Loc2 0
Prot1 Loc2 0
Prot1 Loc3 0
Prot1 Loc3 0
Prot1 Loc3 0
Prot1 Loc3 1
Prot1 Loc3 1
Prot1 Loc3 2
Prot1 Loc3 0
Prot1 Loc3 1
Prot1 Loc3 1
Prot1 Loc3 0
Prot1 Loc3 0
Prot2 Loc1 1
Prot2 Loc1 1
Prot2 Loc1 2
Prot2 Loc1 0
Prot2 Loc1 0
Prot2 Loc1 1
Prot2 Loc1 0
Prot2 Loc1 0
Prot2 Loc1 0
Prot2 Loc1 2
Prot2 Loc1 0
Prot2 Loc2 2
Prot2 Loc2 2
Prot2 Loc2 1
Prot2 Loc2 1
Prot2 Loc2 0
Prot2 Loc2 1
Prot2 Loc2 3
Prot2 Loc2 0
Prot2 Loc2 0
Prot2 Loc2 3
Prot2 Loc2 0
Prot2 Loc3 3
Prot2 Loc3 1
Prot2 Loc3 2
Prot2 Loc3 1
Prot2 Loc3 0
Prot2 Loc3 1
Prot2 Loc3 0
Prot2 Loc3 0
Prot2 Loc3 0
Prot2 Loc3 1
Prot2 Loc3 0
Prot3 Loc1 1
Prot3 Loc1 0
Prot3 Loc1 0
Prot3 Loc1 0
Prot3 Loc1 0
Prot3 Loc1 0
Prot3 Loc1 1
Prot3 Loc1 2
Prot3 Loc1 0
Prot3 Loc1 0
Prot3 Loc1 0
Prot3 Loc2 0
Prot3 Loc2 0
Prot3 Loc2 0
Prot3 Loc2 0
Prot3 Loc2 0
Prot3 Loc2 1
Prot3 Loc2 5
Prot3 Loc2 1
Prot3 Loc2 0
Prot3 Loc2 0
Prot3 Loc2 0
Prot3 Loc3 1
Prot3 Loc3 0
Prot3 Loc3 0
Prot3 Loc3 0
Prot3 Loc3 0
Prot3 Loc3 5
Prot3 Loc3 3
Prot3 Loc3 2
Prot3 Loc3 0
Prot3 Loc3 0
Prot3 Loc3 0
Prot4 Loc1 0
Prot4 Loc1 0
Prot4 Loc1 0
Prot4 Loc1 0
Prot4 Loc1 0
Prot4 Loc1 0
Prot4 Loc1 0
Prot4 Loc1 0
Prot4 Loc1 0
Prot4 Loc1 0
Prot4 Loc1 0
Prot4 Loc2 0
Prot4 Loc2 0
Prot4 Loc2 0
Prot4 Loc2 0
Prot4 Loc2 0
Prot4 Loc2 0
Prot4 Loc2 0
Prot4 Loc2 0
Prot4 Loc2 0
Prot4 Loc2 0
Prot4 Loc2 0
Prot4 Loc3 0
Prot4 Loc3 0
Prot4 Loc3 0
Prot4 Loc3 0
Prot4 Loc3 0
Prot4 Loc3 0
Prot4 Loc3 0
Prot4 Loc3 0
Prot4 Loc3 0
Prot4 Loc3 0
Prot4 Loc3 0
Prot5 Loc1 0
Prot5 Loc1 0
Prot5 Loc1 0
Prot5 Loc1 0
Prot5 Loc1 0
Prot5 Loc1 0
Prot5 Loc1 0
Prot5 Loc1 0
Prot5 Loc1 0
Prot5 Loc1 0
Prot5 Loc1 0
Prot5 Loc2 0
Prot5 Loc2 0
Prot5 Loc2 0
Prot5 Loc2 0
Prot5 Loc2 0
Prot5 Loc2 0
Prot5 Loc2 0
Prot5 Loc2 0
Prot5 Loc2 0
Prot5 Loc2 0
Prot5 Loc2 0
Prot5 Loc3 0
Prot5 Loc3 0
Prot5 Loc3 0
Prot5 Loc3 0
Prot5 Loc3 0
Prot5 Loc3 0
Prot5 Loc3 0
Prot5 Loc3 0
Prot5 Loc3 0
Prot5 Loc3 0
Prot5 Loc3 0
Best Answer
Your data are ordinal ratings, so you need some form of ordinal logistic regression. But I also gather that your data are not independent ("... 15 measures per patient..."), so that needs to be taken into account as well. Thus, the appropriate method here is a mixed effects ordinal logistic regression. In R, mixed effects OLR models can be fit with the ordinal package.
Here is a brief demonstration with your data:
There are several issues with these data. First, they are not quite balanced (which is not actually a big deal):
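A minimal sketch of how the cell counts can be checked. It re-enters the posted rows in a compact form (one digit per observation, in the listed order); the names `cells`, `d` and `counts` are my own choices for illustration.

```r
# Concentrations per Protein x Location cell, transcribed from the question
# (one digit per observation, in the listed order).
cells <- c(
  Prot1.Loc1 = "001002111",   Prot1.Loc2 = "10011300200", Prot1.Loc3 = "00011201100",
  Prot2.Loc1 = "11200100020", Prot2.Loc2 = "22110130030", Prot2.Loc3 = "31210100010",
  Prot3.Loc1 = "10000012000", Prot3.Loc2 = "00000151000", Prot3.Loc3 = "10000532000",
  Prot4.Loc1 = "00000000000", Prot4.Loc2 = "00000000000", Prot4.Loc3 = "00000000000",
  Prot5.Loc1 = "00000000000", Prot5.Loc2 = "00000000000", Prot5.Loc3 = "00000000000"
)

# Expand each cell into long format: one row per observation.
d <- do.call(rbind, lapply(names(cells), function(k) {
  key <- strsplit(k, ".", fixed = TRUE)[[1]]
  data.frame(Protein = key[1], Location = key[2],
             Concentration = as.integer(strsplit(cells[[k]], "")[[1]]))
}))

# Observations per cell; a balanced design would show the same count everywhere.
counts <- table(d$Protein, d$Location)
print(counts)
```

In the posted excerpt, every cell has 11 rows except Prot1 at Loc1, which has 9.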
Crucially, they are missing a patient ID indicator. I will make one up, using the assumption that the order within each category is consistent and by patient ID (this may well be totally false in reality, so be forewarned):
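One way to build such a made-up ID, sketched in base R. The data are re-entered from the question so the snippet runs on its own; the rule "the k-th row of each cell is patient k" is exactly the assumption described above, not something the data confirm.

```r
# Concentrations per Protein x Location cell, transcribed from the question.
cells <- c(
  Prot1.Loc1 = "001002111",   Prot1.Loc2 = "10011300200", Prot1.Loc3 = "00011201100",
  Prot2.Loc1 = "11200100020", Prot2.Loc2 = "22110130030", Prot2.Loc3 = "31210100010",
  Prot3.Loc1 = "10000012000", Prot3.Loc2 = "00000151000", Prot3.Loc3 = "10000532000",
  Prot4.Loc1 = "00000000000", Prot4.Loc2 = "00000000000", Prot4.Loc3 = "00000000000",
  Prot5.Loc1 = "00000000000", Prot5.Loc2 = "00000000000", Prot5.Loc3 = "00000000000"
)
d <- do.call(rbind, lapply(names(cells), function(k) {
  key <- strsplit(k, ".", fixed = TRUE)[[1]]
  data.frame(Protein = key[1], Location = key[2],
             Concentration = as.integer(strsplit(cells[[k]], "")[[1]]))
}))

# ASSUMPTION (hypothetical): within each Protein x Location cell,
# the k-th listed row belongs to patient k.
d$ID <- ave(seq_len(nrow(d)), d$Protein, d$Location, FUN = seq_along)

# Number of measurements attributed to each made-up patient.
print(table(d$ID))
```

Under this assumption, the two rows missing from the Prot1/Loc1 cell end up attributed to the last two patients, which is arbitrary.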
Next, we need to make sure that ID and Concentration are appropriately categorized as factors. (Note also that you are missing any 4's in Concentration.)
Now we can try to fit a model:
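A sketch of the factor conversion and a first fitting attempt with `clmm()` from the ordinal package. The made-up ID rule and the object names (`d`, `full_fit`) are assumptions for illustration. The fit is wrapped in `tryCatch()` because this full model may fail on these data, and the model is only attempted if ordinal is installed.

```r
# Concentrations per Protein x Location cell, transcribed from the question.
cells <- c(
  Prot1.Loc1 = "001002111",   Prot1.Loc2 = "10011300200", Prot1.Loc3 = "00011201100",
  Prot2.Loc1 = "11200100020", Prot2.Loc2 = "22110130030", Prot2.Loc3 = "31210100010",
  Prot3.Loc1 = "10000012000", Prot3.Loc2 = "00000151000", Prot3.Loc3 = "10000532000",
  Prot4.Loc1 = "00000000000", Prot4.Loc2 = "00000000000", Prot4.Loc3 = "00000000000",
  Prot5.Loc1 = "00000000000", Prot5.Loc2 = "00000000000", Prot5.Loc3 = "00000000000"
)
d <- do.call(rbind, lapply(names(cells), function(k) {
  key <- strsplit(k, ".", fixed = TRUE)[[1]]
  data.frame(Protein = key[1], Location = key[2],
             Concentration = as.integer(strsplit(cells[[k]], "")[[1]]))
}))

# ASSUMPTION (hypothetical): the k-th row of each cell is patient k.
d$ID <- factor(ave(seq_len(nrow(d)), d$Protein, d$Location, FUN = seq_along))

# Categorical predictors as factors, the response as an ordered factor.
d$Protein       <- factor(d$Protein)
d$Location      <- factor(d$Location)
d$Concentration <- factor(d$Concentration, ordered = TRUE)
print(levels(d$Concentration))  # only observed values become levels

# Mixed effects ordinal logistic regression with a random intercept per patient.
if (requireNamespace("ordinal", quietly = TRUE)) {
  full_fit <- tryCatch(
    ordinal::clmm(Concentration ~ Protein * Location + (1 | ID), data = d),
    error = function(e) e  # keep the error object instead of stopping
  )
  print(full_fit)
}
```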
That crashed. The problem seems to be that all the Concentrations in "Prot4" and "Prot5" are 0:
We'll simply exclude those levels from the analysis:
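A base R sketch of both steps: checking which proteins never leave 0, then dropping them. The names `max_by_protein`, `keep` and `d2` are hypothetical.

```r
# Concentrations per Protein x Location cell, transcribed from the question.
cells <- c(
  Prot1.Loc1 = "001002111",   Prot1.Loc2 = "10011300200", Prot1.Loc3 = "00011201100",
  Prot2.Loc1 = "11200100020", Prot2.Loc2 = "22110130030", Prot2.Loc3 = "31210100010",
  Prot3.Loc1 = "10000012000", Prot3.Loc2 = "00000151000", Prot3.Loc3 = "10000532000",
  Prot4.Loc1 = "00000000000", Prot4.Loc2 = "00000000000", Prot4.Loc3 = "00000000000",
  Prot5.Loc1 = "00000000000", Prot5.Loc2 = "00000000000", Prot5.Loc3 = "00000000000"
)
d <- do.call(rbind, lapply(names(cells), function(k) {
  key <- strsplit(k, ".", fixed = TRUE)[[1]]
  data.frame(Protein = key[1], Location = key[2],
             Concentration = as.integer(strsplit(cells[[k]], "")[[1]]))
}))

# Largest concentration per protein: a maximum of 0 means no variation at all.
max_by_protein <- tapply(d$Concentration, d$Protein, max)
print(max_by_protein)

# Keep only proteins with at least one non-zero concentration.
keep <- names(max_by_protein)[max_by_protein > 0]
d2 <- d[d$Protein %in% keep, ]
d2$Protein <- factor(d2$Protein)  # factor with only the retained levels
print(table(d2$Protein, d2$Location))
```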
Now this does return a result, but because your variables are factors (or multilevel categorical variables), the individual level p-values are not of interest. You want to know the significance of the factors as a whole. In particular, I gather you may be interested in knowing if the interaction is significant. We can test that by fitting an additive model (i.e., without the interaction term) and performing a nested model test:
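A sketch of that nested model test, assuming the ordinal package is installed and using the made-up patient IDs. Only Prot1 to Prot3 are entered, since Prot4 and Prot5 are excluded; the names `m_int` and `m_add` are my own.

```r
# Prot1-Prot3 only (Prot4 and Prot5 are all zeros and are left out).
cells <- c(
  Prot1.Loc1 = "001002111",   Prot1.Loc2 = "10011300200", Prot1.Loc3 = "00011201100",
  Prot2.Loc1 = "11200100020", Prot2.Loc2 = "22110130030", Prot2.Loc3 = "31210100010",
  Prot3.Loc1 = "10000012000", Prot3.Loc2 = "00000151000", Prot3.Loc3 = "10000532000"
)
d2 <- do.call(rbind, lapply(names(cells), function(k) {
  key <- strsplit(k, ".", fixed = TRUE)[[1]]
  data.frame(Protein = key[1], Location = key[2],
             Concentration = as.integer(strsplit(cells[[k]], "")[[1]]))
}))

# ASSUMPTION (hypothetical): the k-th row of each cell is patient k.
d2$ID <- factor(ave(seq_len(nrow(d2)), d2$Protein, d2$Location, FUN = seq_along))
d2$Protein       <- factor(d2$Protein)
d2$Location      <- factor(d2$Location)
d2$Concentration <- factor(d2$Concentration, ordered = TRUE)

if (requireNamespace("ordinal", quietly = TRUE)) {
  library(ordinal)
  # Model with the interaction, and the additive model nested within it.
  m_int <- clmm(Concentration ~ Protein * Location + (1 | ID), data = d2)
  m_add <- clmm(Concentration ~ Protein + Location + (1 | ID), data = d2)
  # Likelihood ratio test of the interaction term.
  print(anova(m_add, m_int))
}
```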
The interaction does not appear to be significant for these data. If you also wanted to test the variables in the additive model, that can be conveniently done like so:
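One way to test each variable in the additive model is to drop it and compare the reduced fit against the additive one. This sketch uses explicit nested comparisons with `anova()`, which may not be the convenience call used originally. It assumes the ordinal package is installed and the made-up patient IDs; the model names are hypothetical.

```r
# Prot1-Prot3 only (Prot4 and Prot5 are all zeros and are left out).
cells <- c(
  Prot1.Loc1 = "001002111",   Prot1.Loc2 = "10011300200", Prot1.Loc3 = "00011201100",
  Prot2.Loc1 = "11200100020", Prot2.Loc2 = "22110130030", Prot2.Loc3 = "31210100010",
  Prot3.Loc1 = "10000012000", Prot3.Loc2 = "00000151000", Prot3.Loc3 = "10000532000"
)
d2 <- do.call(rbind, lapply(names(cells), function(k) {
  key <- strsplit(k, ".", fixed = TRUE)[[1]]
  data.frame(Protein = key[1], Location = key[2],
             Concentration = as.integer(strsplit(cells[[k]], "")[[1]]))
}))

# ASSUMPTION (hypothetical): the k-th row of each cell is patient k.
d2$ID <- factor(ave(seq_len(nrow(d2)), d2$Protein, d2$Location, FUN = seq_along))
d2$Protein       <- factor(d2$Protein)
d2$Location      <- factor(d2$Location)
d2$Concentration <- factor(d2$Concentration, ordered = TRUE)

if (requireNamespace("ordinal", quietly = TRUE)) {
  library(ordinal)
  m_add        <- clmm(Concentration ~ Protein + Location + (1 | ID), data = d2)
  m_no_protein <- clmm(Concentration ~ Location + (1 | ID), data = d2)
  m_no_location <- clmm(Concentration ~ Protein + (1 | ID), data = d2)
  print(anova(m_no_protein, m_add))   # tests Protein as a whole
  print(anova(m_no_location, m_add))  # tests Location as a whole
}
```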
They are not significant in these data either.
To provide explicit answers to your questions: Although Friedman's test is a one-way test only, ordinal logistic regression is a generalization of the Kruskal-Wallis test, and mixed effects OLR is a generalization of OLR and of Friedman's test. Bootstrapping is unlikely to help you here. Ordinal logistic regression is often called the proportional odds model.