As I understand your question, Friedman's test is a reasonable test to use.
You are treating the items as treatments, groups, or "subjects"†, and the respondents as blocks or raters. That gives you an unreplicated complete block design.
But you might want to reconsider your choice of post-hoc test. If you want to compare among the items, use an accepted post-hoc test for Friedman's test. One common choice is usually called the Conover test. Another is usually called the Nemenyi test.
These are available in software packages.
For some references, and other tests, see the descriptions for functions beginning with "frd" in this R package.
Using correlations will not make sense as a post-hoc test for how you are using the Friedman test, though they may be interesting to answer other questions.
† Sorry, it's not my terminology, but confusingly the term "subject" is used for the treatments, especially when Kendall's W, an effect size statistic for Friedman's test, is being discussed.
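To make the layout concrete, here is a minimal sketch of the Friedman statistic and Kendall's W for a respondents-by-items table. The function names and the example data are my own, not from any package. It applies no tie correction and computes no p-value, so treat it as an illustration of the design rather than a replacement for a tested implementation.

```python
def average_ranks(values):
    """Rank values 1..k within one respondent; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def friedman_statistic(scores):
    """scores[r][j] is the score respondent r (block) gave item j (treatment).

    Returns (chi-square statistic, Kendall's W), without tie correction.
    """
    n = len(scores)      # number of respondents (blocks)
    k = len(scores[0])   # number of items (treatments)
    rank_sums = [0.0] * k
    for row in scores:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    chi2 = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    kendall_w = chi2 / (n * (k - 1))
    return chi2, kendall_w


# Three respondents who agree perfectly on three items.
chi2, w = friedman_statistic([[1, 2, 3], [1, 2, 3], [1, 2, 3]])
```

With perfect agreement W is 1, and when the column rank sums are all equal W is 0.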
Summary
I share my thoughts in the Details section. I think they are useful in identifying what we really want to achieve.
I think that the main problem here is that you haven't defined what rank similarity means. Therefore, no one knows which method of measuring the difference between the ranks is better.
Effectively, this leaves us to ambiguously choose a method based on guesses.
What I really suggest is to first define a mathematical optimization objective. Only then will we be sure that we really know what we want.
Unless we do that, we don't really know what we want. We might almost know what we want, but almost knowing $\ne$ knowing.
My text in Details is essentially a step towards reaching a mathematical definition of rank similarity. Once we nail this, we can confidently move forward to choose the best method of measuring such similarity.
Details
Based on one of your comments:
- "The objective is to see if the two groups rankings differ", Peter Flom.
To answer this while strictly interpreting the objective:
- The ranks are different if there exists an item $i \in \{1,2,\ldots,25\}$ such that $a_i \ne b_i$, where $a_i$ is the rank of item $i$ by group $a$ and $b_i$ is the rank of the same item by group $b$.
- Else, the ranks are not different.
But I don't think that you really want that strict interpretation. Therefore, I think what you really meant to say is:
- How different are the ranks of groups $a$ and $b$?
One solution here is simply to measure the minimum edit distance, i.e. the minimum number of edits that need to be performed on the ranked list of group $a$ so that it becomes identical to that of group $b$.
An edit could be defined as swapping two items, and costs $n$ points depending on how many hops are needed. So if item $1$ needs to be swapped with item $3$ (in order to achieve identical ranks between groups $a$ and $b$), then the cost for this edit is $3$.
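As one concrete instance, suppose we simplify the edit to "swap two adjacent items, cost $1$". That is my assumption, not the hop-based cost described above. Under it, the minimum number of edits equals the number of item pairs that the two groups order differently, which is usually called the Kendall tau distance. A minimal sketch, with a hypothetical function name and assuming no tied ranks:

```python
from itertools import combinations


def kendall_tau_distance(rank_a, rank_b):
    """rank_a[i] and rank_b[i] are the ranks of item i by groups a and b.

    Counts item pairs ordered one way by group a and the other way by
    group b. Assumes neither ranking contains ties.
    """
    if len(rank_a) != len(rank_b):
        raise ValueError("both groups must rank the same items")
    discordant = 0
    for i, j in combinations(range(len(rank_a)), 2):
        # Opposite signs mean the two groups disagree on the order of i and j.
        if (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j]) < 0:
            discordant += 1
    return discordant


# Group b swaps the first two items relative to group a.
distance = kendall_tau_distance([1, 2, 3], [2, 1, 3])
```

Identical rankings give $0$, and a fully reversed ranking of $n$ items gives $n(n-1)/2$.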
But is this method suitable? To answer this, let's look at it a bit deeper:
- It's not normalized. If we say that the distance between the ranks of groups $a,b$ is $3$, while the distance between the ranks of groups $c,d$ is $123$, it doesn't necessarily mean that $a,b$ are more similar to each other than $c,d$ are to each other (it could also mean that $c,d$ were ranking a much larger set of items).
- It assumes that the cost of each edit is linear with respect to the number of hops. Is this true for our application domain? Could it be that a logistic relationship is more suitable? Or an exponential one?
- It assumes that all items are equally important. E.g. disagreement in ranking item (say) $1$ is treated identically to disagreement in ranking item (say) $5$. Is this true in your domain? For example, if we are ranking books, is disagreeing on the ranking of a famous book such as TAOCP equally important to disagreeing on the ranking of a terrible book such as TAOUP?
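To show how two of these concerns could be handled, here is a sketch that normalizes a pairwise-disagreement distance to $[0, 1]$ and lets each item carry an importance weight. The function name, the `weights` parameter, and the choice to weight a pair by the product of its two item weights are all my assumptions. It does not address the question of how cost should grow with the number of hops.

```python
from itertools import combinations


def weighted_normalized_distance(rank_a, rank_b, weights=None):
    """Share of (weighted) item pairs that groups a and b order differently.

    rank_a[i], rank_b[i]: rank of item i by each group (no ties assumed).
    weights[i]: hypothetical importance of item i; defaults to 1 for all.
    Returns 0.0 for identical rankings and 1.0 for fully reversed ones.
    """
    n = len(rank_a)
    if weights is None:
        weights = [1.0] * n
    total = 0.0
    discordant = 0.0
    for i, j in combinations(range(n), 2):
        pair_weight = weights[i] * weights[j]
        total += pair_weight
        if (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j]) < 0:
            discordant += pair_weight
    return discordant / total if total > 0 else 0.0


# Disagreeing on two important items counts for more than
# disagreeing on an important item and an unimportant one.
important = weighted_normalized_distance([1, 2, 3], [2, 1, 3], weights=[10, 10, 1])
minor = weighted_normalized_distance([1, 2, 3], [1, 3, 2], weights=[10, 10, 1])
```

Because the result is a proportion, distances computed on item sets of different sizes become comparable, at least under this particular definition.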
Once we address the points above, and reach a suitable measure of similarity between two ranks, we will then need to ask more interesting questions, such as:
- What is the probability of observing such differences, or more extreme differences, if the difference between the groups $a$ and $b$ was only due to random chance?
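One way to approach that question is a permutation test. The sketch below assumes things the question does not state: that we have each respondent's individual ranking, that a group's ranking is obtained by ordering items by mean rank, and that the distance is the count of discordant pairs. All function names are hypothetical. Under the null hypothesis, group labels are exchangeable, so we shuffle them and see how often the shuffled distance is at least as large as the observed one.

```python
import random
from itertools import combinations


def aggregate_ranking(rankings):
    """Rank items 1..k by their mean rank across respondents (ties by item index)."""
    k = len(rankings[0])
    means = [sum(r[i] for r in rankings) / len(rankings) for i in range(k)]
    order = sorted(range(k), key=lambda i: (means[i], i))
    ranks = [0] * k
    for position, item in enumerate(order, start=1):
        ranks[item] = position
    return ranks


def discordant_pairs(rank_a, rank_b):
    """Number of item pairs the two rankings order differently."""
    return sum(
        1
        for i, j in combinations(range(len(rank_a)), 2)
        if (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j]) < 0
    )


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """group_a, group_b: lists of per-respondent rankings (one rank per item).

    Returns (observed distance, one-sided permutation p-value).
    """
    rng = random.Random(seed)
    observed = discordant_pairs(aggregate_ranking(group_a), aggregate_ranking(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign respondents to groups at random
        stat = discordant_pairs(
            aggregate_ranking(pooled[:n_a]), aggregate_ranking(pooled[n_a:])
        )
        if stat >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (at_least_as_extreme + 1) / (n_permutations + 1)
```

Whether this is the right null hypothesis depends on how the groups were formed, so the definition of the distance and of "random chance" should be settled first.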
Best Answer
I've done something similar for visualizing similar rankings. The method I used gave me a quick snapshot of how the rankings related, nothing more. My solution used Excel 2010 sparklines to create a small-multiples view of the rankings (this can be done in other Excel versions, but it takes a bit more work). Also, I generally use Excel's Table functionality just to speed things along.
There's nothing particularly analytical about this approach, but it's a pretty quick way to visualize the data.
Note, if you don't have Excel 2010, you can create stripped down, cell sized column charts for each row that look about the same. Or, you can use a third-party add-on to create them.
EDIT: Table and Chart utilizing Gung's suggestion for an average measure. As pointed out, since the scale is similar, it was added to the chart as an additional point of comparison (I used gray to help differentiate it from the raw data plots).