MATLAB: Array Row Similarity/Comparison

array, cluster, comparison, row, similarity

I want to compare the rows of two arrays to see which rows are most similar to one another, sort of like clustering. To be clear, I don't want to compare the differences between individual numbers, but rather each row as a whole. I would also like to see whether particular numbers in the SLP variable occur more often when a given number shows up at the same index in the 500z variable. Both variables have 20 columns and 49 rows, but the example variables below include only 3 rows and 5 columns.
SLP = [1,3,4,2,3
4,7,6,5,6
1,4,3,3,2];
% MATLAB identifiers cannot start with a digit, so "500z" is written as z500 here
z500 = [9,6,7,6,6
7,5,7,6,8
9,7,6,6,6];
An example output I would like is:
1.) A measure of row similarity (perhaps a percentage of similarity or even a cluster number): The most similar rows in SLP are rows 1 and 3. Therefore an example output could be a 3×3 matrix (SLP rows 1-3 going down and 500z rows 1-3 going across) with the percentage of similarity between each pair of rows. Or it could be in the form of a cluster. Ex: rows 1 and 3 belonging to cluster 1 and row 2 belonging to cluster 2.
2.) Which numbers occur most frequently at the same index in the two variables. So looking at the sample variables, I would get: SLP 1 tends to occur with 500z 9, SLP 3 tends to occur with 500z 6, SLP 4 tends to occur with 500z 7, and so on. This could be output as simply as an array where column 1 is the SLP value of the pair and column 2 is the 500z value. It would be great to have a column 3 as well, saying how often the pair occurred.
I've been stuck on this for a while so any help or suggestions of how to best approach this problem would be awesome! I am also fairly new to Matlab, so my wording may not be the best.

Best Answer

One common way to measure similarity between rows is the sum of squared differences between their elements: the smaller the sum, the more alike the two rows are.
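As a minimal sketch of that idea in base MATLAB, using the sample data from the question. The name z500 is an assumption (500z is not a valid identifier), and the row-by-matrix subtraction relies on implicit expansion, which needs R2016b or newer.

```matlab
% Sample data from the question.
SLP  = [1,3,4,2,3; 4,7,6,5,6; 1,4,3,3,2];
z500 = [9,6,7,6,6; 7,5,7,6,8; 9,7,6,6,6];   % "500z" renamed: identifiers cannot start with a digit

nS = size(SLP, 1);
nZ = size(z500, 1);
ssd = zeros(nS, nZ);   % ssd(i,j): SLP row i versus z500 row j

for i = 1:nS
    % Subtract SLP row i from every z500 row, square, and sum along each row.
    ssd(i, :) = sum((z500 - SLP(i, :)).^2, 2).';
end

disp(ssd)   % smaller values mean the two rows are more alike
```

The result has SLP rows going down and z500 rows going across, which is the matrix layout asked for in the question, although it holds raw sums rather than percentages.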
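The square root of the sum of squared differences is the Euclidean distance, so you can get a similarity measure by calling pdist() on the rows; a smaller distance means more similar rows. Note that pdist() compares the rows of a single matrix with one another. To compare the rows of one matrix against the rows of another, use pdist2(). Both functions are part of the Statistics and Machine Learning Toolbox.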
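A hedged sketch of how this might look, assuming the Statistics and Machine Learning Toolbox is installed (pdist, pdist2, squareform, linkage and cluster all come from it). The name z500 is a stand-in for 500z, and asking for two clusters is an assumption made only to match the example in the question.

```matlab
% Sample data from the question.
SLP  = [1,3,4,2,3; 4,7,6,5,6; 1,4,3,3,2];
z500 = [9,6,7,6,6; 7,5,7,6,8; 9,7,6,6,6];   % stand-in name for "500z"

% Euclidean distance between every pair of rows within SLP.
d = pdist(SLP);        % row vector ordered (1,2), (1,3), (2,3)
D = squareform(d);     % the same distances as a symmetric 3x3 matrix

% Distance from each SLP row (down) to each z500 row (across).
Dcross = pdist2(SLP, z500);

% Optional: group the SLP rows into 2 clusters (2 is an assumed choice).
Z = linkage(SLP);                 % single linkage on Euclidean distance by default
T = cluster(Z, 'maxclust', 2);    % T(i) is the cluster label of row i
```

For the sample data, d(2) (rows 1 and 3) is 2, while the other two distances are sqrt(47), about 6.86, so rows 1 and 3 are the closest pair and land in the same cluster. These are distances rather than percentages; if a percentage is wanted, some rescaling would have to be chosen, and none is built in.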