Solved – Similarity measure between groups

distance, distance-functions, machine learning, similarities

Imagine there is a group of people numbered 1-100, each of whom has a few numerical attributes, e.g. height, weight, age.

There is a small sub-group (A) which consists of 10 people from the bigger group, say 1-10. The group does not necessarily have anything in common in terms of the attributes.

What I would like to do is find a new sub-group (B), also of 10 people, drawn from the rest of the group (11-100), who are most similar to group A in terms of the attributes height, weight, and age.

What approach would you take on this?

I thought of computing a person-to-person Euclidean distance matrix and then matching people up one to one, but is that really the way to go?
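To make the distance-matrix idea concrete, here is a minimal sketch of it, not a recommended method. It assumes the attributes are z-scored first so that no single unit dominates the Euclidean distance, and it uses a simple greedy one-to-one matching (closest pairs first), which is not guaranteed to minimise the total distance the way an optimal assignment would. The data and the helper names `standardize` and `greedy_match` are made up for illustration.

```python
import math
import random
import statistics

def standardize(rows):
    """Z-score each column so height, weight and age are on comparable scales."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def greedy_match(z, group_a, candidates):
    """Pair each member of A with a distinct candidate, taking the closest pairs first."""
    # Every (distance, a, c) combination, sorted so the smallest distances come first.
    pairs = sorted((math.dist(z[a], z[c]), a, c) for a in group_a for c in candidates)
    matched, used = {}, set()
    for _, a, c in pairs:
        if a not in matched and c not in used:
            matched[a] = c
            used.add(c)
    return matched

random.seed(0)
# Hypothetical data: 100 people with (height in cm, weight in kg, age in years).
people = [
    [random.gauss(170, 10), random.gauss(70, 12), random.uniform(18, 65)]
    for _ in range(100)
]
z = standardize(people)

group_a = list(range(10))          # persons 1-10 (indices 0-9)
candidates = list(range(10, 100))  # persons 11-100 (indices 10-99)

match = greedy_match(z, group_a, candidates)
group_b = sorted(match.values())
print(group_b)
```

Each member of A ends up with exactly one distinct partner, so B has 10 people. Whether matching on raw standardized attributes is adequate for a control group depends on the application.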

BACKSTORY: In the absence of an actual experiment, group A is a set of users who did something, and I want a group B of users who were similar to A but did not do that something, i.e. a "control" group of sorts, to estimate the effect of what A did.

Best Answer

The literature on hierarchical clustering deals with similarity measures between groups. The most popular measures of group similarity are perhaps the single linkage, complete linkage, and average linkage.

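Single linkage defines the group distance as the distance between the two nearest members, one from each group. Complete linkage uses the two most distant members. Average linkage uses the average of all pairwise distances between members of the two groups. (The distance between the group averages is a different measure, usually called centroid linkage.)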
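As a rough illustration, these group distances can be computed directly from the pairwise Euclidean distances. The sketch below uses the standard definitions, with average linkage taken as the mean of all cross-group pairwise distances, and also shows the distance between group means (often called centroid linkage) for comparison. The two small groups and the function names are made up for the example.

```python
import math
from itertools import product

def pairwise(group_a, group_b):
    """All Euclidean distances between a member of A and a member of B."""
    return [math.dist(p, q) for p, q in product(group_a, group_b)]

def single_linkage(group_a, group_b):
    """Distance between the two nearest members."""
    return min(pairwise(group_a, group_b))

def complete_linkage(group_a, group_b):
    """Distance between the two most distant members."""
    return max(pairwise(group_a, group_b))

def average_linkage(group_a, group_b):
    """Mean of all cross-group pairwise distances."""
    d = pairwise(group_a, group_b)
    return sum(d) / len(d)

def centroid_distance(group_a, group_b):
    """Distance between the group means (centroid linkage)."""
    mean_a = [sum(col) / len(group_a) for col in zip(*group_a)]
    mean_b = [sum(col) / len(group_b) for col in zip(*group_b)]
    return math.dist(mean_a, mean_b)

# Hypothetical two-attribute groups, assumed to be already standardized.
a = [(0.0, 0.0), (0.0, 1.0)]
b = [(3.0, 0.0), (4.0, 0.0)]

single = single_linkage(a, b)
complete = complete_linkage(a, b)
average = average_linkage(a, b)
centroid = centroid_distance(a, b)
print(single, complete, average, centroid)
```

To pick a group B with one of these measures, one would evaluate it for each candidate set of 10 people against A. Enumerating every candidate set is infeasible for 90 people, so some search heuristic would be needed.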
