Solved – Clustering of points based on vector feature similarities in R

clustering, machine-learning, r, similarities

I have as input a number of points that I need to partition into clusters. Each point has a number of features that should be used to measure the similarity between each point and the others. Some of these features are scalar values (a single number) and others are vectors.

For example, assume that each point has the following features:

  1. S1: scalar value

  2. V1: 48 $\times$ 1 vector

  3. V2: 48 $\times$ 1 vector

For example, one point may have (S1, V1, V2) = (100, {0, 100, 20, 30, …}, {75, 0, 10, 5, …}).

My hypothesis is to use cosine similarity to measure how similar the vector V1 (or V2) of one point is to the V1 (or V2) of another point. I have already computed the similarity matrices between all points in terms of V1 and V2 similarities.
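For reference, a pairwise cosine-similarity matrix for one vector feature can be computed in a few lines of base R. This is a sketch: the matrix `V1` below (one row per point, 48 columns) is randomly generated stand-in data, not the actual features.

```r
# Sketch: pairwise cosine similarity between the V1 vectors of n points.
# V1 is a hypothetical n x 48 matrix, one row per point (dummy data here).
set.seed(1)
V1 <- matrix(runif(5 * 48), nrow = 5)

cosine_sim <- function(M) {
  # Normalize each row to unit length; the cross-product of the
  # normalized rows then gives all pairwise cosine similarities.
  Mn <- M / sqrt(rowSums(M^2))
  Mn %*% t(Mn)
}

S_v1 <- cosine_sim(V1)  # symmetric n x n matrix, diagonal = 1
```

The same function applied to V2 yields the second similarity matrix.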

By exploring the standard clustering algorithms in R, I have found that k-means uses the Euclidean distance, which might be suitable for clustering points according to their scalar values, but it doesn't work for the situation where I have hybrid feature types (scalars and vectors). K-medoids clustering also seems to support only the Euclidean and Manhattan distances.

I think what should be done is to generate one more distance/similarity matrix between all points based on the scalar value. We would then end up with three similarity matrices, one per feature, each showing how similar every pair of points is regardless of whether the feature is a scalar or a vector, and those matrices could be used to find the neighbourhood of points while clustering.
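One way to sketch this fusion step: convert the cosine similarities to dissimilarities (1 − similarity), rescale the scalar-based distances to a comparable range, and average the three matrices. The equal weighting and the rescaling to [0, 1] are assumptions for illustration, not a prescribed recipe; the data below is random stand-in data.

```r
# Sketch: fuse a scalar-based distance matrix with two cosine
# dissimilarity matrices into one combined distance matrix D.
set.seed(2)
n  <- 6
s1 <- runif(n, 0, 200)                 # scalar feature S1 (dummy)
V1 <- matrix(runif(n * 48), nrow = n)  # vector feature V1 (dummy)
V2 <- matrix(runif(n * 48), nrow = n)  # vector feature V2 (dummy)

cosine_dist <- function(M) {
  Mn <- M / sqrt(rowSums(M^2))
  1 - Mn %*% t(Mn)                     # dissimilarity = 1 - cosine similarity
}

D_s1 <- as.matrix(dist(s1))            # |S1_i - S1_j| for the scalar
D_s1 <- D_s1 / max(D_s1)               # rescale to [0, 1] (assumed choice)

# Equal weights are a placeholder; they could be tuned per feature.
D <- (D_s1 + cosine_dist(V1) + cosine_dist(V2)) / 3
```

The resulting `D` is a symmetric matrix with zeros on the diagonal, which is exactly what distance-matrix-based clustering algorithms expect.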

I wonder if there is an implementation of a clustering algorithm that accepts as input the similarity matrices (or, alternatively, the dissimilarity/distance matrices) between the vector features of multiple points and uses them for clustering?

Best Answer

Actually, K-medoids (aka PAM) can work with any kind of distance or similarity metric, and so can DBSCAN, OPTICS and hierarchical clustering. The latter, however, is usually implemented in $O(n^3)$, so it is not an option if you have a lot of instances.
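In R, both PAM and hierarchical clustering accept a precomputed dissimilarity object directly. A minimal sketch, assuming `D` is your combined symmetric distance matrix (random stand-in data here) and `k = 3` is an arbitrary choice:

```r
# Sketch: cluster directly from a precomputed distance matrix.
library(cluster)  # provides pam() (PAM / k-medoids)

set.seed(3)
D <- as.matrix(dist(matrix(runif(10 * 3), nrow = 10)))  # dummy 10 x 10 matrix

# PAM on the distance matrix itself; diss = TRUE marks x as dissimilarities.
fit_pam <- pam(as.dist(D), k = 3, diss = TRUE)
fit_pam$clustering              # cluster label per point

# Hierarchical clustering on the same matrix, cut into 3 groups.
fit_hc      <- hclust(as.dist(D), method = "average")
clusters_hc <- cutree(fit_hc, k = 3)
```

`as.dist()` converts the full symmetric matrix into the lower-triangular `dist` object these functions expect.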