I am aware of cosine similarity, which measures the angle between "two" vectors. A prototype for cosine similarity would look something like this:
float cosine_similarity(vector<float> a, vector<float> b);
Are there any similarity measures that measure the similarity between "n" vectors? A prototype of such a function would look something like this:
float cosine_similarity(vector<vector<float>> vectors);
PS: I am NOT looking for clustering/classification sort of a thing.
I would appreciate it if someone could point me in the right direction.
Thanks!
Best Answer
The cosine similarity between two column vectors $x_1$ and $x_2$ is simply the dot product between their unit vectors $$\mathrm{CosSim}[x_1,x_2]=\frac{x_1}{\|x_1\|}\bullet\frac{x_2}{\|x_2\|}$$ and varies from -1 to +1, similar to a correlation coefficient $R$ (to which it is closely related).
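As a concrete starting point, here is a minimal C++ sketch of the two-vector case, matching the question's prototype (const references and the non-zero, equal-length input assumption are my additions):

```cpp
#include <cmath>
#include <vector>
using std::vector;

// Dot product of the two vectors divided by the product of their norms.
// Assumes both vectors are non-zero and have equal length.
float cosine_similarity(const vector<float>& a, const vector<float>& b) {
    float dot = 0.0f, na = 0.0f, nb = 0.0f;
    for (size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        na  += a[i] * a[i];
        nb  += b[i] * b[i];
    }
    return dot / (std::sqrt(na) * std::sqrt(nb));
}
```

Parallel vectors give $+1$, orthogonal vectors give $0$, and anti-parallel vectors give $-1$, mirroring the range of a correlation coefficient.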
There are many ways to generalize this idea to a set of $n$ vectors, but as requested in the comments, here I will expand on one possibility, related to the idea of (linear) dimensionality reduction.
First, in the $n=2$ case, we can think of $R^2\in[0,1]$ as the "fraction of the variance explained" by a linear regression. For the squared cosine similarity, a similar interpretation is possible, except the associated regression has no "intercept term" (i.e. the vectors are scaled, but not centered).
For $n$ vectors, we can concatenate them into a matrix $$X=\begin{bmatrix}x_1 \, \ldots \, x_n\end{bmatrix}$$ Analogous to the cosine similarity case, we can consider dot products based on the scaled matrix $$Y=\begin{bmatrix}y_1 \, \ldots \, y_n\end{bmatrix}$$ assembled from the corresponding unit vectors $$y_i=\frac{x_i}{\|x_i\|}$$ In general we could then conduct various analyses on the "correlation matrix" $Y^TY$, whose entries are the pairwise cosine similarities, including dimension reduction via PCA.
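To make the construction concrete, here is a C++ sketch that forms $Y$ by normalizing each column and then computes $Y^TY$; the helper name `cosine_gram` and the convention of storing vectors column-wise (one inner `vector<float>` per $x_i$) are my own choices for illustration:

```cpp
#include <cmath>
#include <vector>
using std::vector;

// Scale each x_i to unit length, giving Y; then G = Y^T Y is the n x n
// matrix of pairwise cosine similarities (a "correlation matrix" without
// centering). vectors[i] holds the column x_i; all x_i assumed non-zero.
vector<vector<float>> cosine_gram(const vector<vector<float>>& vectors) {
    size_t n = vectors.size();
    vector<vector<float>> y = vectors;
    for (auto& col : y) {
        float norm = 0.0f;
        for (float v : col) norm += v * v;
        norm = std::sqrt(norm);
        for (float& v : col) v /= norm;  // y_i = x_i / ||x_i||
    }
    vector<vector<float>> g(n, vector<float>(n, 0.0f));
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            for (size_t k = 0; k < y[i].size(); ++k)
                g[i][j] += y[i][k] * y[j][k];  // (Y^T Y)_{ij} = y_i . y_j
    return g;
}
```

The diagonal entries of $G$ are always $1$, and each off-diagonal entry $G_{ij}$ is exactly the pairwise cosine similarity of $x_i$ and $x_j$.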
The general idea here is to create a low-rank approximation to the matrix $Y$ by truncating its singular value decomposition (SVD). Mathematically, we have $$Y=USV^T=\sum_{k=1}^n\sigma_ku_kv_k^T$$ where the singular values $\sigma_1,\ldots,\sigma_n$ are non-negative and arranged in decreasing order. (For simplicity I am assuming that for $x_i\in\mathbb{R}^m$ the number of vectors is at most the dimension, i.e. $n\leq m$; otherwise the sum will have $m$ terms.) If we truncate the sum at $K<n$, then we get a "rank-$K$ approximation" $\hat{Y}_K$ to the matrix $Y=\hat{Y}_n$. The truncated SVD is optimal in the sense that it minimizes the squared Frobenius norm of the reconstruction error, $\|\hat{Y}_K-Y\|_F^2$.
The squared Frobenius norm is just the sum of the squares of all the entries of a matrix. In particular, since each column of the scaled matrix $Y$ is a unit vector, we will always have $$\|Y\|_F^2=n$$ For any matrix, the squared Frobenius norm is also equal to the sum of the squares of the singular values. In particular, for the rank-$K$ approximation $\hat{Y}_K$ we have $$\|\hat{Y}_K\|_F^2=\sum_{k=1}^K\sigma_k^2$$ So we can define an "order-$K$ coefficient of determination" as $$R_K^2\equiv\frac{\|\hat{Y}_K\|_F^2}{\|Y\|_F^2}\in[0,1]$$
A simple option would be to just use the first singular value, i.e. $$R_1^2\equiv\frac{\sigma_1^2}{n}$$ The maximum similarity would be for $n$ parallel vectors (i.e. scalar multiples of a single vector), giving $R_1^2=1$. The minimum similarity would be for $n$ orthogonal vectors, giving $R_1^2=\frac{1}{n}$.
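The quantity $R_1^2=\sigma_1^2/n$ can be computed without a full SVD, because $\sigma_1^2$ is the largest eigenvalue of the $n\times n$ Gram matrix $G=Y^TY$. Here is a self-contained C++ sketch using power iteration; the function name `cosine_similarity_n` matches the question's requested prototype, and the fixed iteration count is an assumption for illustration (a convergence tolerance would be more robust):

```cpp
#include <cmath>
#include <vector>
using std::vector;

// Similarity of n vectors as R_1^2 = sigma_1^2 / n, where sigma_1 is the
// leading singular value of the column-normalized matrix Y. sigma_1^2 is
// the largest eigenvalue of G = Y^T Y, estimated by power iteration.
float cosine_similarity_n(vector<vector<float>> vectors) {
    size_t n = vectors.size();
    // Normalize each vector to unit length: y_i = x_i / ||x_i||.
    for (auto& x : vectors) {
        float norm = 0.0f;
        for (float v : x) norm += v * v;
        norm = std::sqrt(norm);
        for (float& v : x) v /= norm;
    }
    // Gram matrix G = Y^T Y (pairwise cosine similarities).
    vector<vector<float>> g(n, vector<float>(n, 0.0f));
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            for (size_t k = 0; k < vectors[i].size(); ++k)
                g[i][j] += vectors[i][k] * vectors[j][k];
    // Power iteration for the largest eigenvalue of the PSD matrix G.
    vector<float> u(n, 1.0f);
    float lambda = 0.0f;
    for (int iter = 0; iter < 1000; ++iter) {
        vector<float> w(n, 0.0f);
        for (size_t i = 0; i < n; ++i)
            for (size_t j = 0; j < n; ++j)
                w[i] += g[i][j] * u[j];
        float norm = 0.0f;
        for (float v : w) norm += v * v;
        norm = std::sqrt(norm);
        lambda = norm;  // ||G u|| -> sigma_1^2 once u is a unit vector
        for (size_t i = 0; i < n; ++i) u[i] = w[i] / norm;
    }
    return lambda / static_cast<float>(n);  // R_1^2 in [1/n, 1]
}
```

For $n$ parallel vectors $G$ is the all-ones matrix with $\sigma_1^2=n$, giving $R_1^2=1$; for $n$ mutually orthogonal vectors $G$ is the identity, giving $R_1^2=\frac{1}{n}$, matching the bounds above.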
A few final notes:

- The leading singular value $\sigma_1$ can be computed efficiently via a truncated SVD, e.g. `svds(Y,1)` in Matlab.
- Since $R_1^2$ ranges over $[\frac{1}{n},1]$, it can be rescaled to $[0,1]$ via $\frac{R_1^2-\frac{1}{n}}{1-\frac{1}{n}}=\frac{\sigma_1^2-1}{n-1}$, e.g. `CosSimN = @(Y) (svds(Y,1)^2-1)/(size(Y,2)-1)` in Matlab, if we assume that $Y$ is already available as input.