Solved – What does word embedding weighted by tf-idf mean

machine learning, natural language, tf-idf, word embeddings, word2vec

The paper I am reading explains how the authors built the feature vectors used for a Twitter sentiment classification task.

The first is a simple combination, where each tweet is represented by
the average of the word embedding vectors of the words that compose
the tweet. The second approach also averages the word embedding
vectors, but each embedding vector is now weighted (multiplied) by the
tf-idf of the word it represents.

I understand the first part, which is basically averaging all the word vectors of a tweet, but I am not quite sure how to get the second one, where each word vector is multiplied by its tf-idf.

To get this vector, do I simply multiply the tf-idf vectorizer's output by the average of the word embeddings? What kind of multiplication is it? I am also not sure the multiplication will work, since the shapes won't match.

Best Answer

This quote is clearly talking about sentence embeddings, obtained from word embeddings.

If the sentence $s$ consists of words $(w_1, ..., w_n)$, we'd like to define an embedding vector $Emb_s(s) \in \mathbb{R}^d$ for some $d > 0$.

The authors of this paper propose to compute it from the embeddings of words $w_i$, let's call them $Emb_w(w_i)$, so that $Emb_s(s)$ is a linear combination of $Emb_w(w_i)$ and has the same dimensionality $d$:

$$Emb_s(s) = \sum_{w_i \in s} c_i \cdot Emb_w(w_i)$$

where $c_i \in \mathbb{R}$ are scalar coefficients. Note that $d$ is the same for all word vectors.

In the simplest case, all $c_i = 1$, so $Emb_s(s)$ is simply the sum of the constituent vectors. A better approach is averaging, i.e., $c_i = \frac{1}{n}$, which handles sentences of different lengths. Note that the dimensionality doesn't change; it's still $d$.
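A minimal sketch of the unweighted average, using made-up 3-dimensional embeddings (the words and vector values are hypothetical, just to show the shapes):

```python
import numpy as np

# Toy word embeddings with d = 3 (hypothetical values for illustration).
emb = {
    "very":  np.array([0.1, 0.1, 0.1]),
    "good":  np.array([0.9, 0.1, 0.0]),
    "movie": np.array([0.2, 0.8, 0.1]),
}

sentence = ["very", "good", "movie"]

# Unweighted average: c_i = 1/n for every word.
sent_emb = np.mean([emb[w] for w in sentence], axis=0)
print(sent_emb.shape)  # (3,) — same dimensionality d as the word vectors
```

The sentence embedding lives in the same $\mathbb{R}^d$ space as the word embeddings, regardless of how many words the sentence contains.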

Finally, the proposed method is a weighted average, where the weights are the tf-idf scores. This captures the fact that some words in a sentence carry more information than others. Once again, there is no problem with dimensions, because the result is a sum of vectors in $\mathbb{R}^d$, each multiplied by a scalar.