Solved – How to combine two tfidf sparse vectors

scikit-learn, tf-idf

Say that I have two document collections, and I have created a tf-idf sparse vector for each one using TfidfVectorizer. How could I combine those two vectors into one that would resemble the tf-idf of the union of the two collections?

How could I approach this, given that the two collections will probably have different features?

Best Answer

Why don't you calculate them from scratch?

An important part of the Vector Space Model is the dictionary, i.e. the vocabulary that defines the dimensions of the space. In your case you have two collections, and therefore two dictionaries that may or may not share elements. Either way, you need to merge the two dictionaries and then calculate the TF-IDF weights for each of your documents.

Otherwise, I don't see what meaning a merge would have, since the two sets of vectors live in spaces of different dimensions.
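To illustrate why the separately fitted matrices can't simply be combined, here is a small sketch with hypothetical documents: the two matrices have different numbers of columns, a shared term such as "cat" sits in a different column in each, and its IDF depends on which collection it was computed from, with the union giving yet another value.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical example collections, for illustration only.
docs_a = ["the cat sat on the mat", "the cat ate fish"]
docs_b = ["the dog chased the cat", "a dog ate a bone"]

vec_a = TfidfVectorizer()
vec_b = TfidfVectorizer()
X_a = vec_a.fit_transform(docs_a)
X_b = vec_b.fit_transform(docs_b)

# 1. Different dimensionality: the matrices have different column counts.
print(X_a.shape, X_b.shape)  # (2, 7) (2, 6)

# 2. A term present in both collections maps to a different column in each.
print(vec_a.vocabulary_["cat"], vec_b.vocabulary_["cat"])  # 1 2

# 3. The IDF of a shared term depends on the documents it was fitted on,
#    and the IDF over the union matches neither per-collection value.
vec_union = TfidfVectorizer().fit(docs_a + docs_b)


def idf(vec, term):
    """Return the fitted IDF weight of `term` in vectorizer `vec`."""
    return vec.idf_[vec.vocabulary_[term]]


print(idf(vec_a, "cat"), idf(vec_b, "cat"), idf(vec_union, "cat"))
```

Because the IDF part of each weight is tied to the collection it came from, even aligning the columns of the two matrices would not reproduce the TF-IDF of the union; the weights have to be recomputed.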