Solved – How to deal with TFIDF for text classification

classification, text mining, tf-idf

I am working on a project where I have to classify texts.
For that, I am dividing my data into training and test data.

To train my classifier (to be determined later), I am planning to calculate the tfidf matrix for the documents.
However, I have a question concerning that.

TFIDF is highly related to the documents for which it was calculated.
Hence, does it make sense to recalculate it for the training data and test the classifier on it?

If yes, what is the logic behind that?

Please provide references if possible.

Best Answer

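After some searching, the best approach I found is to calculate the tfidf on the training data only. Then, to validate your model, compute the tfidf for the test data using the vocabulary (and the IDF weights) learned from the training data. That way the test documents land in the same feature space the classifier was trained on, and nothing from the test set leaks into the features.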
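
Here is a minimal sketch of that workflow. It assumes scikit-learn, which the question does not name, and the toy documents are made up purely for illustration. The idea is to fit the vectorizer on the training texts and only transform the test texts:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpora, used only to illustrate the workflow.
train_docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "stocks fell sharply today",
]
test_docs = ["the cat chased stocks", "zebra"]

vectorizer = TfidfVectorizer()

# fit_transform learns the vocabulary and IDF weights from the training data
# and returns the training TF-IDF matrix.
X_train = vectorizer.fit_transform(train_docs)

# transform reuses the learned vocabulary and IDF weights; nothing is refit.
# Terms that never occurred in the training data (e.g. "zebra") are ignored.
X_test = vectorizer.transform(test_docs)

print(X_train.shape, X_test.shape)
```

Both matrices end up with the same number of columns, so a classifier trained on X_train can be applied to X_test directly. Calling fit_transform on the test data instead would build a different vocabulary, and the columns would no longer line up.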
