Solved – Text feature vector extraction

algorithms, data mining, feature selection, k nearest neighbour

I have a class assignment to implement a couple of existing ways to extract feature vectors from a given set of texts, so that the texts can be classified with the k-nearest neighbour algorithm. The texts are newspaper articles from Reuters, and they are to be sorted by the country the story relates to and by its subject (economy, politics, etc.).

What existing text feature extraction algorithms would work well for this task?

Best Answer

The following academic paper compares several methods for feature selection. It's old, but it remains relevant today. As a bonus, it also uses Reuters articles.

Lewis, D. D. (1992). Feature selection and feature extraction for text categorization. In Proceedings of Speech and Natural Language Workshop. San Francisco: Morgan Kaufmann. (PDF)
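
To make the idea of a text feature vector concrete, here is a minimal sketch of one common baseline: bag-of-words vectors with TF-IDF weighting, fed to a k-nearest-neighbour vote using cosine similarity. This is not necessarily one of the methods the paper evaluates. The function names (`tokenize`, `fit_idf`, `tfidf_vector`, `knn_predict`), the smoothed IDF formula, and the toy training sentences are all my own assumptions for illustration, not Reuters data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Deliberately crude: lowercase, then split on anything that is not a-z.
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]


def fit_idf(docs):
    # Document frequency: in how many training documents each term occurs.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    # Smoothed IDF (an assumed variant), so very common terms keep a small weight.
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf_vector(text, idf):
    # Sparse vector as a dict; terms unseen during fitting are dropped.
    counts = Counter(tokenize(text))
    vec = {t: c * idf[t] for t, c in counts.items() if t in idf}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    # Both vectors are already L2-normalised, so the dot product is the cosine.
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def knn_predict(text, train_vecs, train_labels, idf, k=3):
    # Rank training documents by similarity and take a majority vote over the top k.
    query = tfidf_vector(text, idf)
    ranked = sorted(
        zip(train_vecs, train_labels),
        key=lambda pair: cosine(query, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training set, invented for this example.
train_texts = [
    "The central bank raised interest rates to curb inflation",
    "Stock markets fell as trade deficit widened",
    "Parliament passed the election reform bill",
    "The president met opposition leaders before the vote",
]
train_labels = ["economy", "economy", "politics", "politics"]

idf = fit_idf(train_texts)
train_vecs = [tfidf_vector(t, idf) for t in train_texts]

print(knn_predict("Inflation and interest rates worry the markets", train_vecs, train_labels, idf))
print(knn_predict("Opposition leaders demand a new election", train_vecs, train_labels, idf))
```

For the assignment itself, the same pipeline could be run twice, once with country labels and once with subject labels. Feature selection of the kind the paper discusses would then amount to keeping only a subset of the terms in `idf` before building the vectors.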
