No.
Cosine similarity can be computed between arbitrary vectors. It is a similarity measure (which can be converted to a distance measure, and then be used in any distance-based classifier, such as nearest-neighbor classification).
$$\cos \varphi = \frac{a\cdot b}{\|a\| \, \|b\|} $$
Where $a$ and $b$ are whatever vectors you want to compare.
If you want to do NN classification, you would use $a$ as your new document, and $b$ as your known sample documents, then classify the new document based on the most similar sample(s).
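As an illustration, here is a rough sketch of such a cosine-based NN classifier (the function names and the choice of k are mine; the documents are assumed to be vectorized already, e.g. as TF-IDF vectors):

```python
import numpy as np

def cosine_similarity(a, b):
    # cos(phi) = (a . b) / (||a|| ||b||)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def nn_classify(new_doc, sample_docs, sample_labels, k=3):
    # Score the new document against every known sample document...
    sims = [cosine_similarity(new_doc, s) for s in sample_docs]
    # ...and take a majority vote among the k most similar samples.
    top_k = np.argsort(sims)[-k:]
    labels, counts = np.unique([sample_labels[i] for i in top_k], return_counts=True)
    return labels[np.argmax(counts)]
```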
Alternatively, you could compute a centroid for a whole class, but that assumes the class is internally very consistent and that the centroid is a reasonable estimator for the cosine distances (I'm not sure about this!). NN classification is much easier for you, and less dependent on your corpus being internally consistent.
Say you have the topic "sports". Some documents will talk about Soccer, others about Basketball, others about American Football. The centroid will probably be quite meaningless. Keeping a number of good sample documents for NN classification will likely work much better.
This happens commonly when one class consists of multiple clusters. It's an often misunderstood point: classes do not necessarily equal clusters. Multiple classes may form one big cluster when they are hard to discern in the data, and on the other hand a class may well consist of multiple clusters if it is not very uniform.
Clustering can work well for finding good sample documents from your training data, but there are other, more appropriate methods. In a supervised context, supervised methods will usually perform better than unsupervised ones.
Xeon is right in that TF-IDF and cosine similarity are two different things. TF-IDF will give you a representation for a given term in a document. Cosine similarity will give you a score for two different documents that share the same representation. However, "one of the simplest ranking functions is computed by summing the tf–idf for each query term". This solution is biased towards long documents where more of your terms will appear (e.g., Encyclopedia Britannica). Also, there are much more advanced approaches based on a similar idea (most notably Okapi BM25).
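To make the distinction concrete, here is a small sketch with scikit-learn (the toy corpus is made up): TfidfVectorizer produces the representation of each document, and cosine similarity then scores pairs of those representations.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["the cat sat on the mat",
        "the dog sat on the log",
        "an encyclopedia contains many long articles"]

X = TfidfVectorizer().fit_transform(docs)  # one TF-IDF vector per document

# Pairwise cosine similarities between documents; the diagonal is 1.0,
# because every document is identical to itself.
print(cosine_similarity(X))
```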
In general, you should use the cosine similarity if you are comparing elements of the same nature (e.g., documents vs documents) or when you need the score itself to have some meaningful value. In the case of cosine similarity, a 1.0 means that the two elements are exactly the same based on their representation. I would recommend these resources to learn more about the topic:
Modern Information Retrieval, by Ricardo Baeza-Yates et al.
Introduction to Information Retrieval, by Christopher Manning et al.
Best Answer
Usually (in my experience) it does make sense to exclude some of the terms.
These terms are usually very frequent functional words (like "a", "the", "will") or very infrequent ones, and they typically do not have any discriminative power - that is, they are not helpful when deciding if a document should belong to a cluster or not.
I usually use a list of stopwords for excluding too frequent words and count-based filtering for very infrequent words.
If you use sklearn, you can include the filtering in your vectorizer:
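For example (a minimal sketch; CountVectorizer accepts the same parameter):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

documents = [...]  # your corpus: a list of raw text strings

# min_df=5 drops terms that appear in fewer than 5 documents;
# a stopword list can also be passed, e.g. stop_words='english'.
vectorizer = TfidfVectorizer(min_df=5)
X = vectorizer.fit_transform(documents)
```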
This will exclude all terms that do not appear in at least 5 documents.
Then you can improve it further. For example, some of the discarded infrequent words may be typos, so sometimes it may make sense to do spelling correction before vectorizing documents.
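A toy sketch of that idea (the typo dictionary here is made up; in practice you would use a proper spell-correction tool):

```python
# Map known typos to their intended words before vectorizing.
typo_map = {"documnet": "document", "basketbal": "basketball"}

def correct_spelling(text):
    return " ".join(typo_map.get(word, word) for word in text.split())

documents = [correct_spelling(doc) for doc in documents]
```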