Cosine similarity is not a clustering technique. It's a common similarity measure for (often sparse) vectors, used all over the place: in information retrieval and classification maybe even more than in clustering.
I do not have the impression that you have really understood clustering. It is an unsupervised knowledge discovery technique. Because it is unsupervised, you cannot "direct" it towards building a "sports" and a "non-sports" cluster; it might just as well find an "Obama" cluster and a "non-Obama" cluster.
If you are interested in Sports as opposed to non-Sports, you are doing classification. And yes, you may use cosine distance in classification!
- Classification is when you want to assign instances to the appropriate class from a set of known types.
- Clustering is when you have no clue of what types there are, and you want an algorithm to discover what (if any!) types there might be. This may involve a lot of trial and error, as the algorithms may find clusters that are not interesting to you.
A clustering algorithm may find clusters such as "Sentences containing the word Banana" (most likely it will not give you this explanation though!), and it hasn't failed. It's a mathematically valid cluster, and how is the algorithm supposed to know that you don't like Bananas?
No.
Cosine similarity can be computed between arbitrary vectors. It is a similarity measure (which can be converted to a distance measure, and then used in any distance-based classifier, such as nearest-neighbor classification).
$$\cos \varphi = \frac{a\cdot b}{\|a\| \, \|b\|} $$
where $a$ and $b$ are whatever vectors you want to compare.
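A minimal sketch of this formula in NumPy (the vectors are arbitrary toy values):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b: (a . b) / (|a| |b|)."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 0.0])
b = np.array([2.0, 4.0, 0.0])  # parallel to a, just scaled
print(cosine_similarity(a, b))  # 1.0: cosine ignores vector length
```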
If you want to do NN classification, you would use $a$ as your new document, and $b$ as your known sample documents, then classify the new document based on the most similar sample(s).
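A 1-NN classifier along these lines can be sketched in a few lines; the helper name and the toy two-term document vectors are made up for illustration:

```python
import numpy as np

def nn_classify(new_doc, sample_docs, labels):
    """Return the label of the sample document most cosine-similar to new_doc."""
    sims = [np.dot(new_doc, d) / (np.linalg.norm(new_doc) * np.linalg.norm(d))
            for d in sample_docs]
    return labels[int(np.argmax(sims))]

# Known sample documents ($b$ in the formula) with their classes
samples = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
labels = ["sports", "politics"]

# A new document ($a$) is assigned the class of its most similar sample
print(nn_classify(np.array([0.9, 0.1]), samples, labels))  # sports
```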
Alternatively, you could compute a centroid for a whole class, but that would assume that the class is very consistent in itself, and that the centroid is a reasonable estimator for the cosine distances (I'm not sure about this!). NN classification is much easier for you, and less dependent on your corpus being internally consistent.
Say you have the topic "sports". Some documents will talk about Soccer, others about Basketball, others about American Football. The centroid will probably be quite meaningless. Keeping a number of good sample documents for NN classification will likely work much better.
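A toy illustration of why the centroid can mislead here, with hypothetical term axes (soccer, basketball) and made-up vectors: a new soccer document is much more similar to a good soccer sample than to the centroid of the whole "sports" class.

```python
import numpy as np

def cos(a, b):
    """Cosine similarity between two vectors."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical "sports" documents in a 2-term space: [soccer, basketball]
soccer = np.array([1.0, 0.0])
basketball = np.array([0.0, 1.0])

centroid = (soccer + basketball) / 2   # sits "between" the subtopics
new_soccer = np.array([0.9, 0.1])      # a new soccer document

print(cos(new_soccer, soccer))    # high: the nearest sample matches well
print(cos(new_soccer, centroid))  # noticeably lower: the centroid is diluted
```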
This happens commonly when one class consists of multiple clusters. It's an often misunderstood point: classes do not necessarily equal clusters. Multiple classes may form one big cluster when they are hard to discern in the data; conversely, a class may well span multiple clusters if it is not very uniform.
Clustering can work well for finding good sample documents in your training data, but there are other, more appropriate methods. In a supervised context, supervised methods will generally perform better than unsupervised ones, since they can exploit the labels.
Best Answer
Spherical k-means is the classical example. But really, any clustering algorithm that can take an arbitrary distance measure should be applicable, including DBSCAN. For example, scikit-learn's implementation lets you choose the distance metric for DBSCAN, where you can plug in cosine distance [1].
[1] http://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html
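A minimal sketch of that scikit-learn usage, with made-up 2-D points; note that `eps` is then a cosine *distance* (1 minus cosine similarity) threshold, and that cosine ignores vector lengths:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two groups of directions; the points differ in length, but cosine
# distance only cares about the angle between them.
X = np.array([
    [1.0, 0.1], [2.0, 0.1], [0.9, 0.0],   # pointing along the x axis
    [0.1, 1.0], [0.0, 2.0], [0.1, 0.9],   # pointing along the y axis
])

# metric="cosine" makes DBSCAN use cosine distance; eps is a distance threshold
db = DBSCAN(eps=0.1, min_samples=2, metric="cosine").fit(X)
print(db.labels_)  # two clusters, one per direction
```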