This is by no means a complete answer. The question you should be asking is: "what kind of distances are preserved when doing dimensionality reduction?" Since clustering algorithms such as K-means operate only on distances, the right distance metric to use (theoretically) is the one preserved by the dimensionality reduction. This way, the dimensionality reduction step can be seen as a computational shortcut to cluster the data in a lower-dimensional space (and also a way to avoid local minima, etc.).
There are many subtleties here which I will not pretend to understand (local distances vs. global distances, how relative distances are distorted, etc.), but I think this is the right direction to think about these things theoretically.
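A minimal sketch of this idea in Python (assuming scikit-learn; the data, the number of components, and the number of clusters here are made up purely for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Toy data: 500 points in 50 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))

# PCA approximately preserves Euclidean distances along the directions of
# largest variance, which is why it pairs naturally with K-means: K-means
# itself operates on (squared) Euclidean distances.
X_reduced = PCA(n_components=10).fit_transform(X)

# Cluster in the reduced space as a computational shortcut.
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X_reduced)
```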
No.
Cosine similarity can be computed between arbitrary vectors. It is a similarity measure (which can be converted into a distance measure and then used in any distance-based classifier, such as nearest-neighbor classification):
$$\cos \varphi = \frac{a\cdot b}{\|a\| \, \|b\|} $$
Where $a$ and $b$ are whatever vectors you want to compare.
If you want to do NN classification, you would use $a$ as your new document and $b$ as each of your known sample documents in turn, then classify the new document based on the most similar sample(s).
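A minimal sketch of this in Python (the term vectors, labels, and function names below are made up for illustration; a real pipeline would typically use e.g. TF-IDF vectors):

```python
import numpy as np

def cosine_similarity(a, b):
    """cos(phi) = (a . b) / (||a|| ||b||)"""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def nn_classify(new_doc, sample_docs, sample_labels):
    """Assign new_doc the label of the most similar known sample document."""
    sims = [cosine_similarity(new_doc, s) for s in sample_docs]
    return sample_labels[int(np.argmax(sims))]

# Toy example with hypothetical term-frequency vectors.
samples = np.array([[2.0, 0.0, 1.0],   # e.g. a "sports" document
                    [0.0, 3.0, 1.0]])  # e.g. a "politics" document
labels = ["sports", "politics"]
print(nn_classify(np.array([1.0, 0.0, 2.0]), samples, labels))  # -> "sports"
```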
Alternatively, you could compute a centroid for a whole class, but that would assume that the class is internally very consistent, and that the centroid is a reasonable estimator for the cosine distances (I'm not sure about this!). NN classification is much easier for you, and less dependent on your corpus being internally consistent.
Say you have the topic "sports". Some documents will talk about Soccer, others about Basketball, others about American Football. The centroid will probably be quite meaningless. Keeping a number of good sample documents for NN classification will likely work much better.
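A tiny numeric illustration of why such a centroid can be misleading (the two-term vectors below are purely hypothetical):

```python
import numpy as np

def cos_sim(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# A hypothetical "sports" class made of two sub-topics in a 2-term space:
# soccer documents load on term 1, basketball documents on term 2.
sports = np.array([[5.0, 0.0],   # soccer doc
                   [4.0, 1.0],   # soccer doc
                   [0.0, 5.0],   # basketball doc
                   [1.0, 4.0]])  # basketball doc

centroid = sports.mean(axis=0)   # [2.5, 2.5] -- sits between the sub-clusters

new_soccer_doc = np.array([5.0, 0.5])
print(cos_sim(new_soccer_doc, centroid))                 # ~0.77 (centroid)
print(max(cos_sim(new_soccer_doc, d) for d in sports))   # ~0.99 (nearest sample)
```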
This happens commonly when one class consists of multiple clusters. It's an often misunderstood point: classes do not necessarily equal clusters. Multiple classes may form one big cluster when they are hard to discern in the data, and on the other hand, a class may well consist of multiple clusters if it is not very uniform.
Clustering can work well for finding good sample documents from your training data, but there are other, more appropriate methods. In a supervised context, supervised methods will generally perform better than unsupervised ones.
Best Answer
Cosine similarity is not a clustering technique. It's a common distance measure for sparse vectors, used all over the place: in information retrieval and classification, maybe even more than in clustering.
I do not have the impression that you really have understood clustering. It is an unsupervised knowledge discovery technique. As it is unsupervised, you cannot "direct" it towards building a "sports" and a "non-sports" cluster. It might just as well find an "Obama" cluster and a "non-Obama" cluster.
If you are interested in Sports as opposed to non-Sports, you are doing classification. And yes, you may use cosine distance in classification!
A clustering algorithm may find clusters such as "Sentences containing the word Banana" (most likely it will not give you this explanation though!), and it hasn't failed. It's a mathematically valid cluster, and how is the algorithm supposed to know that you don't like Bananas?