You may want to have a look at this Wikipedia article. It has a nice overview of the most popular algorithms, although some more recent developments, like t-SNE, are missing.
In feature extraction the fundamental idea is to look for an alternative representation in which the underlying structure of the data is more apparent. This is done by minimizing some error or energy functional; the minimizer gives you the mapping.
Some approaches, such as PCA, CCA, locally linear embedding (LLE), and local linear projections (LLP), are attractive because you end up solving a linear problem (typically a linear system or an eigenvalue problem), for which there are efficient numerical methods. The nice thing about several of them (LLE, for instance) is that they can capture non-linear mappings while still only requiring you to solve a linear problem.
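To make the "you end up solving a linear problem" point concrete, here is a minimal NumPy sketch of PCA as an eigenvalue problem on the covariance matrix. The function name `pca` and its parameters are my own choices for illustration, not something from a particular library.

```python
import numpy as np

def pca(X, n_components=2):
    """Project X (n_samples x n_features) onto its top principal directions.

    Illustrative sketch only: the name and signature are assumptions.
    """
    Xc = X - X.mean(axis=0)                      # center the data
    cov = Xc.T @ Xc / (X.shape[0] - 1)           # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # symmetric eigenproblem, ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # indices of the largest eigenvalues
    components = eigvecs[:, order]               # principal directions as columns
    return Xc @ components, eigvals[order]       # projected data and explained variances
```

The whole method reduces to one call to a symmetric eigensolver, which is why it scales well. The returned eigenvalues are the variances of the projected coordinates.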
The idea is that you introduce a matrix whose elements encode relationships between samples (for example, some distance). You then project the original data, according to that matrix, into a lower-dimensional space in such a way that the distortion is minimal. In this lower-dimensional space (usually two-dimensional, so you can visualize it on your screen) the different patterns in your data are more evident. Usually only the k closest samples to each data point are used to describe those relationships, and this neighbor selection is what makes the relationship non-linear.
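As a rough illustration of that recipe, here is a bare-bones LLE sketch in NumPy: the k nearest neighbors define a sparse weight matrix, and the embedding comes from an eigenvalue problem built from it. The function name `lle`, its parameters, and the regularization scheme are assumptions made for this sketch. It uses dense matrices, so it only suits small datasets.

```python
import numpy as np

def lle(X, n_neighbors=10, n_components=2, reg=1e-3):
    """Minimal locally linear embedding (illustrative sketch, dense matrices)."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances; a point is never its own neighbor.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)
    neighbors = np.argsort(d2, axis=1)[:, :n_neighbors]

    # Reconstruction weights: one small linear system per sample.
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[neighbors[i]] - X[i]                      # neighbors centered on x_i
        C = Z @ Z.T                                     # local Gram matrix (k x k)
        C += reg * np.trace(C) * np.eye(n_neighbors)    # regularize in case C is singular
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, neighbors[i]] = w / w.sum()                # weights sum to one

    # Embedding: eigenvectors of (I - W)^T (I - W) with the smallest eigenvalues,
    # skipping the first one (the constant vector).
    I_minus_W = np.eye(n) - W
    M = I_minus_W.T @ I_minus_W
    eigvals, eigvecs = np.linalg.eigh(M)
    return eigvecs[:, 1:n_components + 1]
```

Note that the only non-linear ingredient is the neighbor selection. Everything after it is linear algebra.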
Still, non-linear methods such as t-SNE can also be solved efficiently by means of a gradient-based optimization algorithm.
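To give a feel for the gradient-based route, here is a heavily simplified t-SNE-style sketch in NumPy. It is not the full algorithm: it assumes a single fixed Gaussian bandwidth `sigma` instead of the per-point perplexity search, and it uses plain gradient descent without momentum or early exaggeration. All names and default values are hypothetical.

```python
import numpy as np

def pairwise_sq_dists(A):
    """Squared Euclidean distances between all rows of A."""
    sq = np.sum(A ** 2, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * A @ A.T, 0.0)

def joint_p(X, sigma=1.0):
    """Symmetric input affinities (fixed bandwidth: an assumption of this sketch)."""
    P = np.exp(-pairwise_sq_dists(X) / (2.0 * sigma ** 2))
    np.fill_diagonal(P, 0.0)
    P = P / P.sum(axis=1, keepdims=True)       # conditional p_{j|i}
    P = (P + P.T) / (2.0 * X.shape[0])         # symmetrize; entries sum to one
    return np.maximum(P, 1e-12)                # avoid log(0) later

def tsne_sketch(X, n_components=2, sigma=1.0, lr=1.0, n_iter=500, seed=0):
    """Minimize KL(P || Q) by plain gradient descent; returns embedding and KL history."""
    rng = np.random.default_rng(seed)
    P = joint_p(X, sigma)
    Y = 1e-2 * rng.standard_normal((X.shape[0], n_components))
    history = []
    for _ in range(n_iter):
        W = 1.0 / (1.0 + pairwise_sq_dists(Y))  # Student-t kernel in the embedding
        np.fill_diagonal(W, 0.0)
        Q = np.maximum(W / W.sum(), 1e-12)
        history.append(float(np.sum(P * np.log(P / Q))))  # KL before this update
        A = (P - Q) * W
        # grad_i = 4 * sum_j A_ij * (y_i - y_j)
        grad = 4.0 * (A.sum(axis=1)[:, None] * Y - A @ Y)
        Y -= lr * grad
    return Y, history
```

The `history` list lets you check that the objective is actually going down. In practice you would use an existing implementation, which adds the perplexity calibration and the optimization tricks left out here.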
Which method is better depends on your data and on how much of it you have (computational cost). I am not aware of any objective criterion that lets you decide beforehand which one to use, so just try them out (unless you know your data follows some trivial linear pattern). For t-SNE, LLE, and other methods there are a number of implementations, often in Matlab, but also in other languages and packages.
Best Answer
The following academic paper compares several methods for feature selection. It's old, but it remains relevant today. As a bonus, it also uses Reuters articles.