Let $h(x)$ be the mapping into the high-dimensional feature space $\mathcal{F}$. The kernel function is $K(x_1,x_2)=\langle h(x_1),h(x_2)\rangle$, i.e. the inner product in $\mathcal{F}$. So the kernel is not used to project data points; it is the outcome of the projection. It can be regarded as a measure of similarity, but in an SVM it is more than that.
The optimization for finding the best separating hyperplane in $\mathcal{F}$ involves $h(x)$ only through inner products. That is to say, if you know $K(\cdot,\cdot)$, you do not need to know the exact form of $h(x)$, which makes the optimization easier.
Conversely, each kernel $K(\cdot,\cdot)$ has a corresponding $h(x)$. So if you're using an SVM with that kernel, you're implicitly finding the linear decision boundary in the space that $h(x)$ maps into.
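To make the connection concrete, here is a minimal numerical sketch (the specific kernel and points are my own illustration, not from the text above): for the degree-2 polynomial kernel $K(x,y)=(x^\top y)^2$ on $\mathbb{R}^2$, one explicit feature map is $h(x)=(x_1^2,\ \sqrt{2}\,x_1x_2,\ x_2^2)$, and the kernel value equals the inner product of the mapped vectors.

```python
# Minimal check (illustrative example): the degree-2 polynomial kernel
# equals the inner product of an explicit 3-dimensional feature map.
import numpy as np

def K(x, y):
    """Polynomial kernel of degree 2, computed without ever mapping to F."""
    return np.dot(x, y) ** 2

def h(x):
    """Explicit feature map into the 3-dimensional space F."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x1 = np.array([1.0, 2.0])
x2 = np.array([3.0, -1.0])

print(K(x1, x2))         # 1.0, computed directly in the original space
print(h(x1) @ h(x2))     # 1.0, the same value as an inner product in F
```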
Chapter 12 of Elements of Statistical Learning gives a brief introduction to SVMs and more detail about the connection between the kernel and the feature mapping:
http://statweb.stanford.edu/~tibs/ElemStatLearn/
RUser4512 gave the correct answer: the RBF kernel works well in practice and it is relatively easy to tune. It's the SVM equivalent of "no one's ever been fired for estimating an OLS regression": it's accepted as a reasonable default method. Clearly OLS isn't perfect in every scenario (or even in many of them), but it's a well-studied and widely understood method. Likewise, the RBF kernel is well studied and widely understood, and many SVM packages include it as the default.
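For instance (just to illustrate the "default" point), scikit-learn's SVC uses the RBF kernel unless you ask for something else:

```python
from sklearn.svm import SVC

clf = SVC()          # no kernel specified
print(clf.kernel)    # 'rbf': the RBF kernel is the package default
```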
But the RBF kernel has a number of other properties. In these types of questions, when someone is asking about "why do we do things this way", I think it's important to also draw contrasts to other methods to develop context.
The RBF kernel is a stationary kernel, which means it is invariant to translation: $K(x+c,\,y+c)=K(x,y)$ for any vector $c$ of the same dimension as the inputs. For the RBF kernel this holds because it depends only on the difference $x-y$ of the two vectors. By contrast, the linear kernel is not stationary.
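A quick numerical check (a small sketch; the points and the shift $c$ are arbitrary choices for illustration):

```python
# Stationarity: shifting both inputs by the same vector c leaves the RBF
# kernel unchanged, but changes the linear kernel.
import numpy as np

def rbf(x, y, gamma=1.0):
    return np.exp(-gamma * np.sum((x - y) ** 2))

def linear(x, y):
    return np.dot(x, y)

x = np.array([1.0, 2.0])
y = np.array([0.5, -1.0])
c = np.array([10.0, -3.0])

print(rbf(x, y), rbf(x + c, y + c))        # identical: depends only on x - y
print(linear(x, y), linear(x + c, y + c))  # different: not translation-invariant
```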
The single-parameter version of the RBF kernel is isotropic, i.e. the scaling by $\gamma$ is the same in all directions. This is easily generalized, though, by slightly tweaking the kernel to $K(x,y)=\exp\left(-(x-y)'\Gamma(x-y)\right)$, where $\Gamma$ is a positive semi-definite matrix.
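A sketch of that generalization (the matrix $\Gamma$ below is just an arbitrary p.s.d. example; with $\Gamma=\gamma I$ you recover the usual isotropic RBF kernel):

```python
# Anisotropic RBF: different length scales in different directions via a
# p.s.d. matrix Gamma (an arbitrary example chosen for illustration).
import numpy as np

def anisotropic_rbf(x, y, Gamma):
    d = x - y
    return np.exp(-d @ Gamma @ d)

Gamma = np.array([[3.0, 0.5],
                  [0.5, 1.0]])   # p.s.d.: both eigenvalues are positive

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])
print(anisotropic_rbf(x, y, Gamma))      # exp(-3): stretched along one direction
print(anisotropic_rbf(x, y, np.eye(2)))  # exp(-2): Gamma = I gives the isotropic case
```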
Another property of the RBF kernel is that it is infinitely smooth. This is aesthetically pleasing, and somewhat satisfying visually, but perhaps not the most important property. Compare the RBF kernel to the Matérn kernel and you'll see that some kernels are quite a bit more jagged!
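To see the contrast concretely, scikit-learn's Gaussian-process kernel classes implement both (the points and length scales below are arbitrary; Matérn with $\nu=0.5$ is the exponential kernel, the roughest member of the family):

```python
# Evaluate the RBF and Matern kernels against the origin at a few distances.
import numpy as np
from sklearn.gaussian_process.kernels import RBF, Matern

X = np.linspace(0.0, 3.0, 7).reshape(-1, 1)   # a few 1-D points
origin = np.zeros((1, 1))

rbf = RBF(length_scale=1.0)
matern = Matern(length_scale=1.0, nu=0.5)     # nu=0.5: exponential, quite jagged

print(rbf(origin, X).ravel())     # smooth Gaussian decay with distance
print(matern(origin, X).ravel())  # exp(-|x - y|): kinked at zero distance
```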
The moral of the story is that kernel-based methods are very rich, and with a little bit of work it's very practical to develop a kernel suited to your particular needs. But if you use the RBF kernel as a default, you'll have a reasonable benchmark for comparison.
This answer also adds points from pAt84's answer.
Dimensionality reduction. I don't think reducing the dimension actually makes your data more separable, but it has other benefits. First, as the dimension of your space grows, you often need more samples to be able to catch patterns; this is just a question of volume. Of course, this all depends on how correlated your data are (adding a duplicate column changes nothing). So reducing the dimension can be a necessity when your feature vectors have too many components, e.g. if you work with text and build some sort of TF-IDF representation, which creates one dimension per word (or lemma). Moreover, it protects you from the curse of dimensionality (e.g. if you need to perform some sort of KNN afterwards). And obviously it also speeds up considerably whatever algorithm you run in the reduced space, though the cost of reducing the dimension can outweigh the gain of running algorithms in the reduced space. One thing is sure: dimensionality reduction loses information. Most of the time it does so by discarding correlations in the input data.
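A small sketch of the text case (the toy corpus and the number of components are made up; TruncatedSVD is used because TF-IDF output is sparse):

```python
# TF-IDF creates one dimension per token; TruncatedSVD then compresses those
# sparse dimensions into a small dense representation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = ["the cat sat on the mat",
        "dogs and cats are pets",
        "stock prices fell sharply today"]

tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs)          # shape: (n_docs, vocabulary_size), sparse

svd = TruncatedSVD(n_components=2)     # far fewer dimensions than the vocabulary
X_reduced = svd.fit_transform(X)
print(X.shape, "->", X_reduced.shape)
```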
Kernel trick. That being said, the kernel trick is an entirely different idea. Some data that are not separable in the original space can become separable in a higher-dimensional one: imagine points sampled from two circles of radius $0.5$ and $1$; they are not linearly separable, but if you add the distance to the origin as an extra feature, they become separable. So the idea is that the dot product in the original space is not a sufficient way to compare points, and we ought to use the dot product in the mapped space that includes the distance to the origin. That is the core idea of kernels: find functions that "scatter" our data as we want into a Hilbert space.

But the way I prefer to see it is that using a kernel is using prior information about your data, namely how they should be compared. Roughly, a kernel is more or less a similarity function, and a kernel-based method is one where you've switched the naive way of comparison (the regular dot product in the original space) for one that fits your data better (your kernel). Then, indeed, this corresponds mathematically to increasing the dimension of the space in which the comparison happens. Precisely: your data still live in the original space, but the kernel you provided acts as if you had mapped the data into a higher-dimensional space (an RKHS), which can even be infinite-dimensional (e.g. with an RBF kernel), and taken a dot product there. The main trick of kernel-based methods is that you never actually map your data into the RKHS explicitly; that's why it can work! All the work is done by evaluating the kernel on pairs of samples. In the end, you haven't gained any information: the RKHS view gives a mathematical frame to explain how it works, but intuitively the idea is just that you've provided a better way to compare your data. No dimensions were really added to your data (computationally); you simply found a more suitable way to compare them.
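Here is a minimal sketch of the circles example (the sample size, noise level, and use of scikit-learn's make_circles are my own choices for illustration):

```python
# Two concentric circles: not linearly separable in R^2, but separable once
# the distance to the origin is added as a feature; an RBF-SVM gets there
# without ever computing that extra feature explicitly.
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.5, noise=0.05, random_state=0)

linear_2d = SVC(kernel="linear").fit(X, y)
print("linear SVM in R^2:     ", linear_2d.score(X, y))     # poor

X3 = np.column_stack([X, np.linalg.norm(X, axis=1)])        # add the radius
linear_3d = SVC(kernel="linear").fit(X3, y)
print("linear SVM with radius:", linear_3d.score(X3, y))    # near perfect

rbf_2d = SVC(kernel="rbf").fit(X, y)
print("RBF SVM in R^2:        ", rbf_2d.score(X, y))        # near perfect, no explicit map
```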
Now nothing prevents you from doing both: you could try to categorize texts by computing TF-IDF features, then applying PCA, and then training an RBF-SVM (a linear kernel is usually fine for text, but the combination would still make perfect sense).
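A hedged sketch of that pipeline (toy texts and labels; TruncatedSVD stands in for PCA because TF-IDF matrices are sparse):

```python
# TF-IDF -> dimensionality reduction -> RBF-SVM, chained in one pipeline.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.svm import SVC

texts = ["great movie, loved it", "terrible plot and acting",
         "wonderful performance", "boring and far too long"]
labels = [1, 0, 1, 0]

clf = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("svd", TruncatedSVD(n_components=2)),
    ("svm", SVC(kernel="rbf")),
])
clf.fit(texts, labels)
print(clf.predict(["loved the acting", "what a boring movie"]))  # toy data, so take with a grain of salt
```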