Reproducing the content of the answer on Quora, in case you don't have a Quora account.
Question: Why does the RBF (radial basis function) kernel map into infinite dimensional space?
Answer: Consider the polynomial kernel of degree 2 defined by $$k(x, y) =
(x^Ty)^2$$ where $x, y \in \mathbb{R}^2$ and $x = (x_1, x_2), y = (y_1, y_2)$.
Expanding the square, the kernel function can be written as $$k(x, y) =
(x_1y_1 + x_2y_2)^2 = x_{1}^2y_{1}^2 + 2x_1x_2y_1y_2 +
x_{2}^2y_{2}^2$$ Now, let us try to come up with a feature map
$\Phi$ such that the kernel function can be written as
$k(x, y) = \Phi(x)^T\Phi(y)$.
Consider the following feature map, $$\Phi(x) = (x_1^2,
\sqrt{2}x_1x_2, x_2^2)$$ This feature map sends points in $\mathbb{R}^2$ to points in $\mathbb{R}^3$. Also, notice that $$\Phi(x)^T\Phi(y) = x_1^2y_1^2 + 2x_1x_2y_1y_2 + x_2^2y_2^2$$ which is exactly our kernel function.
This means that our kernel function is actually computing the
inner/dot product of points in $\mathbb{R}^3$. That is, it
is implicitly mapping our points from $\mathbb{R}^2$ to
$\mathbb{R}^3$.
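As a quick numeric sanity check of the identity above (a sketch in NumPy; the names `poly_kernel` and `phi` are mine, not from the answer):

```python
import numpy as np

def poly_kernel(x, y):
    # Degree-2 polynomial kernel: k(x, y) = (x^T y)^2.
    return float(x @ y) ** 2

def phi(x):
    # Explicit feature map Phi: R^2 -> R^3 from the derivation above.
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])
print(poly_kernel(x, y))   # 1.0
print(phi(x) @ phi(y))     # 1.0 -- same value, computed as a dot product in R^3
```

The kernel evaluates the $\mathbb{R}^3$ inner product without ever forming $\Phi(x)$ explicitly; the explicit map is only built here to verify the equality.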
Exercise Question: If your points are in $\mathbb{R}^n$, a
polynomial kernel of degree 2 will implicitly map them to some
vector space $F$. What is the dimension of this vector space $F$? Hint:
everything I did above is a clue.
Now, coming to the RBF kernel.
Let us consider the RBF kernel again for points in $\mathbb{R}^2$, taking $\gamma = 1$. Then the kernel can be written as
$$k(x, y) = \exp(-\|x - y\|^2) = \exp(-(x_1 - y_1)^2 - (x_2 - y_2)^2)$$
$$= \exp(-x_1^2 + 2x_1y_1 - y_1^2 - x_2^2 + 2x_2y_2 - y_2^2)$$
$$= \exp(-\|x\|^2) \exp(-\|y\|^2) \exp(2x^Ty)$$
Using the Taylor series of the exponential, you can write this as
$$k(x, y) = \exp(-\|x\|^2) \exp(-\|y\|^2) \sum_{n = 0}^{\infty} \frac{(2x^Ty)^n}{n!}$$ Now,
if we were to construct a feature map $\Phi$ just as we did for the
polynomial kernel, we would find that it maps every point in
$\mathbb{R}^2$ to an infinite-dimensional vector. Thus, the RBF kernel
implicitly maps every point to an infinite-dimensional space.
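The truncated Taylor expansion can be checked numerically: summing only finitely many terms already recovers the kernel value to high precision (a sketch; `rbf_via_series` and its `n_terms` cutoff are my own names, not from the answer):

```python
import numpy as np
from math import factorial

def rbf_kernel(x, y, gamma=1.0):
    # Standard RBF kernel: exp(-gamma * ||x - y||^2).
    return float(np.exp(-gamma * np.sum((x - y) ** 2)))

def rbf_via_series(x, y, n_terms=20):
    # gamma = 1 RBF kernel via the expansion derived above:
    # exp(-||x||^2) exp(-||y||^2) * sum_n (2 x^T y)^n / n!, truncated.
    prefactor = np.exp(-x @ x) * np.exp(-y @ y)
    s = sum((2 * (x @ y)) ** n / factorial(n) for n in range(n_terms))
    return float(prefactor * s)

x = np.array([0.3, -0.7])
y = np.array([1.1, 0.4])
print(rbf_kernel(x, y))       # direct evaluation
print(rbf_via_series(x, y))   # truncated series -- agrees to many decimals
```

Each power $(2x^Ty)^n$ corresponds to a block of feature-map coordinates (as in the degree-2 case), and since $n$ runs to infinity, the full feature map has infinitely many coordinates.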
Exercise Question: Find the first few components of the feature
map for the RBF kernel in the case above.
Best Answer
Haasdonk and Bahlmann, Learning with distance substitution kernels (2004; doi, pdf) give us the following result, as a subset of their Proposition 1:
(Incidentally, this is what I've usually seen called the generalized RBF kernel, with the squared distance. Hat-tip to this related answer for the pointer.)
Thus if $\sqrt d$ is a pseudosemimetric, the kernel $\exp\left( - \gamma d(x, y) \right)$ is positive definite for all $\gamma > 0$ iff $\sqrt d$ is isometrically embeddable in $L_2$. (Of course, if $d$ is a pseudosemimetric, so is $\sqrt d$.)
Two facts about $L_2$ embeddability:
A metric $d$ is $L_2$-embeddable iff $-d^2$ is conditionally positive definite. (Schoenberg, Metric spaces and positive definite functions, 1938, Trans. Am. Math. Soc. 44.3 pp 522–536; pdf)
$-d^\beta$ is conditionally positive definite for all $\beta \in [0, 2]$ iff $d$ is $L_2$-embeddable. (Haasdonk and Bahlmann, Proposition 1)
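As a numerical illustration of the positive-definiteness claim (a sketch; all names here are mine): take $d(x, y) = \|x - y\|^2$, so that $\sqrt d$ is the ordinary Euclidean metric, which embeds isometrically in $L_2$. The result then predicts that the Gram matrix of $\exp(-\gamma\, d(x, y))$ is positive semidefinite for every $\gamma > 0$, which we can spot-check via its eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))   # 20 random points in R^3

# d(x, y) = ||x - y||^2, so sqrt(d) is the Euclidean metric and is
# L2-embeddable; exp(-gamma * d) should then be PSD for all gamma > 0.
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
for gamma in [0.1, 1.0, 10.0]:
    K = np.exp(-gamma * sq_dists)
    min_eig = np.linalg.eigvalsh(K).min()
    # Small negative values of order machine epsilon are numerical noise.
    print(gamma, min_eig >= -1e-10)
```

This checks positive semidefiniteness only for one sample of points and a few values of $\gamma$, of course; the proposition is the actual guarantee.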
So:
If $d$ is not $L_2$-embeddable, however, fact 2 does not imply that $-d$ is not cpd – only that there is some $\beta$ for which $-d^\beta$ is not cpd. I don't know of a nice condition on $d$ that shows $\sqrt d$ is not $L_2$-embeddable.