Zen used method 1. Here is method 2: Map $x$ to a spherically symmetric Gaussian distribution centered at $x$ in the Hilbert space $L^2$. The standard deviation and a constant factor have to be tweaked for this to work exactly. For example, in one dimension,
$$ \int_{-\infty}^\infty \frac{\exp[-(x-z)^2/(2\sigma^2)]}{\sqrt{2 \pi} \sigma} \, \frac{\exp[-(y-z)^2/(2 \sigma^2)]}{\sqrt{2 \pi} \sigma} \, dz = \frac{\exp [-(x-y)^2/(4 \sigma^2)]}{2 \sqrt \pi \sigma}. $$
So, use a standard deviation of $\sigma/\sqrt 2$ and rescale the Gaussian to get $k(x,y) = \langle \Phi(x), \Phi(y)\rangle$. This last rescaling is needed because the $L^2$ norm of a normal density is not $1$ in general.
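The identity above is easy to check numerically. Here is a quick sanity check (a sketch assuming NumPy and SciPy are available; the particular values of $x$, $y$, and $\sigma$ are arbitrary):

```python
import numpy as np
from scipy.integrate import quad

sigma, x, y = 0.7, 0.3, -1.2  # arbitrary test values

def gaussian(z, mu, s):
    # Density of a normal distribution with mean mu and standard deviation s.
    return np.exp(-(z - mu) ** 2 / (2 * s ** 2)) / (np.sqrt(2 * np.pi) * s)

# Left-hand side: overlap integral of the two Gaussians centred at x and y.
lhs, _ = quad(lambda z: gaussian(z, x, sigma) * gaussian(z, y, sigma),
              -np.inf, np.inf)

# Right-hand side: the closed form from the identity above.
rhs = np.exp(-(x - y) ** 2 / (4 * sigma ** 2)) / (2 * np.sqrt(np.pi) * sigma)

assert abs(lhs - rhs) < 1e-10
```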
Let $\mathcal{X}$ be your input space, i.e., the space where your data points reside. Consider a function $\Phi:\mathcal{X} \rightarrow \mathcal{F}$ that takes a point from your input space $\mathcal{X}$ and maps it to a point in $\mathcal{F}$. Now, suppose we have mapped all of your data points from $\mathcal{X}$ into this new space $\mathcal{F}$. If you try to solve the usual linear SVM in $\mathcal{F}$ instead of $\mathcal{X}$, you will notice that all of the earlier derivation looks exactly the same, except that every point $x_i$ is represented as $\Phi(x_i)$, and instead of $x^Ty$ (the natural inner product for Euclidean space) we use $\langle \Phi(x), \Phi(y) \rangle$, the natural inner product in the new space $\mathcal{F}$. So, at the end, your $w^*$ looks like
$$
w^*=\sum_{i \in SV} h_i y_i \Phi(x_i)
$$
and hence,
$$
\langle w^*, \Phi(x) \rangle = \sum_{i \in SV} h_i y_i \langle \Phi(x_i), \Phi(x) \rangle
$$
Similarly,
$$
b^*=\frac{1}{|SV|}\sum_{i \in SV}\left(y_i - \sum_{j=1}^N\left(h_j y_j \langle \Phi(x_j), \Phi(x_i)\rangle\right)\right)
$$
and your classification rule looks like $c_x=\text{sign}(\langle w^*, \Phi(x) \rangle+b^*)$.
So far so good; there is nothing new, since we have simply applied the normal linear SVM in a different space. However, the magic part is this:
Let us say that there exists a function $k:\mathcal{X}\times\mathcal{X}\rightarrow \mathbb{R}$ such that $k(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle$. Then, we can replace all the dot products above with $k(x_i, x_j)$. Such a $k$ is called a kernel function.
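To make the definition concrete, here is a small worked example (a sketch; the specific vectors are arbitrary). For the homogeneous quadratic kernel $k(x,y) = (x^Ty)^2$ in two dimensions, an explicit feature map is $\Phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$, and we can verify that evaluating $k$ in input space agrees with the inner product in feature space:

```python
import numpy as np

def phi(v):
    # Explicit feature map for the quadratic kernel k(x, y) = (x . y)^2 in 2-D.
    return np.array([v[0] ** 2, np.sqrt(2) * v[0] * v[1], v[1] ** 2])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

implicit = np.dot(x, y) ** 2          # kernel evaluated directly in input space
explicit = np.dot(phi(x), phi(y))     # inner product after mapping to feature space

assert np.isclose(implicit, explicit)
```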
Therefore, your $w^*$ and $b^*$ look like,
$$
\langle w^*, \Phi(x) \rangle = \sum_{i \in SV} h_i y_i k(x_i, x)
$$
$$
b^*=\frac{1}{|SV|}\sum_{i \in SV}\left(y_i - \sum_{j=1}^N\left(h_j y_j k(x_j, x_i)\right)\right)
$$
For which kernel functions is the above substitution valid? That is a slightly involved question: the kernel must be symmetric and positive semi-definite (Mercer's condition), and you may want to take up proper reading material to understand the implications. However, I will just add that the above holds true for the RBF kernel.
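One consequence of positive semi-definiteness is easy to check empirically: the Gram matrix $K_{ij} = k(x_i, x_j)$ built from any data set must have no negative eigenvalues. A quick check for the RBF kernel (a sketch assuming NumPy; the random data and $\gamma$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(25, 3))   # arbitrary random data set
gamma = 0.8

# Pairwise squared distances, then the RBF Gram matrix K_ij = exp(-gamma ||x_i - x_j||^2).
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-gamma * sq)

# Mercer's condition implies K is symmetric positive semi-definite for any data set.
eigvals = np.linalg.eigvalsh(K)
assert eigvals.min() > -1e-10
```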
To answer your question, "Is the situation so that all the support vectors are needed for the classification?"
Yes. As you may notice above, we compute the inner product of $w^*$ with $\Phi(x)$ via kernel evaluations instead of computing $w^*$ explicitly. This requires us to retain all the support vectors at classification time.
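This is exactly what fitted SVM implementations store. As an illustration (a sketch assuming scikit-learn; the toy data set is arbitrary), we can reproduce `SVC`'s decision value by hand from its stored support vectors and dual coefficients, using only kernel evaluations:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = np.where(X[:, 0] * X[:, 1] > 0, 1, -1)   # a non-linearly-separable toy problem

clf = SVC(kernel="rbf", gamma=1.0, C=10.0).fit(X, y)

z = np.array([[0.5, -0.3]])                  # a test point
# clf.dual_coef_ stores h_i * y_i for the support vectors, so the
# decision value is sum_i h_i y_i k(x_i, z) + b, using kernels only.
K = rbf_kernel(clf.support_vectors_, z, gamma=1.0)
by_hand = (clf.dual_coef_ @ K).item() + clf.intercept_.item()

assert np.isclose(by_hand, clf.decision_function(z).item())
```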
Note: the $h_i$'s in the final section here are the solution to the dual of the SVM in the space $\mathcal{F}$, not $\mathcal{X}$. Does that mean we need to know the function $\Phi$ explicitly? Luckily, no. If you look at the dual objective, it involves the data only through inner products, and since $k$ lets us compute those inner products directly, we don't need to know $\Phi$ explicitly. The dual objective simply looks like
$$
\max_h \sum_i h_i - \frac{1}{2}\sum_{i,j} y_i y_j h_i h_j k(x_i, x_j) \\
\text{subject to: } \sum_i y_i h_i = 0,\quad h_i \geq 0
$$
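Since the dual above is just a quadratic program in the $h_i$'s, it can be solved with a generic constrained optimiser. Here is a minimal sketch (assuming SciPy; the tiny separable data set and the linear kernel are assumptions for illustration, since only a linear kernel lets us form $w^*$ explicitly to check the result):

```python
import numpy as np
from scipy.optimize import minimize

# Tiny linearly separable data set (assumed for illustration); linear kernel k(x, y) = x . y.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-4.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
K = X @ X.T                                   # Gram matrix of kernel evaluations

def neg_dual(h):
    # Negate the dual objective so that a minimiser performs the maximisation.
    return -(h.sum() - 0.5 * (h * y) @ K @ (h * y))

res = minimize(neg_dual, np.zeros(len(y)), method="SLSQP",
               constraints=[{"type": "eq", "fun": lambda h: h @ y}],
               bounds=[(0.0, None)] * len(y))
h = res.x
sv = h > 1e-6                                 # the support vectors have h_i > 0
w = (h * y) @ X                               # w* = sum_i h_i y_i x_i (linear kernel only)
b = np.mean(y[sv] - X[sv] @ w)

# At the optimum every point satisfies y_i (w . x_i + b) >= 1, with equality on the SVs.
margins = y * (X @ w + b)
assert np.isclose(margins.min(), 1.0, atol=1e-3)
```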
Best Answer
You are missing one thing, namely the fact that we do not need to know the images of data instances in feature space $\phi(\mathbf{x}_i)$. For some kernel functions, the feature space is very complex/unknown (for instance some graph kernels), or infinite dimensional (for example the RBF kernel).
Kernel methods only need to be able to compute inner products between two images in feature space, e.g. $\kappa(\mathbf{x}_i,\mathbf{x}_j)=\langle\phi(\mathbf{x}_i),\phi(\mathbf{x}_j)\rangle$. We don't have to know the feature space to be able to compute inner products in it. This is called the kernel trick.
For an SVM, specifically, $\mathbf{w}$ is the separating hyperplane in feature space. You cannot always write this down in input space. Again, for the RBF kernel $\mathbf{w}$ resides in an infinite dimensional feature space. All we need to be able to do is compute the inner product of $\mathbf{w}$ and the image of the test instance $\mathbf{z}$ in feature space $\phi(\mathbf{z})$, which is:
$$\langle\mathbf{w},\phi(\mathbf{z})\rangle = \sum_{i\in SV}\alpha_i y_i \kappa(\mathbf{x}_i,\mathbf{z}).$$
SVMs exploit the so-called representer theorem, which states that the resulting models can always be expressed as a weighted sum of kernel evaluations between some training instances (the support vectors) and the test instance. This is in fact exploited by all kernel methods.
The RBF kernel maps onto an infinite dimensional feature space. For a writeup on this you may consult these slides by Chih-Jen Lin, particularly slides 10 and 11. For a one-dimensional $x$:
$$\phi_{RBF}(x) = e^{-\gamma x^2}\big[1,\sqrt{\frac{2\gamma}{1!}}x, \sqrt{\frac{(2\gamma)^2}{2!}}x^2, \sqrt{\frac{(2\gamma)^3}{3!}}x^3,\ldots\big]^T,$$
which follows from the Taylor expansion of the exponential function.
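We can confirm numerically that a truncated version of this infinite-dimensional map recovers the RBF kernel (a sketch assuming NumPy; the inputs, $\gamma$, and the truncation length are arbitrary):

```python
import numpy as np
from math import factorial

gamma = 0.5
x, z = 0.8, -0.4   # two one-dimensional inputs (arbitrary test values)

def phi_rbf(v, terms=30):
    # First `terms` coordinates of the explicit RBF feature map:
    # phi_n(v) = exp(-gamma v^2) * sqrt((2 gamma)^n / n!) * v^n.
    n = np.arange(terms)
    coeffs = np.sqrt((2 * gamma) ** n / np.array([factorial(k) for k in n], dtype=float))
    return np.exp(-gamma * v ** 2) * coeffs * v ** n

# The truncated inner product converges rapidly to exp(-gamma (x - z)^2).
approx = np.dot(phi_rbf(x), phi_rbf(z))
exact = np.exp(-gamma * (x - z) ** 2)

assert abs(approx - exact) < 1e-12
```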