Let $\mathcal{X}$ be your input space, i.e. the space where your data points reside. Consider a function $\Phi:\mathcal{X} \rightarrow \mathcal{F}$ that takes a point from the input space $\mathcal{X}$ and maps it to a point in $\mathcal{F}$. Suppose we have mapped all your data points from $\mathcal{X}$ to this new space $\mathcal{F}$. If you now solve the ordinary linear SVM in $\mathcal{F}$ instead of $\mathcal{X}$, you will notice that all the earlier derivations look the same, except that every point $x_i$ is replaced by $\Phi(x_i)$, and instead of $x^Ty$ (the dot product), which is the natural inner product for Euclidean space, we use $\langle \Phi(x), \Phi(y) \rangle$, the natural inner product in the new space $\mathcal{F}$. So, at the end, your $w^*$ looks like
$$
w^*=\sum_{i \in SV} h_i y_i \Phi(x_i)
$$
and hence,
$$
\langle w^*, \Phi(x) \rangle = \sum_{i \in SV} h_i y_i \langle \Phi(x_i), \Phi(x) \rangle
$$
Similarly,
$$
b^*=\frac{1}{|SV|}\sum_{i \in SV}\left(y_i - \sum_{j=1}^N\left(h_j y_j \langle \Phi(x_j), \Phi(x_i)\rangle\right)\right)
$$
and your classification rule looks like: $c_x=\text{sign}(\langle w^*, \Phi(x) \rangle+b^*)$.
So far so good; there is nothing new, since we have simply applied the ordinary linear SVM in a different space. However, the magic part is this:
Let us say that there exists a function $k:\mathcal{X}\times\mathcal{X}\rightarrow \mathbb{R}$ such that $k(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle$. Then, we can replace all the dot products above with $k(x_i, x_j)$. Such a $k$ is called a kernel function.
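To make the definition concrete, here is a minimal sketch for one kernel where $\Phi$ can be written down by hand: the homogeneous degree-2 polynomial kernel on 2D inputs. The choice of kernel and the names `phi`, `dot` and `k_poly2` are assumptions for illustration only; they are not part of the question.

```python
import math

def phi(x):
    """Explicit feature map into F for the degree-2 polynomial kernel (2D input)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def dot(u, v):
    """Ordinary Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

def k_poly2(x, y):
    """Kernel evaluated directly in the input space: k(x, y) = (x . y)^2."""
    return dot(x, y) ** 2

x, y = (1.0, 2.0), (3.0, -1.0)
explicit = dot(phi(x), phi(y))  # inner product computed in F
implicit = k_poly2(x, y)        # same value, without ever building phi
print(explicit, implicit)       # the two agree up to floating-point rounding
```

For the RBF kernel the corresponding $\mathcal{F}$ is infinite-dimensional, so only the `implicit` route is available there; that is exactly why the substitution matters.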
Therefore, the projection onto $w^*$ and the bias $b^*$ become
$$
\langle w^*, \Phi(x) \rangle = \sum_{i \in SV} h_i y_i k(x_i, x)
$$
$$
b^*=\frac{1}{|SV|}\sum_{i \in SV}\left(y_i - \sum_{j=1}^N\left(h_j y_j k(x_j, x_i)\right)\right)
$$
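The two formulas above translate almost line by line into code. The following is a rough sketch, assuming an RBF kernel with a hypothetical `gamma=0.5` and a toy problem with one point per class; the multipliers `h` are set by hand so that both points sit exactly on the margin, rather than being produced by a real solver.

```python
import math

def rbf(x, z, gamma=0.5):
    """RBF kernel k(x, z) = exp(-gamma * ||x - z||^2); gamma is an assumed value."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def kernel_expansion(x, sv, h, y, k):
    """<w*, Phi(x)> = sum over support vectors of h_i * y_i * k(x_i, x)."""
    return sum(h_i * y_i * k(x_i, x) for x_i, h_i, y_i in zip(sv, h, y))

def bias(sv, h, y, k):
    """b* averaged over the support vectors (h_j = 0 for all other points)."""
    residuals = [y_i - kernel_expansion(x_i, sv, h, y, k) for x_i, y_i in zip(sv, y)]
    return sum(residuals) / len(sv)

def classify(x, sv, h, y, b, k):
    """sign(<w*, Phi(x)> + b*), returned as +1 or -1."""
    return 1 if kernel_expansion(x, sv, h, y, k) + b >= 0 else -1

# Toy data (hypothetical): both points are support vectors.
sv = [(0.0, 0.0), (2.0, 0.0)]
y = [-1, 1]
# Hand-picked multipliers that place both points on the margin for this toy case.
h_val = 1.0 / (1.0 - rbf(sv[0], sv[1]))
h = [h_val, h_val]
b = bias(sv, h, y, rbf)

print(classify((0.1, 0.0), sv, h, y, b, rbf))  # point near the negative example
print(classify((1.9, 0.0), sv, h, y, b, rbf))  # point near the positive example
```

Note that `classify` never touches $w^*$ or $\Phi$; it only needs the stored support vectors, their labels, their multipliers and $b^*$.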
For which kernel functions is the above substitution valid? That is a slightly involved question, and you might want to consult proper reading material for the details. In short, $k$ must be symmetric and positive semidefinite (Mercer's condition), which guarantees that a corresponding $\Phi$ and $\mathcal{F}$ exist. The RBF kernel satisfies this.
To answer your question, "Is the situation so that all the support vectors are needed for the classification?":
Yes. As you may notice above, we compute the inner product of $w^*$ with $\Phi(x)$ through the kernel expansion instead of computing $w^*$ explicitly. This requires us to retain all the support vectors for classification.
Note: the $h_i$'s in this final section are the solution to the dual of the SVM in the space $\mathcal{F}$, not $\mathcal{X}$. Does that mean we need to know $\Phi$ explicitly? Luckily, no. If you look at the dual objective, it involves the data only through inner products, and since $k$ lets us compute those inner products directly, we don't need to know $\Phi$ explicitly. The dual objective simply looks like
$$
\begin{aligned}
\max_{h} \quad & \sum_i h_i - \frac{1}{2}\sum_{i,j} y_i y_j h_i h_j k(x_i, x_j) \\
\text{subject to} \quad & \sum_i y_i h_i = 0, \quad h_i \geq 0
\end{aligned}
$$
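As a small illustration that the dual needs only kernel values, the sketch below evaluates the objective from a Gram matrix. It uses the common convention with a factor $\frac{1}{2}$ on the quadratic term, which is the one consistent with $w^*=\sum_i h_i y_i \Phi(x_i)$. The two-point data set, `gamma=0.5` and all function names are assumptions for illustration; a real problem would hand `K` to a QP solver rather than use a closed form.

```python
import math

def rbf(x, z, gamma=0.5):
    """RBF kernel; gamma is an assumed value."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def gram_matrix(X, k):
    """K[i][j] = k(x_i, x_j): the only information about the data the dual uses."""
    return [[k(xi, xj) for xj in X] for xi in X]

def dual_objective(h, y, K):
    """sum_i h_i - 1/2 * sum_{i,j} y_i y_j h_i h_j K[i][j]."""
    n = len(h)
    quad = sum(y[i] * y[j] * h[i] * h[j] * K[i][j]
               for i in range(n) for j in range(n))
    return sum(h) - 0.5 * quad

def is_feasible(h, y, tol=1e-9):
    """Check h_i >= 0 and sum_i y_i h_i = 0."""
    return all(h_i >= 0 for h_i in h) and abs(sum(a * b for a, b in zip(y, h))) <= tol

X = [(0.0, 0.0), (2.0, 0.0)]  # hypothetical toy data, one point per class
y = [-1, 1]
K = gram_matrix(X, rbf)

# With two points the equality constraint forces h_1 = h_2 = t, and the
# objective reduces to 2t - t^2 (1 - K[0][1]), maximised at t = 1 / (1 - K[0][1]).
t_star = 1.0 / (1.0 - K[0][1])
best = dual_objective([t_star, t_star], y, K)
print(is_feasible([t_star, t_star], y), best)
```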
Check out A practical guide to SVM Classification for some pointers, particularly page 5.
> We recommend a "grid-search" on $C$ and $\gamma$ using cross-validation. Various pairs of $(C,\gamma)$ values are tried and the one with the best cross-validation accuracy is picked. We found that trying exponentially growing sequences of $C$ and $\gamma$ is a practical method to identify good parameters (for example, $C = 2^{-5},2^{-3},\ldots,2^{15};\gamma = 2^{-15},2^{-13},\ldots,2^{3}$).

Remember to normalize your data first and, if you can, gather more data, because from the looks of it your problem might be heavily underdetermined.
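The grid from the guide can be sketched in a few lines. This is only an outline: `grid_search` takes any callable that returns a cross-validation accuracy for a $(C,\gamma)$ pair, and `toy_cv_accuracy` is a made-up stand-in so the snippet runs on its own. In practice you would replace it with a function that trains and cross-validates an SVM on your (normalized) data.

```python
import itertools
import math

# Exponentially growing sequences, as suggested in the guide.
C_grid = [2.0 ** e for e in range(-5, 16, 2)]      # 2^-5, 2^-3, ..., 2^15
gamma_grid = [2.0 ** e for e in range(-15, 4, 2)]  # 2^-15, 2^-13, ..., 2^3

def grid_search(cv_accuracy, C_values, gamma_values):
    """Try every (C, gamma) pair and keep the one with the best score."""
    best_params, best_score = None, float("-inf")
    for C, gamma in itertools.product(C_values, gamma_values):
        score = cv_accuracy(C, gamma)
        if score > best_score:
            best_params, best_score = (C, gamma), score
    return best_params, best_score

def toy_cv_accuracy(C, gamma):
    """Hypothetical stand-in for real cross-validation; peaks at C=2^3, gamma=2^-5."""
    return 1.0 / (1.0 + (math.log2(C) - 3) ** 2 + (math.log2(gamma) + 5) ** 2)

best_params, best_score = grid_search(toy_cv_accuracy, C_grid, gamma_grid)
print(best_params, best_score)
```

The coarse grid has 11 values of $C$ and 10 of $\gamma$, so 110 models per pass; a finer grid around the best pair can follow once a good region is found.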
The complexity of SVM regression is similar to the complexity of SVM classification. If problems of that size are feasible for you in a classification context, they are also feasible in regression.
When using a nonlinear kernel, training complexity is quadratic in terms of the number of training instances. 100k training instances is quite a lot, so I most definitely recommend trying a linear kernel first. For the linear kernel, you should consider using LIBLINEAR instead of LIBSVM (same authors, the former is made specifically for large-scale problems).
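To get a feel for what quadratic growth means at this scale, here is a back-of-the-envelope calculation of the size of a full kernel matrix for 100k instances. It is only an illustration: it assumes 8-byte floats, and LIBSVM does not actually hold the whole matrix in memory (it computes kernel values on demand and caches some of them).

```python
n = 100_000                    # number of training instances
entries = n * n                # pairwise kernel values k(x_i, x_j)
bytes_full = entries * 8       # assuming 8-byte (double precision) values
gigabytes = bytes_full / 1e9

print(entries)    # number of kernel values for the full matrix
print(gigabytes)  # size in GB if every value were stored
```

Doubling the number of instances quadruples both numbers, whereas a linear model only scales with the number of instances times the number of features.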
The impact of the number of dimensions on training time is not very high; this is one of the advantages of kernel methods. Knowing that, you may well go for 4000 dimensions straight away. With 4000 dimensions, linear models are likely to perform quite well.
It is very hard to give a good estimate of the actual run time, as it depends on many things related to the data and your hardware. That said, you can expect training time to be on the order of tens of minutes per model for LIBSVM. If you use LIBLINEAR, it will be a couple of seconds.