Basically they generalize in the same way. The kernel-based approach to regression is to transform the feature, call it $\mathbf{x}$, into some vector space, then perform a linear regression in that vector space. To avoid the 'curse of dimensionality', the linear regression in the transformed space is somewhat different from ordinary least squares. The upshot is that the regression in the transformed space can be expressed as $\ell(\mathbf{x}) = \sum_i w_i \phi(\mathbf{x_i}) \cdot \phi(\mathbf{x})$, where $\mathbf{x_i}$ are observations from the training set, $\phi(\cdot)$ is the transform applied to the data, and the dot is the dot product. Thus the linear regression is 'supported' by a few (preferably a very small number of) training vectors.
All the mathematical details are hidden in the weird regression done in the transformed space ('epsilon-insensitive tube' or whatever) and the choice of transform, $\phi$. For a practitioner, there are also questions of a few free parameters (usually in the definition of $\phi$ and the regression), as well as featurization, which is where domain knowledge is usually helpful.
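To make the 'supported' form concrete, here is a minimal sketch assuming scikit-learn's SVR (the library, the RBF kernel, and the parameter values are my own illustrative choices, not something implied above): the fitted model's prediction is exactly a weighted sum of kernel evaluations against its support vectors, plus an intercept.

```python
# Minimal sketch (assuming scikit-learn): the SVR prediction is
# l(x) = sum_i w_i k(x_i, x) + b, with x_i the support vectors.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + 0.1 * rng.normal(size=200)

gamma = 0.5  # fixed explicitly so we can evaluate the kernel by hand
model = SVR(kernel="rbf", gamma=gamma, C=1.0, epsilon=0.1).fit(X, y)

def rbf(a, b, gamma):
    """Gaussian kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

X_new = np.array([[0.3], [1.7]])
K = rbf(model.support_vectors_, X_new, gamma)          # k(x_i, x) for i in SV
manual = model.dual_coef_ @ K + model.intercept_       # weights w_i live in dual_coef_
print(np.allclose(manual.ravel(), model.predict(X_new)))  # True
```

The point of the sketch is only that the sum runs over the (hopefully few) support vectors, not over the whole training set.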
There are two common ways of utilizing the maximum-margin hyperplane of a trained SVM.
(1) Prediction for new data points
Based on a given training dataset, an SVM hyperplane is fully specified by its slope $w$ and an intercept $b$. (These variable names derive from a tradition established in the neural-networks literature, where the two respective quantities are referred to as 'weight' and 'bias.') As noted before, a new data point $x \in \mathbb{R}^d$ can then be classified as
\begin{align}
f(x) = \textrm{sgn}\left(\langle w,x \rangle + b \right)
\end{align}
where $\langle w,x \rangle$ represents the inner product. Thanks to the Karush-Kuhn-Tucker complementarity conditions, the discriminant function can be rewritten as
\begin{align}
f(x) = \textrm{sgn}\left(\sum_{i \in SV} y_i \alpha_i \langle x_i, x \rangle + b\right),
\end{align}
where the hyperplane is implicitly encoded by the support vectors $x_i$, their class labels $y_i$, and the support vector coefficients $\alpha_i$. The support vectors are those training data points which lie closest to the separating hyperplane. Thus, predictions can be made very efficiently, since only inner products (alternatively: kernel functions) between some training points and the test point have to be evaluated.
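As an illustration (a hedged sketch using scikit-learn's SVC; the discussion above is library-agnostic, and the dataset and parameters here are placeholders), the decision function can be reproduced from nothing but the support vectors, their coefficients, and the intercept:

```python
# Sketch (scikit-learn assumed): classification of a new point only needs
# inner products with the support vectors, as in the KKT-based form above.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = SVC(kernel="linear").fit(X, y)

x_new = X[:3]                                    # pretend these are unseen points
# In scikit-learn, dual_coef_ already holds the products y_i * alpha_i.
K = clf.support_vectors_ @ x_new.T               # <x_i, x> for i in SV
decision = clf.dual_coef_ @ K + clf.intercept_   # sum_i y_i alpha_i <x_i, x> + b
labels = np.where(decision.ravel() > 0, clf.classes_[1], clf.classes_[0])
print(np.allclose(decision.ravel(), clf.decision_function(x_new)))  # True
print(np.array_equal(labels, clf.predict(x_new)))                   # True
```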
Some have suggested also considering the distance between a new data point $x$ and the hyperplane as an indicator of how confident the model is in its prediction. However, it is important to note that hyperplane distance itself does not afford inference; there is no probability associated with a new prediction, which is why an SVM is sometimes referred to as a point classifier. If probabilistic output is desired, other classifiers may be more appropriate, e.g., the SVM's probabilistic cousin, the relevance vector machine (RVM).
(2) Reconstructing feature weights
There is another way of putting an SVM model to use. In many classification analyses it is interesting to examine which features drove the classifier, i.e., which features played the biggest role in shaping the separating hyperplane. Given a trained SVM model with a linear kernel, these feature coefficients $w_1, \ldots, w_d$ can be reconstructed easily using
\begin{align}
w = \sum_{i=1}^n y_i \alpha_i x_i
\end{align}
where $x_i$ and $y_i$ represent the $i^\textrm{th}$ training example and its corresponding class label. (The sum effectively runs over the support vectors only, since $\alpha_i = 0$ for every other training point.)
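With a linear kernel this reconstruction amounts to a single matrix product. The sketch below assumes scikit-learn, whose dual_coef_ attribute already stores the products $y_i \alpha_i$ (an implementation detail of that library, not part of the formula above):

```python
# Sketch (scikit-learn assumed): recover w = sum_i y_i alpha_i x_i from a
# linear-kernel SVM and check it against the library's own coefficient vector.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = SVC(kernel="linear").fit(X, y)

# dual_coef_ holds y_i * alpha_i for the support vectors only.
w = clf.dual_coef_ @ clf.support_vectors_
print(np.allclose(w, clf.coef_))   # True: same hyperplane normal
print(w.ravel())                   # one coefficient per feature
```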
An important caveat of this approach is that the resulting feature weights are simple numerical coefficients without inferential quality; there is no measure of confidence associated with them. Thus, we cannot readily argue that some features were 'more important' than others, and we cannot infer that a feature with a particularly low coefficient was 'not important' in the classification problem. In order to allow for inference on feature weights, we would need to resort to more general-purpose approaches, such as the bootstrap, a permutation test, or a feature-selection algorithm embedded in a cross-validation scheme.
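As a rough illustration of the bootstrap route (my own sketch, not something from the answer; the resample count and the use of percentile intervals are arbitrary choices), one could refit a linear SVM on resampled training sets and look at how much each feature weight varies:

```python
# Illustrative sketch only: a naive bootstrap of linear-SVM feature weights,
# giving a rough sense of their variability (not a formal inference procedure).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
rng = np.random.default_rng(0)

weights = []
for _ in range(200):                        # 200 bootstrap resamples
    idx = rng.integers(0, len(X), size=len(X))
    if len(np.unique(y[idx])) < 2:          # skip degenerate resamples
        continue
    clf = SVC(kernel="linear").fit(X[idx], y[idx])
    weights.append(clf.coef_.ravel())

weights = np.array(weights)
lo, hi = np.percentile(weights, [2.5, 97.5], axis=0)
for j, (l, h) in enumerate(zip(lo, hi)):
    print(f"feature {j}: 95% bootstrap interval [{l:+.3f}, {h:+.3f}]")
```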
Best Answer
Yes! But maybe not in the way you mean. In my research I frequently create categorical features from continuously-valued ones using an algorithm like recursive partitioning. I usually use this approach with the SVMLight implementation of support vector machines, but I've used it with LibSVM as well. You'll need to be sure you assign your partitioned categorical features to a specific place in your feature vector during training and classification, otherwise your model is going to end up jumbled.
Edit: That is to say, when I've done this, I assign the first n elements of the vector to the binary values associated with the output of recursive partitioning. In binary feature modeling, you just have a giant vector of 0's and 1's, so everything looks the same to the model, unless you explicitly indicate where different features are. This is probably overly specific, as I imagine most SVM implementations will do this on their own, but, if you like to program your own, it might be something to think about!
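If it helps, here is a rough sketch of that workflow (scikit-learn used as a stand-in for SVMLight/LibSVM; the shallow decision tree as the recursive-partitioning step and the particular column layout are my assumptions, not the answerer's exact pipeline): the tree's leaves define the bins, and the resulting binary indicators always occupy the same slots in the feature vector at training and classification time.

```python
# Sketch: bin one continuous feature via a shallow tree ('recursive
# partitioning'), put its binary indicators in a fixed block of the feature
# vector, and train an SVM on the result. Layout is identical at train/test time.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=4, random_state=0)
cont = X[:, 0].reshape(-1, 1)                 # one continuous feature to partition

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(cont, y)
leaf_ids = np.unique(tree.apply(cont))        # fixed, ordered set of leaves

def featurize(X_part):
    """Binary leaf indicators in the first n slots, remaining features after."""
    leaves = tree.apply(X_part[:, 0].reshape(-1, 1))
    binary = (leaves[:, None] == leaf_ids[None, :]).astype(float)
    return np.hstack([binary, X_part[:, 1:]])

clf = SVC(kernel="linear").fit(featurize(X), y)
print(clf.predict(featurize(X[:5])))          # same column layout as in training
```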