Is there a good way to use the a priori knowledge that all non-c objects likely form two distinct clusters?
If you are using a tree-based method, I don't think it matters: these classifiers partition the feature space and then look at the proportion of samples in each class, so all that matters is the relative occurrence of class c in each terminal node.
If, however, you were using something like a mixture of normals, LDA, etc., then combining the two clusters would be a bad idea (assuming classes a and b form distinct clusters). Here you need to preserve the class structure to accurately describe the feature space that maps to a, b and c. These models assume the features of each class follow a different Normal distribution, so if you combine a and b you will force a single Normal distribution to be fit to a mixture.
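To make this concrete, here is a small sketch with an invented synthetic layout (class c wedged between clusters a and b), comparing a Gaussian-per-class model fit with the three classes kept distinct versus with a and b merged before fitting:

```python
# Toy illustration: merging classes a and b forces a Gaussian-per-class
# model (GaussianNB here) to fit one Normal to a bimodal mixture.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB

# Clusters at (-5, 0) and (5, 0) play the roles of a and b; (0, 0) is c.
X, cluster_id = make_blobs(
    n_samples=900, centers=[(-5, 0), (5, 0), (0, 0)], random_state=0)
is_c = cluster_id == 2

# Keep a, b, c distinct, then map the three-class predictions to c vs not-c.
pred_three = cross_val_predict(GaussianNB(), X, cluster_id, cv=5)
print('three-class fit:', ((pred_three == 2) == is_c).mean())

# Merge a and b into a single class before fitting.
pred_merged = cross_val_predict(GaussianNB(), X, is_c.astype(int), cv=5)
print('merged fit:     ', ((pred_merged == 1) == is_c).mean())
```

On a layout like this, the merged model has to describe the a+b mixture with one Normal whose variance straddles c, so it typically scores worse than the three-class fit.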
In summary, for trees it shouldn't matter much if you:
I. Create three classifiers (1. a vs b, 2. a vs c and 3. b vs c), then predict with a voting-based method.
II. Merge classes a and b to form a two-class problem.
III. Predict all three classes, then map the prediction to a two-class value (e.g. f(c) = c, f(a) = not c, f(b) = not c).
However, if you use a method that fits a distribution to each class, avoid II. and test which of I. or III. works better for your problem.
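As a minimal sketch of option III. with a tree ensemble (synthetic data invented for illustration):

```python
# Option III: fit a three-class tree ensemble, then collapse the
# predictions to a two-class value (f(c) = c, f(a) = f(b) = not c).
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Three synthetic clusters stand in for classes a, b and c.
X, cluster_id = make_blobs(n_samples=600, centers=3, random_state=0)
y = np.array(['a', 'b', 'c'])[cluster_id]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

# Predict all three classes, then map to the two-class problem.
pred = np.where(clf.predict(X_test) == 'c', 'c', 'not c')
truth = np.where(y_test == 'c', 'c', 'not c')
print('two-class accuracy:', (pred == truth).mean())
```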
The problem
The RBF kernel function for two vectors $\mathbf{u}$ and $\mathbf{v}$ looks like this:
$$
\kappa(\mathbf{u},\mathbf{v}) = \exp(-\gamma \|\mathbf{u}-\mathbf{v}\|^2).
$$
Essentially, your results indicate that your values for $\gamma$ are way too high. When that happens, the kernel matrix becomes close to the identity matrix: $\|\mathbf{u}-\mathbf{v}\|^2$ is larger than 0 if $\mathbf{u}\neq \mathbf{v}$ and 0 otherwise, which leads to kernel values of $\approx 0$ and $1$ respectively when $\gamma$ is very large (consider the limit $\gamma\to\infty$).
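You can verify this numerically; a quick sketch using scikit-learn's `rbf_kernel` (random data invented for illustration):

```python
# As gamma grows, off-diagonal kernel values vanish and the kernel
# matrix approaches the identity matrix.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))

for gamma in (0.1, 10.0, 1e4):
    K = rbf_kernel(X, gamma=gamma)
    off_diag = K[~np.eye(len(X), dtype=bool)]
    print(f'gamma={gamma:g}: largest off-diagonal value = {off_diag.max():.3g}')
```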
This then leads to an SVM model in which all training instances are support vectors, and this model fits the training data perfectly. Of course, when you predict on a test set, all predictions will be identical to the model's bias term $\rho$, because the kernel evaluations are all essentially zero, i.e.:
$$
f(\mathbf{z}) = \underbrace{\sum_{i\in SV} \alpha_i y_i \kappa(\mathbf{x}_i, \mathbf{z})}_{\text{always } \approx\, 0} + \rho,
$$
where $\mathbf{x}_i$ is the $i$-th support vector and $\alpha_i$ its corresponding dual weight.
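The symptom is easy to reproduce; a hedged sketch (synthetic data and a deliberately absurd $\gamma$ value):

```python
# With a huge gamma, (nearly) every training point becomes a support
# vector and the test predictions collapse to a single class.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = SVC(kernel='rbf', gamma=1e6).fit(X_tr, y_tr)
print('support vectors:', clf.n_support_.sum(), 'of', len(X_tr))
print('training accuracy:', clf.score(X_tr, y_tr))
print('distinct test predictions:', set(clf.predict(X_te)))
```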
The solution
Your search space needs to be expanded to far lower values of $\gamma$. Typically we use exponential grids, e.g. $10^{\mathrm{lb}} \leq \gamma \leq 10^{\mathrm{ub}}$, where the exponent bounds are data dependent (e.g. $[\mathrm{lb}, \mathrm{ub}] = [-8, 2]$).
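For instance, such a grid could be built like this, using the example bounds above (though see the note on grid search below):

```python
# An exponential grid over gamma: 1e-8, 1e-7, ..., 1e2.
import numpy as np
gammas = np.logspace(-8, 2, num=11)
```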
I suspect you're using grid search at the moment, which is a very poor way to optimize hyperparameters because it wastes most of its time investigating hyperparameters that aren't good for your problem.
It's far better to use optimizers that are designed for such problems, which are available in libraries like Optunity and Hyperopt. I'm the main developer of Optunity; you can find an example that does exactly what you need (i.e., tune a sklearn SVC) in our documentation.
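If you want a library-agnostic starting point (using plain scikit-learn here rather than Optunity's own API), a randomized search over log-uniform ranges already avoids the worst of exhaustive grid search; the data below is a placeholder:

```python
# Random search over log-uniform ranges for gamma and C; this explores
# the exponential range above without evaluating a fixed grid.
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)

search = RandomizedSearchCV(
    SVC(kernel='rbf'),
    param_distributions={
        'gamma': loguniform(1e-8, 1e2),
        'C': loguniform(1e-3, 1e3),
    },
    n_iter=50,
    cv=5,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```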
Best Answer
First of all, if your classifier doesn't do better than a random choice, there is a risk that there simply is no connection between the features and the class. A good question to ask yourself in such a position is whether you or a domain expert could infer the class (with an accuracy greater than that of a random classifier) from the given features. If not, then getting more data rows or changing the classifier won't help; what you need is more data with different features.
If, on the other hand, you think the information needed to infer the class is already in the features, you should check whether your classifier suffers from a high-bias or a high-variance problem.
To do this, plot the validation error and the training error as a function of the number of training examples.
If the lines converge to the same value and are close at the end, then your classifier has high bias and adding more data won't help. In this case, a good idea is to either switch to a classifier with higher variance or simply lower the regularization parameter of your current one.
If, on the other hand, the lines are far apart and you have a low training error but a high validation error, then your classifier has too high variance. In this case, getting more data is very likely to help. If the variance is still too high after getting more data, you can increase the regularization parameter.
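A sketch of this diagnostic with scikit-learn's `learning_curve` (model and data invented for illustration):

```python
# Plot training and validation error against training-set size.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=1000, random_state=0)

sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 8), cv=5)

# Accuracy -> error. Converging curves suggest high bias; a persistent
# gap between the curves suggests high variance.
plt.plot(sizes, 1 - train_scores.mean(axis=1), label='training error')
plt.plot(sizes, 1 - val_scores.mean(axis=1), label='validation error')
plt.xlabel('number of training examples')
plt.ylabel('error')
plt.legend()
plt.show()
```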
These are the general rules I would use when faced with a problem like yours.
Cheers.