Just to make sure that we are on the same page, I take it from your description that you have a supervised learning problem: you know the Good/Bad status of your objects, and you have a vector of features for each object that you want to use to classify it as either Good or Bad. Moreover, training an SVM gives a classifier which, on the holdout data, makes almost no false Bad predictions but 55% false Good predictions. I have not personally worked with problems with such a huge difference in error rates between the two groups. It suggests to me that the feature distributions of the two groups overlap, but that the distribution in the Bad group is more spread out; think of two Gaussian distributions with almost the same mean, but with a larger variance for the Bad objects. If that is the case, I would imagine it will be difficult, if not impossible, to improve much on the error rate for the Good predictions. There may, of course, be other explanations that I am not aware of.
Having said that, I think it is a sensible strategy to combine classifiers hierarchically, as you suggest: first one classifier splits the full training set into two groups, then other classifiers split each of those groups further, and so on. In fact, that is what classification trees do, typically using very simple splits in each step. I see no formal problem in training whatever model you like on the training data that the SVM classifies as Good. You don't need to use the holdout data for this. In fact, you shouldn't, if you need the holdout data for assessment of the final model.
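For concreteness, here is a minimal scikit-learn sketch of that two-stage idea. The synthetic data and the choice of a random forest as second-stage model are arbitrary placeholders for your own setup:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Synthetic stand-in for your data: label 1 = Good, 0 = Bad.
X_train, y_train = make_classification(
    n_samples=1000, n_features=10, weights=[0.3, 0.7], random_state=0
)

# Stage 1: SVM trained on the full training set.
svm = SVC(kernel="rbf").fit(X_train, y_train)

# Stage 2: a second classifier trained only on the points the SVM calls Good;
# it tries to separate the true Goods from the false Goods in that subset.
good_mask = svm.predict(X_train) == 1
second_model = RandomForestClassifier(random_state=0)
second_model.fit(X_train[good_mask], y_train[good_mask])
```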
Your second suggestion is closely related to just using the group classified as Good from your training data to train a second model. I don't see any particular reason to use CV-based classifications to obtain this group. Just remember that if you are going to use CV, the entire training procedure must be carried out afresh in each fold.
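If you do go the CV route, the point about rerunning everything looks like this in code: both stages are refit from scratch inside each fold, and the held-out fold only ever sees the fully retrained pipeline (again a sketch with synthetic stand-in data):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

for train_idx, test_idx in StratifiedKFold(n_splits=5).split(X, y):
    X_tr, y_tr = X[train_idx], y[train_idx]
    # Refit both stages from scratch in every fold.
    svm = SVC(kernel="rbf").fit(X_tr, y_tr)
    good = svm.predict(X_tr) == 1
    second = RandomForestClassifier(random_state=0).fit(X_tr[good], y_tr[good])
    # Evaluate on the held-out fold, restricted to points the SVM calls Good.
    test_good = svm.predict(X[test_idx]) == 1
    preds = second.predict(X[test_idx][test_good])
```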
My suggestion is to first get a better understanding of what the feature distributions look like in the two groups from low-dimensional projections and exploratory visualizations. It might shed some light on why the error rate on the Good classifications is so large.
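Something as simple as a PCA scatter plot colored by group can already show whether the Bad group really is more spread out. A sketch, again with synthetic data in place of yours:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Project the scaled features onto the first two principal components.
Z = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))

for label, name in [(1, "Good"), (0, "Bad")]:
    plt.scatter(Z[y == label, 0], Z[y == label, 1], s=5, alpha=0.5, label=name)
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.legend()
plt.show()
```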
Answering the specific question.
Yes, it is possible (but might be undesirable)
Directed graph models (such as ANNs) can cope with that.
You have 5 input variables, and you want to predict gender first, and include this prediction to predict income.
Basically, you connect all your inputs to a first output, gender; then you connect all your inputs again to a second output, income (a skip connection, in ANN terminology), and you also connect the first output to the second. Don't forget bias terms.
[Network diagram: arrows are weights (coefficients) and circles are nodes.]
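A minimal PyTorch sketch of that architecture, assuming the five inputs from your question and a single linear layer per output (everything else about the sizes is an assumption):

```python
import torch
import torch.nn as nn

class GenderThenIncome(nn.Module):
    """Five inputs -> gender logit; inputs (skip connection) + gender -> income."""

    def __init__(self, n_inputs: int = 5):
        super().__init__()
        self.gender_head = nn.Linear(n_inputs, 1)      # bias included by default
        self.income_head = nn.Linear(n_inputs + 1, 1)  # inputs + gender prediction

    def forward(self, x):
        gender_logit = self.gender_head(x)
        gender_prob = torch.sigmoid(gender_logit)
        # Skip connection: feed the raw inputs and the gender prediction
        # into the income node together.
        income = self.income_head(torch.cat([x, gender_prob], dim=1))
        return gender_logit, income
```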
Then you want to minimize the binomial loss on the prediction of gender and (perhaps) the squared loss on the prediction of income, optimizing over the weights in the model. You can write this as a total loss:
$$\mathbb{L}_\text{total}(\text{input}, \text{gender}, \text{income}) = \mathbb{L}_\text{binomial}(\text{gender}) + \mathbb{L}_\text{squared}(\text{income})$$
You might want to weight or normalize the two loss terms, for example by multiplying one of them by a factor $\lambda$, because the term with the larger variance can otherwise dominate the other.
Notice that the prediction of income depends on all weights in the model, so optimizing the weights for the prediction of gender separately first might not be optimal for the prediction of income.
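Continuing the sketch above, a joint training step could look as follows; the dummy batch, learning rate, and the weight `lam` balancing the two loss terms are all placeholders:

```python
model = GenderThenIncome()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()  # binomial loss on gender
mse = nn.MSELoss()            # squared loss on income
lam = 1.0                     # relative weight of the income loss

x = torch.randn(64, 5)                         # dummy batch of inputs
gender = torch.randint(0, 2, (64, 1)).float()  # dummy gender labels
income = torch.randn(64, 1)                    # dummy income targets

for _ in range(100):
    gender_logit, income_pred = model(x)
    # Both losses are minimized jointly, so the income loss also
    # backpropagates into the weights of the gender node.
    loss = bce(gender_logit, gender) + lam * mse(income_pred, income)
    opt.zero_grad()
    loss.backward()
    opt.step()
```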
Now,
While this is specifically what you asked for, you have to ask whether it's really useful. Separately predicting gender first and then income, including the gender predictions as a feature, might be simpler and, above all, easier to implement; see the sketch below.
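For comparison, the two-step alternative takes only a few lines of scikit-learn; the simulated data is just there to make the sketch runnable:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))                              # the five inputs
gender = (X[:, 0] + rng.normal(size=500) > 0).astype(int)  # simulated labels
income = X @ np.ones(5) + 2 * gender + rng.normal(size=500)

# Step 1: predict gender from the inputs.
clf = LogisticRegression().fit(X, gender)
gender_hat = clf.predict_proba(X)[:, 1]

# Step 2: predict income from the inputs plus the gender prediction.
reg = LinearRegression().fit(np.column_stack([X, gender_hat]), income)
```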
Also, on imputation,
While studying these mediating relationships might be of interest, I have to warn you about using this setup as a form of imputation.
If your objective is to account for missing gender information in some observations, consider that this model never uses the actual gender of the observations that do contain it. If the independent variables do not predict gender reasonably well, you are essentially forfeiting that information, and the gender node will probably tend toward an uninformative output.
Best Answer
Suppose you have a dataset with credit card transactions, with binary labels indicating whether they were fraudulent or genuine. I suppose the user ID or card ID of a transaction could be viewed as its "origin" in the context of your question.
Now, suppose a single user/card ID was used in multiple transactions. If that user was a fraudster, they likely have multiple fraudulent transactions in the dataset (assuming they were not immediately caught and blocked). If you put some of these transactions in the training data and other transactions of the same user in the test data, those test cases may be too easy for the trained model to detect.
As an extreme case, suppose we include the card ID itself as a feature when training our model. The model can simply memorize which card IDs were associated with fraudulent transactions in the training data and have an unrealistically easy time detecting and flagging them in the test data. Such a model would start performing significantly worse in the future, once all the card IDs it has memorized have been blocked.
Of course, this is an extreme example; you probably wouldn't include the raw card ID as a feature. The issue can still appear in a less extreme manner in realistic cases, though: for example, users may effectively be memorized and recognized through patterns in their behaviour that are encoded in the features.
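The usual remedy is to split by origin rather than by individual transactions, so that all transactions of the same user or card land on the same side of the split. In scikit-learn this can be done with GroupShuffleSplit (or GroupKFold); the simulated data below just makes the sketch self-contained:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))             # transaction features
y = rng.integers(0, 2, size=1000)          # fraud labels
card_id = rng.integers(0, 100, size=1000)  # 100 cards, ~10 transactions each

# Every card's transactions end up entirely in train or entirely in test.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=card_id))
assert set(card_id[train_idx]).isdisjoint(card_id[test_idx])
```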