I'm wondering if there is any algorithm that can do classification and regression at the same time. For example, I'd like the algorithm to learn a classifier, and at the same time, within each label, to also learn a continuous target. Thus, each training example has both a categorical label and a continuous value.
I could train a classifier first, and then train a regressor within each label, but I'm thinking that if there were an algorithm that could do both, it would be wonderful.
Best Answer
The problem that you are describing can be solved by latent class regression (also called cluster-wise regression), or its extension, the mixture of generalized linear models; all of these are members of a wider family of finite mixture models, or latent class models.
It is not a combination of classification (supervised learning) and regression per se, but rather of clustering (unsupervised learning) and regression. The basic approach can be extended so that you predict class membership using concomitant variables, which makes it even closer to what you are looking for. In fact, using latent class models for classification was described by Vermunt and Magidson (2003), who recommend it for such purposes.
Latent class regression
This approach is basically a finite mixture model (or latent class analysis) of the form
$$ f(y \mid x, \psi) = \sum^K_{k=1} \pi_k \, f_k(y \mid x, \vartheta_k) $$
where $\psi = (\boldsymbol{\pi}, \boldsymbol{\vartheta})$ is a vector of all parameters, $f_k$ are the mixture components parametrized by $\vartheta_k$, and each component appears with latent proportion $\pi_k$. The idea is that the distribution of your data is a mixture of $K$ components, each of which can be described by a regression model $f_k$ appearing with probability $\pi_k$. Finite mixture models are very flexible in the choice of the $f_k$ components and can be extended to other forms and to mixtures of different classes of models (e.g. mixtures of factor analyzers).
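To make this concrete, below is a minimal EM sketch for a mixture of $K$ Gaussian linear regressions in Python/numpy. This is only an illustration of the idea (the function name and the made-up data are mine); dedicated packages such as flexmix handle this far more robustly.

```python
import numpy as np

def fit_mixture_regression(X, y, K=2, n_iter=100, seed=0):
    """EM for a mixture of K linear regressions with Gaussian noise.
    Returns proportions pi, coefficients beta, noise scales sigma,
    and posterior responsibilities r."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = np.full(K, 1.0 / K)            # latent proportions pi_k
    beta = rng.normal(size=(K, d))      # component coefficients vartheta_k
    sigma = np.full(K, y.std() + 1e-6)  # component noise scales
    for _ in range(n_iter):
        # E-step: responsibilities proportional to pi_k * f_k(y | x)
        mu = X @ beta.T                                    # (n, K) predicted means
        logp = (np.log(pi) - np.log(sigma)
                - 0.5 * ((y[:, None] - mu) / sigma) ** 2)
        logp -= logp.max(axis=1, keepdims=True)            # numerical stability
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted least squares per component
        pi = r.mean(axis=0)
        for k in range(K):
            w = r[:, k]
            Xw = X * w[:, None]
            beta[k] = np.linalg.solve(Xw.T @ X + 1e-8 * np.eye(d), Xw.T @ y)
            resid = y - X @ beta[k]
            sigma[k] = np.sqrt((w * resid ** 2).sum() / w.sum()) + 1e-8
    return pi, beta, sigma, r

# Made-up data: two hidden groups with different regression lines
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 400)
z = rng.integers(0, 2, 400)                    # true (hidden) classes
y = np.where(z == 0, 1 + 2 * x, 8 - x) + rng.normal(0, 0.5, 400)
X = np.column_stack([np.ones_like(x), x])      # intercept + slope
pi, beta, sigma, r = fit_mixture_regression(X, y, K=2)
```

The responsibilities `r` give each point's posterior probability of belonging to each component, so clustering and the per-cluster regressions are estimated jointly rather than in two separate steps.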
Predicting probability of class memberships based on concomitant variables
The simple latent class regression model can be extended to include concomitant variables that predict class membership (Dayton and Macready, 1998; see also: Linzer and Lewis, 2011; Grün and Leisch, 2008; McCutcheon, 1987; Hagenaars and McCutcheon, 2009), in which case the model becomes
$$ f(y \mid x, w, \psi) = \sum^K_{k=1} \pi_k(w, \alpha) \, f_k(y \mid x, \vartheta_k) $$
where again $\psi$ is a vector of all parameters, but we also include concomitant variables $w$ and a function $\pi_k(w, \alpha)$ (e.g. a logistic function) that is used to predict the latent proportions from the concomitant variables. So you can predict the probability of class membership and estimate the cluster-wise regressions within a single model.
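A common choice for $\pi_k(w, \alpha)$ is the multinomial logit model, with one coefficient vector fixed to zero for identifiability:

$$ \pi_k(w, \alpha) = \frac{\exp(w^\top \alpha_k)}{\sum^K_{j=1} \exp(w^\top \alpha_j)}, \qquad \alpha_1 \equiv 0 $$

which guarantees that the proportions are positive and sum to one for any value of $w$.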
Pros and cons
What is nice about this approach is that it is a model-based clustering technique, which means that you fit models to your data, and such models can be compared using standard methods for model comparison (likelihood-ratio tests, BIC, AIC etc.), so the choice of the final model is not as subjective as with cluster analysis in general. Breaking the problem into two independent steps of clustering and then applying regression can lead to biased results, whereas estimating everything within a single model lets you use your data more efficiently.
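For instance, candidate models fitted with different numbers of components $K$ can be scored with information criteria computed from the maximized log-likelihood (a minimal sketch; the function names are mine):

```python
import math

def bic(log_likelihood, n_params, n_obs):
    # Bayesian information criterion: lower is better
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

def aic(log_likelihood, n_params):
    # Akaike information criterion: lower is better
    return -2.0 * log_likelihood + 2 * n_params
```

You would fit the mixture for several values of $K$, compute these scores for each fit, and keep the model with the lowest value.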
The downside is that you need to make a number of assumptions about your model and give it some thought, so it is not a black-box method that simply takes the data and returns a result without bothering you. With noisy data and complicated models you can also run into model identifiability issues. Also, since such models are not that popular, they are not widely implemented (you can check the great R packages flexmix and poLCA; as far as I know it is also implemented to some extent in SAS and Mplus), which makes you software-dependent.
Example
Below you can see an example of such a model from the flexmix package vignette (Leisch, 2004; Grün and Leisch, 2008), fitting a mixture of two regression models to made-up data. It is visualized in the following plots (point shapes are the true classes, colors are the classifications).
References and additional resources
For further details you can check the following books and papers: