“Piecewise linear fitting” for logistic regression

Tags: classification, ensemble learning, logistic, machine learning

For a regression problem, we can fit the data with a piecewise linear function (a linear spline). Is there an analogous "piecewise linear fitting" for binary classification?

Would that amount to using a spline basis expansion in logistic regression?

For example, I'm looking for a model that can fit data like this. For logistic regression the decision boundary is a line; in the picture, the decision boundary is a piecewise linear function.

[Image: two-class data separated by a piecewise linear decision boundary]

Best Answer

Yes, what you're describing is a model where the predicted probability of the positive class is obtained by passing a piecewise linear function of the input through the logistic sigmoid function. That is:

$$p(y=1 \mid x) = \frac{1}{1 + \exp(-\phi(x))}$$

where $y \in \{0,1\}$ is the class label, $x \in \mathcal{X}$ is the input, and $\phi: \mathcal{X} \to \mathbb{R}$ is a piecewise linear function. Note that ordinary logistic regression is a special case, where $\phi(x) = w \cdot x$.
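To connect this to the spline basis expansion mentioned in the question, here is a minimal one-dimensional sketch. It assumes a linear-spline (hinge) basis for $\phi$; the knot, weights and bias are hand-picked for illustration rather than fitted, and the function names are my own.

```python
import math

def hinge_features(x, knots):
    """Linear-spline basis: [x, max(0, x - k1), max(0, x - k2), ...]."""
    return [x] + [max(0.0, x - k) for k in knots]

def phi(x, weights, bias, knots):
    """Piecewise linear score: bias plus a weighted sum of hinge features."""
    feats = hinge_features(x, knots)
    return bias + sum(w * f for w, f in zip(weights, feats))

def predict_proba(x, weights, bias, knots):
    """p(y=1 | x) = sigmoid(phi(x))."""
    return 1.0 / (1.0 + math.exp(-phi(x, weights, bias, knots)))

# Hand-picked (not fitted) parameters giving the tent phi(x) = 2 - 2|x|.
KNOTS = [0.0]
WEIGHTS = [2.0, -4.0]   # slope +2 left of the knot, +2 - 4 = -2 right of it
BIAS = 2.0

if __name__ == "__main__":
    for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(x, round(predict_proba(x, WEIGHTS, BIAS, KNOTS), 4))
```

With these parameters $\phi$ crosses zero at $x = -1$ and $x = 1$, so the positive class is predicted only for $|x| < 1$. A single linear $\phi(x) = w \cdot x$ cannot do that in one dimension. In practice the weights would be estimated by ordinary logistic regression on the expanded features.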

Neural nets with piecewise linear activation functions (e.g. ReLU, PReLU) and sigmoidal output units are a common form of this model. In this case, supposing $h(x)$ is a vector of activations in the last hidden layer, and $w$ and $b$ are the weights and bias of the output unit, then $\phi(x) = w \cdot h(x) + b$.
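As a rough illustration of the neural net case, the sketch below builds a one-hidden-layer ReLU network with a sigmoid output in two dimensions. The weights are hand-picked (an assumption for illustration, not the result of training) so that $\phi(x) = x_2 - |x_1|$, which gives a V-shaped decision boundary.

```python
import math

def relu(z):
    return max(0.0, z)

def hidden(x, W, c):
    """Last hidden layer h(x) = relu(W x + c), computed row by row."""
    return [relu(sum(wij * xj for wij, xj in zip(row, x)) + ci)
            for row, ci in zip(W, c)]

def phi(x, W, c, w, b):
    """Output pre-activation: phi(x) = w . h(x) + b."""
    return sum(wi * hi for wi, hi in zip(w, hidden(x, W, c))) + b

def predict_proba(x, W, c, w, b):
    """p(y=1 | x) = sigmoid(phi(x))."""
    return 1.0 / (1.0 + math.exp(-phi(x, W, c, w, b)))

# Hand-picked (not trained) parameters.
# Hidden units: relu(x1), relu(-x1), relu(x2), relu(-x2).
W = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
c = [0.0, 0.0, 0.0, 0.0]
# w . h = -(relu(x1) + relu(-x1)) + (relu(x2) - relu(-x2)) = x2 - |x1|
w = [-1.0, -1.0, 1.0, -1.0]
b = 0.0

if __name__ == "__main__":
    for point in [(0.0, 1.0), (2.0, 1.0), (-2.0, 1.0), (1.0, 1.0)]:
        print(point, round(predict_proba(point, W, c, w, b), 4))
```

Points above the V ($x_2 > |x_1|$) get probability above 0.5, and points on the V get exactly 0.5. The boundary is two line segments meeting at the origin, which a single linear $\phi$ cannot produce.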

Gradient boosted decision trees are another common form. In this case, $\phi(x) = \sum_{i=1}^k w_i f_i(x)$, where each $f_i(x)$ is a decision tree with weight $w_i$, and the trees and weights are learned sequentially by gradient boosting. Strictly speaking, each tree is piecewise constant (a piecewise linear function with zero slope on every piece), so it is the resulting decision boundary that is piecewise linear. Its pieces are usually parallel to the axes of the input space, because decision trees typically split along a single feature at a time. However, variants that split using oblique hyperplanes are also possible.
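The prediction side of such an ensemble can be sketched as follows. This is only an illustration: the trees are depth-1 stumps with hand-picked thresholds, leaf values and weights (all assumptions), and the boosting procedure that would normally learn them is not shown.

```python
import math

def stump(x, feature, threshold, left_value, right_value):
    """A depth-1 decision tree that splits on a single feature."""
    return left_value if x[feature] <= threshold else right_value

def phi(x, stumps, weights):
    """Additive score: phi(x) = sum_i w_i * f_i(x)."""
    return sum(w * stump(x, *s) for w, s in zip(weights, stumps))

def predict_proba(x, stumps, weights):
    """p(y=1 | x) = sigmoid(phi(x))."""
    return 1.0 / (1.0 + math.exp(-phi(x, stumps, weights)))

# Hand-picked (not learned) ensemble:
# each stump is (feature index, threshold, left leaf value, right leaf value).
STUMPS = [
    (0, 0.0, -2.0, 1.0),  # splits on x[0] at 0
    (1, 0.0, -2.0, 1.0),  # splits on x[1] at 0
]
WEIGHTS = [1.0, 1.0]

if __name__ == "__main__":
    for point in [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]:
        print(point, phi(point, STUMPS, WEIGHTS),
              round(predict_proba(point, STUMPS, WEIGHTS), 4))
```

Here $\phi$ is positive only in the quadrant $x_1 > 0$, $x_2 > 0$, so the decision boundary is L-shaped with both segments parallel to the axes. Within each rectangular region $\phi$ is constant, so the predicted probability is too.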

When using these models, we don't typically believe that the decision boundary is truly piecewise linear (as in your example). Rather, they're useful because piecewise linear functions can approximate arbitrary decision boundaries, while being fast to compute and efficient to learn.